Insane Reality issue #8 - (c)opyright 1996 Immortal Riot/Genesis - REALITY.005

Article: AV-Articles
Author: Who cares?

% Ripped-Off AV Articles %
__________________________

OK. Many people said they enjoyed the AV articles in IR zine #7, so here are
some more. These include:

*The Problems in Creating Goat Files - Read it for anti-bait techniques as
 well as other interesting stuff.

*Heuristic Anti-Virus Compatibility - material on Heuristic scanning
 and cleaning, as well as AV technique in general.

*Detecting and Eradicating Known and Unknown Viruses - Scanning/Cleaning
 infections generically via Integrity Checking.

*A Brief History of PC Viruses - interesting, and might give you ideas.



What?? Why should an elite Vx dude like you read AV articles??


- K N O W   Y O U R   E N E M I E S -

- _Sepultura_

=============================================================================


		 ╔════════════════════════════════════════╗
		 ║  The problems in creating goat files.  ║
		 ║           Igor G. Muttik               ║
		 ╚════════════════════════════════════════╝
 Abstract

 With more than 6000 viruses known for the IBM PC, maintaining and updating
 a library of virus samples is a difficult task. Parasitic file infectors
 make up the majority of them, and testing their properties and creating
 samples takes a lot of effort. To help solve this problem the author has
 developed a special tool for antivirus researchers that creates bait files
 (also called sacrificial goats). Theoretical points of bait creation
 (infectable objects, unusual infection conditions, environmental
 requirements) are discussed and a detailed description of the GOAT package
 is given.

 This paper is an attempt to summarize the problems that appear when weeding
 suspicious files and replicating viruses. A safe testing environment based
 on hardware protection of the hard disk drive (HDD) is described. The paper
 also describes DOS peculiarities that appear when working with long
 directories.

 The possible appearance of viruses targeted against antivirus research
 environments is discussed.


 1. Virus samples

	 1.1. What is "a virus sample"?

 A file-infector virus usually attaches itself to an executable file using
 an appending or prepending technique. Such viruses are called parasitic
 infectors. Among antivirus researchers these viruses are usually exchanged
 in "sample form" -- the virus is attached to a do-nothing file of some
 fixed size (usually divisible by 10**N or 16**N) with simple contents
 (doing nothing, or printing a short message on the screen). The result of
 infecting such a goat file is called "a virus sample". We have:

		 Virus sample = Virus(Goat file)
 or, simply:
		 Virus sample  = Goat file + Virus

 Can we "standardize" a virus sample? Unfortunately not, generally speaking.
 All polymorphic viruses have zillions of instances and it is impossible to
 select some "standard" image of such a virus. Oligomorphic and encrypted
 viruses are difficult to "standardize" too. Even for non-encrypted viruses
 the problem is not simple -- they usually have some variables stored inside
 their body (especially resident viruses) and, thus, their image is
 variable.



	 1.2. Types of goat objects

 There are many infectable objects in the DOS environment. These include:

	 1) Files:
	 - EXE/COM/OV? executable files (usually started by the user)
	 - SYS drivers (called by DOS kernel at startup)
	 - BAT files (run on the user request or from AUTOEXEC.BAT)
	 - OBJ/LIB/source files (compiled into executables on the user
	   request)
	 - DLL/CPL/etc. (NE, LE, etc. - Windows, OS/2 executables)
	 - DOC/WK?/etc. (including macro and OLE files)

	 2) Pointers:
	 - MBR partition table (a'la Starship)
	 - DBR pointers (IO.SYS/MSDOS.SYS; IBMBIO.COM/IBMDOS.COM)
	 - directory entries (ex., DIR-II family)
	 - FAT pointers (ex., Necropolis)

	 3) Startup code:
	 - Flash ROM (called by microprocessor after RESET)
	 - MBR code (called by ROM BIOS after POST)
	 - DBR code on HDD (called by MBR code)
	 - DBR code on floppy (called by BIOS)
	 - DOS kernel code (called by DBR code)

 Each of these objects can be infected and therefore requires preparation
 of a "goat object". Fortunately, most unusual infection techniques are very
 rare or have not yet been found, so creating bait objects for bizarre
 viruses is a rare task -- the great majority of known viruses are simple
 parasitic file infectors. Furthermore, creating a goat BAT file (or source
 file) is rather easy -- one can use a text editor to make a bait for the
 virus. To create a goat floppy diskette we can use the standard FORMAT
 utility.

 Antivirus researchers are mostly troubled by the problem of "virus glut"
 [Skulason]. "Virus glut" means the rapid increase in the number of known
 viruses. The great majority of these are file viruses. So, in most cases,
 the attention of antivirus researchers is focused on parasitic file
 infectors. We'll discuss only this type of virus in the rest of the paper.


	 1.3. Creation of goat files

 To try to replicate a virus one has to have a set of goat files. Most
 antivirus researchers have their own pre-created sets of files, produced
 from an ASM source or directly with the DEBUG utility. This approach has
 a drawback -- if a new goat file is required it has to be created manually.
 And if we need a lot of files (ex., for testing the detection rate of a
 polymorphic virus) the process must be repeated many times.

 Obviously, a dedicated automated tool offers many more options and
 capabilities. It can even create whole sets of files in one invocation. It
 is convenient to use a set of goat files with linearly increasing length
 (say, 1000, 2000, ...20000). If the virus leaves short victims alone, this
 will be easily noticeable after infection. And file growth can be calculated
 by subtracting the original size from the size of the infected file.
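
 The bookkeeping described above can be sketched in a few lines of Python.
 This is only an illustration: the file names, the 1701-byte growth and the
 3000-byte threshold are invented for the example and do not describe any
 particular virus or the GOAT package itself.

```python
def goat_sizes(first=1000, step=1000, count=20):
    """Sizes of a goat set with linearly increasing length."""
    return [first + i * step for i in range(count)]


def file_growth(original_size, infected_size):
    """Growth caused by infection: infected size minus original size."""
    return infected_size - original_size


def untouched(before, after):
    """Names of goats whose size is the same before and after the test."""
    return sorted(name for name, size in before.items()
                  if after.get(name) == size)


sizes = goat_sizes()
before = {"GOAT%03d.COM" % i: size for i, size in enumerate(sizes)}

# Hypothetical test result: every goat of 3000 bytes or more grew by 1701.
after = {name: (size if size < 3000 else size + 1701)
         for name, size in before.items()}

short_survivors = untouched(before, after)
growth = file_growth(before["GOAT005.COM"], after["GOAT005.COM"])
print(short_survivors, growth)
```

 With a regular set of sizes, the goats that stay untouched form a visible
 group at the short end of the list, which is exactly the effect the text
 describes.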


 2. Infection of a goat file

	 2.1. "Weeding problem"

 From the point of view of an antivirus researcher all incoming suspicious
 samples should be classified in one of the following groups (for
 definition -- see VIRUS-L FAQ [FAQ]):
	 - innocent file (includes garbage and damaged programs)
	 - virus (includes germs, droppers, viruses of the first generation)
	 - trojan
	 - intended
	 - joke
 This classification problem is usually called "weeding". Both automated and
 manual methods are used to weed a set of files. The following automated
 tools are used:
	 - scanners, detecting viruses by name
	 - heuristic scanners
	 - TRASHCAN/DUSTBIN, detecting non-viruses, jokes, garbage and
	   intendeds
 Manual "weeding" methods are used after the automatic ones:
	 - visual analysis (ex., presence of "MZ", "PK" identifiers)
	 - tracing in DEBUG (includes partial on-the-fly disassembling)
	 - full disassembling
 We should take into account that the infected sample may be compressed with
 one of the EXE-packers (PKLITE, LZEXE, DIET, EXEPACK, COMPACK, PGMPACK,
 KVETCH, SHRINK, TINYPROG, WWPACK, AXE, IMPLODE, AVPACK, etc.). In such a
 case the UNP and UUP programs should be used to remove the compression code
 before the manual analysis.

 Visual checks of incoming suspicious files are usually made using DEBUG
 or HIEW (Hackers View), a wonderful viewer of executable files. The latter
 combines the features of a simple ASCII/HEX viewer with a built-in
 disassembler/assembler (both 86 and 32-bit modes) and a binary file editor.
 I can heartily recommend this utility to all antivirus researchers.


	 2.2. Safety problem

 Every antivirus researcher faces a problem when he needs to run an
 infected (or just suspicious) program or trojan horse. The typical solution
 is to use a special goat PC (usually an old PC/XT/AT). But the malware can
 easily destroy data on the hard disk of this PC. It can even cause a
 malfunction of the hardware (ex., by low-level formatting an IDE disk, if
 any). It will take significant time and effort to restore your testing
 environment. Only hardware protection of the hard disk of your PC can be a
 100%-reliable solution. To implement hardware protection you will need a
 switch that selects the operation mode -- "normal"/"protected".


		 2.2.1. Hardware protection using "Turbo" switch

 The "Turbo" switch is rarely used in normal computer operation, for two
 reasons. First, any user will usually select the highest possible speed to
 minimize the response time of the software. Second, most available BIOSes
 support toggling of turbo mode from the keyboard (for example, AMI BIOS
 uses [Alt]-[Ctrl]-[+] to set the higher speed and [Alt]-[Ctrl]-[-] to set
 the lower speed). Therefore, you can easily replace the connection of the
 "Turbo" switch to the motherboard with a simple jumper. Now your "Turbo"
 switch connector is free for use as a hard-disk protection switch.
 Typically the connector of the "Turbo" switch has three contacts (the switch
 shorts either the two left contacts or the two right ones). Using this
 switch to turn the disk protection on and off looks like an elegant
 solution.

 Now find the jumper on your hard disk controller that enables its
 operation (consult the controller manual if needed). Most MFM, IDE and SCSI
 controllers have such a jumper. Remove this "HDD-enable" jumper and put the
 connector of the "Turbo" switch in its place, so that the switch shorts the
 contacts instead of the jumper.

 After this modification you can turn the HDD off simply by pressing the
 "Turbo" switch, and return it to operation by pressing it once more.

 The LED indicator (or simple LED) of your PC (which usually shows the
 current processor frequency) is wired to the turbo switch and reflects its
 state. You can easily configure the LED indicator to reflect the current
 mode of operation (say, "On"/"FF").


		 2.2.2. Software shell for hardware protection

 To work without the HDD you will need some other media in its place. The
 ideal solution is to use a ramdrive. Add the following statement to your
 CONFIG.SYS -- DEVICE=RAMDRIVE.SYS nnnn (where nnnn stands for the size of
 the ramdrive in kilobytes; you may also need the /e switch to use extended
 memory). A ramdrive smaller than 2MB is usually not sufficient, so better
 select 2-4MB.

 First, copy all software needed for virus testing (plus the suspicious
 files) to your virtual disk. Once your hard disk is switched off all other
 programs will be inaccessible, so make a good selection (in my case it
 took around 1MB or more). Now you are ready to disable the hard disk.
 But DOS still thinks that the HDD is present. Its internal buffers and cache
 utilities (if any) still hold the contents of some portions of your hard
 disk in memory.

 The most obvious solution is to eliminate all "notes" about the presence
 of the hard disk. To simulate the absence of a hard disk on the PC, I wrote
 a special program which clears INT_41h and INT_46h (pointers to the hard
 disk parameter tables) and sets the number of available hard disks (BIOS
 variable at [0:475h]) to zero. To reroute any access to the hard disk (ex.,
 drives C:, D:, E:) to the virtual disk, I use the DOS SUBST utility, which
 replaces drives C:, D: and E: with the virtual disk drive letter (F: in my
 case). SUBST also clears HDD cache contents. Finally, DOS environment
 variables (ex., COMSPEC and PATH) should be rewritten to point to objects
 on the ramdrive.


	 2.3. "Replication problem"

 The problem of infecting a goat file, given a sample of a possible virus, is
 called "replicating". Very often one researcher asks the others -- "I have
 a sample of what I think is a virus, but cannot replicate it. Have you
 tried? If anybody succeeded in doing this -- send me a sample, please..."
 And that repeats very frequently. We see that the "replication problem" is
 one of the most common problems. The task is to find the correct computer
 environment and to meet all of the virus's infection conditions. Obviously,
 both can be achieved with the help of a full disassembly of the viral code,
 but that is not a very practical approach, because it takes much time.
 Usually, suspicious files are simply tested on a so-called "goat computer".
 Only in case of problems (files do not replicate, but look suspicious) are
 they disassembled and analyzed in depth. We have already seen one approach
 to the replication problem -- asking other researchers for help. There are
 also other options:
	 - trying a lot of different goats
	 - trying a lot of different environments
	 - manual analysis (tracing, debugging, disassembling) to find out
	   all infection conditions (i.e., requirements for the goat and
	   environment).


	 2.4. Infection conditions

 To replicate a virus we have to feed it a goat file which meets the virus's
 internal infection conditions. This must be done in an environment which
 is appropriate for the given virus. Fortunately, to make viruses more
 infective, they are usually made to operate in a wide range of
 environments. On the other hand, numerous limitations are sometimes
 implemented to simplify the viral code (ex., Ping-Pong, Vindicator, Yale
 and Exeheader.Mz1 viruses work only on 88/86 processors; 3APA3A and
 MIREA.4156 viruses require a hard disk with a 16-bit FAT; AT144 virus
 requires a 286 processor or higher; Green_Caterpillar virus needs a CMOS
 clock; Lovechild virus works only under MS-DOS 3.2; Nightfall virus does not
 replicate without an XMS driver [Brown]; EMMA virus requires the presence of
 EMS [Kaspersky]; etc.). In the case of such specific requirements only
 random environment selection or manual analysis of the virus internals may
 help to find the correct environment.

 Parasitic file infectors can theoretically infect all of the following
 types of files:
	 - COM
	 - EXE
		 MZ/ZM (DOS executables)
		 NE (Windows, OS/2 16-bit)
		 LE, W3 (Windows VxD, Win386)
		 LX (OS/2)
		 PE (Windows, NT 32-bit)
		 MP, P2, P3 (Pharlap DOS extenders)
	 - SYS/COM (normal DOS drivers)
	 - SYS/EXE (understood only by DOS 5.0, 6.0)
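
 When preparing goats of several of these types it helps to be able to tell
 them apart by their markers. The Python sketch below is an illustration
 only: it checks the two-byte MZ/ZM marker and then follows the 32-bit
 pointer at offset 3Ch of the DOS header to the new-style header signature.
 It deliberately ignores the W3 and Pharlap formats and performs none of the
 extra consistency checks a real loader makes.

```python
import struct

NEW_HEADER_SIGNATURES = {
    b"NE": "NE (Windows, OS/2 16-bit)",
    b"LE": "LE (Windows VxD)",
    b"LX": "LX (OS/2)",
    b"PE": "PE (Windows, NT 32-bit)",
}


def executable_format(data):
    """Guess the executable format from the header markers only."""
    if bytes(data[:2]) not in (b"MZ", b"ZM"):
        return "no MZ/ZM marker (COM, SYS or not an executable)"
    if len(data) >= 0x40:
        # Offset 3Ch of the DOS header holds the new-header file offset.
        (offset,) = struct.unpack_from("<I", data, 0x3C)
        signature = bytes(data[offset:offset + 2])
        if offset >= 0x40 and signature in NEW_HEADER_SIGNATURES:
            return NEW_HEADER_SIGNATURES[signature]
    return "MZ/ZM (DOS executable)"


# Tiny synthetic headers, built only to exercise the function.
dos_only = b"MZ" + bytes(30)
fake_pe = bytearray(0x84)
fake_pe[0:2] = b"MZ"
struct.pack_into("<I", fake_pe, 0x3C, 0x80)
fake_pe[0x80:0x84] = b"PE\x00\x00"

print(executable_format(dos_only))
print(executable_format(fake_pe))
print(executable_format(b"\xCD\x20"))
```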

 Besides file type, the following infection conditions exist:
	 - file size
	 - filename
	 - attributes
	 - file timestamp (date/time of creation/modification)
	 - file contents
 The most common infection condition is file type (COM/EXE) and, second, the
 size of the victim. Very short files are usually avoided, both because
 their growth would be too noticeable and to avoid infecting do-nothing goat
 files (like the primitive 2-byte INT_20 file).

 Most file infectors target simple DOS executables -- COM files and EXE
 files (with MZ or ZM marker). Some file infectors are capable of infecting
 DOS drivers of SYS type (ex., SVC.4644, SVC.4661, SVC.4677,
 Alpha.4000, Astra, Astra_II, Cysta or Comsysexe, Terminator.3275, CCBB,
 Talon or Daemaen, Ontario, VLAD.Hemlock, Face.2521, etc.). All other formats
 of executables are still virgin land awaiting virus writers. For example,
 only a few Windows viruses are known to date (all infecting only executables
 in NE-EXE format).

 Speaking of the contents of goat files, we should mention that viruses
 which check the internals of the victim file are rather rare. I do not mean
 a selfcheck to avoid multiple infections of the same file. I mean checking
 of virus-free areas (that is, inspection of the uninfected file).
 Nevertheless, such viruses exist. Lucretia virus looks for a 0xE8 byte
 (Intel x86 CALL instruction) in the file and replaces the offset of the call
 to point to the viral body. Warlock virus avoids all files having a 0Eh byte
 at the start of the program code (this includes all LZEXE-packed programs).
 Raptor virus does not infect EXE files with SS in the header equal to 07BC,
 141D, ...2894 (13 entries). The behavior of Internal.1381 virus depends on
 the contents of the EXE header too. Moreover, there are Zerohunter viruses,
 which look for a series of zeroes (412 bytes for Zerohunter.412 and 415 for
 Zerohunter.415) in the file and infect the victim by overwriting this block
 of zeroes, if found. Zerohunter viruses are typical representatives of the
 class of "cavity viruses" (like Helicopter.777, Grog.Hop,
 Gorlovka.1022/1024, Russian_Anarchy.2048, Locust.2486, Tony.338, etc.).
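
 For goat preparation the practical question is whether a given bait offers
 a long enough block of zeroes at all. A minimal Python check is sketched
 below; the goat contents are made up for the example.

```python
def find_zero_run(data, min_length):
    """Offset of the first run of at least min_length zero bytes, or -1."""
    return data.find(b"\x00" * min_length)


# Made-up goat: 2 bytes of code, 100 NOPs, 412 zeroes, 10 more NOPs.
goat = b"\xCD\x20" + b"\x90" * 100 + bytes(412) + b"\x90" * 10

print(find_zero_run(goat, 412))   # long enough for a 412-byte cavity
print(find_zero_run(goat, 415))   # too short for a 415-byte cavity
```

 A zero-filled goat passes such a check trivially, while a goat filled with
 NOPs or with a pattern does not, which is one reason to keep goats with
 different fillings at hand.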

 There are also viruses of the exeheader type -- Dragon, Hobbit, SkidRow,
 Mike, VVM, Bob, XAM, Mz1, Pure, etc. They infect only EXE files having a long
 block of zeroes (around 200-300 bytes) in the EXE header (it is 512 bytes by
 default). They can be regarded as a subclass of cavity viruses.

 Many viruses do not infect certain programs. They usually avoid the command
 processor COMMAND.COM and certain antivirus or widely used programs
 (archivers, command-line shells, etc.). The following reason comes to
 mind: infection of COMMAND.COM is very noticeable and causes many
 incompatibilities, so virus writers simply filter out COMMAND.COM to avoid
 compatibility problems. This approach has a drawback (from the virus
 writer's point of view), as infecting COMMAND.COM with a resident virus
 guarantees that the computer will come up with the virus installed in
 memory, because COMMAND.COM is always automatically invoked during the boot
 process. Viruses try to avoid antivirus programs because these normally
 check their own integrity, and the virus would be detected within a minute.

 A more difficult case is a virus that infects only on certain days of the
 week, or during the first 20 minutes of an hour (like Vienna.644.a does).
 For example, Kylie virus affects the victim if the current year is not 1990.
 Fumble virus infects only on even dates. The virus called Invisible avoids
 certain COM files by computing a checksum of the name of the victim. Viruses
 of the Phoenix family (also called Live_after_Death) avoid some file sizes,
 and about 1/8 of files are left uninfected. Russian Mirror (Beeper) virus
 infects only every third executed file. Some of these viruses are called
 "sparse" infectors. Random environment/goat selection may not help in this
 case, and such viruses have to be traced and/or disassembled.

 Many viruses require a JMP instruction at the beginning of the victim file
 (ex., first versions of Yankee_Doodle, Russian_Tiny.143, Rust.1710,
 Screen.1014, Leapfrog.516, etc.)

 All the exclusions and conditions mentioned above must be taken into
 account when creating goat files suitable for infection, and whenever a
 virus does not replicate.


	 2.5. Infection markers as an obstacle for infection

 Almost all viruses try to "mark" their victims to avoid multiple infections
 of the same file, because growth of files beyond some reasonable limit
 cannot go unnoticed (because of wasted disk space and the delays caused by
 reinfection) and may even cause the infected file to hang (ex., COM file
 >64k). Viruses use different "infection markers":
	 - detection of self-presence (check own code; full or partial)
	 - sequence of bytes (text or binary designator; usually at a specific
	   position)
	 - timestamp (62 seconds, year >2000, etc.)
	 - file size (ex., Uruguay-#3, #4)
	 - attribute (some viruses mark their victims as ReadOnly)

 Some viruses use perfectly legal markers -- for example, a seconds value
 (say, all infected files have 33s) or file length (say, the lengths of all
 infected files are divisible by 23). If our goat file happens to carry the
 "marker" of the virus, it will not be infected. Fortunately, most viruses
 use specific markers. In fact, viruses have to behave in such a way to be
 infective. Therefore, it is usually easy to make an infectable goat file if
 the first attempt at replication failed because of a coincidence with a
 legal virus marker.
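
 A coincidence with a marker can be checked mechanically before the test.
 The Python sketch below decodes the 16-bit DOS directory time word (the
 seconds are stored in 2-second units, which is why a field value of 31
 reads as the "impossible" 62 seconds) and tests a goat against a marker.
 The helper names and the sample values are assumptions made for the
 example.

```python
def dos_time_fields(dos_time):
    """Split a 16-bit DOS time word into (hours, minutes, seconds)."""
    hours = (dos_time >> 11) & 0x1F
    minutes = (dos_time >> 5) & 0x3F
    seconds = (dos_time & 0x1F) * 2      # stored in 2-second units
    return hours, minutes, seconds


def goat_collides(size, dos_time, size_divisor=None, seconds=None):
    """True if the goat already carries the given (hypothetical) marker."""
    if size_divisor is not None and size % size_divisor == 0:
        return True
    if seconds is not None and dos_time_fields(dos_time)[2] == seconds:
        return True
    return False


stamp = (12 << 11) | (34 << 5) | 31      # 12:34 with seconds field = 31
print(dos_time_fields(stamp))
print(goat_collides(2300, stamp, size_divisor=23))
print(goat_collides(1000, stamp, size_divisor=23))
```

 If a collision is reported, changing the goat size by one byte or setting
 a different seconds value is normally enough to get an infectable goat.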


	 2.6. Checking of goat files after attack

 After an attempt to infect a goat we have to detect possible changes. If we
 see file growth (in a directory listing) the conclusion is obvious: the
 longer files are virus children. One additional test is recommended -- check
 whether the virus child is itself replicating. In some cases (because of
 errors in the virus) it is not and, therefore, must be classified as an
 intended, not a virus. Visual checks after the attack are made just like
 before the attack -- see 2.1.

 If the virus has stealth or semi-stealth properties, the detection of
 infected samples is somewhat more complex. The best approach is to preserve
 all goat files involved in the test and inspect them after a clean reboot
 (copy them to a floppy disk if your HDD is disabled as described in 2.2).
 A simpler, but less reliable, method is to try to remove the virus from
 the interrupt chains using, say, the MARK/RELEASE programs by TurboPower
 Software (MARK should be installed before the first start of the virus;
 it remembers the whole interrupt table. RELEASE should be started after the
 attack to restore the old interrupt table and remove the virus from the
 interrupt chain). Unfortunately, this approach might not work if the virus
 uses tunneling.
 In principle, we can use an integrity checker to compare the test files
 before and after the virus attack. This generic method can detect almost
 all stealth viruses, too, if used in low-level disk access mode. For
 example, this mode is available in the Russian integrity checker ADInf.

 3. "Polymorphics detection rate"

	 3.1. Huge quantities of goats

 In product reviews we frequently read something like this: "... the
 'Polymorphic' test-set contains a mammoth 4796 infected files" [TOP] or
 "When tested against the 500 positively replicating Mutation Engine (MtE)
 samples, all but two were correctly detected as infected" [Jackson]. Why do
 all these tests need so many samples of the same virus? The answer is simple
 -- because of the great variability of polymorphic viruses (more correctly,
 because of the variability of the virus decryptor). Any scanner coping with
 polymorphics has to decrypt the body of the virus and locate a
 search-string. The other approach is to try to distinguish the viral
 decryptor from normal non-viral code. Both methods can produce both false
 positives and false negatives. These are, of course, rather rare, but
 practically (and even theoretically) unavoidable. To reveal the misses of a
 scanner the number of tested samples has to be very large. That is why
 almost all comparisons of scanners are performed using huge quantities of
 samples. That is, of course, rather time consuming and not very convenient,
 but it is an unavoidable practice.
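
 How large is "very large"? A rough estimate is sketched below in Python. It
 rests on a simplifying assumption that is not part of the original
 argument: every sample is missed independently with the same probability.
 The rate of 2 misses in 500 samples is taken from the quotation above
 purely as an example.

```python
import math


def chance_to_see_a_miss(miss_rate, samples):
    """Probability that at least one of `samples` files is missed."""
    return 1.0 - (1.0 - miss_rate) ** samples


def samples_needed(miss_rate, confidence=0.95):
    """Smallest sample count that reveals a miss with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - miss_rate))


rate = 2 / 500
for count in (10, 100, 1000):
    print(count, round(chance_to_see_a_miss(rate, count), 3))
print(samples_needed(rate))
```

 Under these assumptions a handful of samples will almost never expose a
 scanner that misses a fraction of a percent of the instances, which is the
 quantitative reason behind the huge test sets.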

 How can we speed up the tests and the preparation of samples? The first idea
 is to put the virus samples on fast media -- a virtual disk looks like the
 ideal choice. But can we speed up DOS' access to the drive?


	 3.2. DOS slowdown when working with long directories

 When experimenting with the creation of hundreds of files I noticed a very
 interesting peculiarity. After some number of files had been created in the
 directory (in my case it was around 700 files), every additional file took
 much more time to create! Obviously, some internal resource of DOS was
 exhausted. To shed light on this effect I ran the same task -- creation
 of 100*N goat files (N=1..10) using GOATS (with zero size increase; i.e.,
 all goats were identical) -- while varying the number of BUFFERS (as set
 in CONFIG.SYS). Note that the disk cache (SMARTDRV) was not active, because
 the files were created on the virtual disk. The collected data is given in
 the table:

	 Time needed to create the given number of files (in seconds +/-1).

	           Number of files created
	 BUFFERS   100   200   300   400   500   600   700   800   900  1000
	 15          6    12    19    28*   40    51    64    80    96   118
	 48          6    12    19    27    35    45    55    70*   90   112
	 58          6    12    19    26    35    45    55    70    82*  103
	 68          6    12    19    26    35    45    55    70    82    98

	 Note: "*" marks the number of files at which significant slowdown occurs.

 1. We see that the total time depends strongly on the number of BUFFERS.
 2. At some point a significant slowdown always occurs (compare adjacent
    columns to see it).
 3. The point at which this slowdown occurs depends on the number of BUFFERS.
 4. For creation of 1000 files 68 BUFFERS are sufficient.
 5. For 48 BUFFERS the slowdown occurred at around 720 files.
 6. For 58 BUFFERS the slowdown occurred at around 870 files.

 Thus, adding 10 BUFFERS (10*512=5120 bytes) shifts the limit by
 (870-720=150) files. We can calculate how many bytes are needed per file --
 5120/150=34.1. Surprisingly, this is very close to the size of a directory
 entry (32 bytes)! This is additional evidence that the slowdown occurs when
 there is no more space in BUFFERS to hold the current directory (and DOS
 needs to reload it from disk).
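
 The same arithmetic, written out in Python for anyone who wants to repeat
 it with their own measurements. The 512-byte buffer and the 32-byte
 directory entry are the standard DOS values; the file counts are the
 approximate figures quoted above.

```python
BUFFER_SIZE = 512          # bytes per DOS buffer (one sector)
DIR_ENTRY_SIZE = 32        # bytes per directory entry

extra_buffers = 58 - 48            # BUFFERS added between the two runs
extra_files = 870 - 720            # how far the slowdown point moved

bytes_per_file = extra_buffers * BUFFER_SIZE / extra_files
entries_per_buffer = BUFFER_SIZE // DIR_ENTRY_SIZE
predicted_shift = extra_buffers * entries_per_buffer

print(round(bytes_per_file, 1), entries_per_buffer, predicted_shift)
```

 If every added buffer held nothing but directory entries, 10 buffers would
 move the limit by 160 files; the measured shift of about 150 is within the
 accuracy of the timing.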

 I have also found an interesting fact (previously unknown to me) -- the
 creation of files in a fresh directory takes much less time than the
 creation of the same number of files in the same directory after 1000 files
 have been removed from it! The time needed to create 1000 files in a used
 directory is approximately three times longer than in a fresh directory!
 That is because DOS scans a directory only until it encounters a zero
 (never used) entry. In a used directory there are no such entries (at least
 near the beginning) and DOS has to scan the whole list of deleted entries.
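
 The effect can be imitated with a toy model in Python. It relies only on
 the convention that the first byte of a directory entry is 00h for a
 never-used entry and E5h for a deleted one; everything else (one scan per
 created file, reuse of the first free slot, no buffering) is a
 simplification, so the model shows the direction of the effect and not the
 measured factor of three.

```python
NEVER_USED = 0x00   # entry was never used: the scan may stop here
DELETED = 0xE5      # entry of a deleted file: the scan must go on
IN_USE = 0x41       # any other first byte; 'A' stands for a live file


def entries_scanned(directory):
    """Entries examined before the scan reaches a never-used entry."""
    scanned = 0
    for first_byte in directory:
        if first_byte == NEVER_USED:
            break
        scanned += 1
    return scanned


def cost_of_creating(directory, count):
    """Total entries examined while creating `count` files one by one."""
    directory = list(directory)
    total = 0
    for _ in range(count):
        total += entries_scanned(directory)
        # The new file takes the first free slot (deleted or never used).
        slot = next(i for i, b in enumerate(directory)
                    if b in (NEVER_USED, DELETED))
        directory[slot] = IN_USE
    return total


fresh = [NEVER_USED] * 1000     # directory that never held any files
used = [DELETED] * 1000         # directory after removing 1000 files

print(cost_of_creating(fresh, 1000), cost_of_creating(used, 1000))
```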

 Thus, we should create bait files in a set of fresh directories of moderate
 size. The same applies to tests of scanners against huge virus collections
 -- fresh and short directories will be scanned faster.


 4. GOAT software package

 After discussing some theoretical points, let's turn to the implementation
 of these ideas in the GOAT package [GOAT]. This package is a set of tools
 for antivirus researchers which help to create bait files (also called
 sacrificial goat files or, simply, goat files).

 The purpose of the programs can be explained using the following table:

 You need                                                       Use

 Bait file with some special internal structure              GOAT.COM
 A series of bait files of different sizes                   GOATS.COM
 Files of the same size, but with different contents         GOATSET.BAT
 Many identical files to infect them with polymorphic virus  FLOCK.COM

 Using GOAT.COM you can manually select the size and the name of a
 sacrificial goat file and vary its internals to meet the criteria which the
 virus uses when deciding "to infect or not to infect" the victim file. You
 can enter the size of a sacrificial goat file in any of the following
 formats: decimal, hexadecimal or in kilobytes. The size of the victim files
 can be as small as 2 bytes and as large as several gigabytes (it is stored
 in a 32-bit variable). GOAT.COM is very flexible -- it can create COM, EXE,
 SYS(COM) and SYS(EXE) files, with code at the beginning, in the middle, or
 at the very end of the goat file. Files can be filled with zeroes, NOPs, two
 types of pattern or random garbage. You can add a stack segment to EXE
 files, vary the header size, and ... many other options are available.
 GOATS.COM is intended to create a series of bait files with linearly
 increasing length. The length increase step is changeable. GOATS.COM has the
 same flexibility as GOAT.COM.
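
 To make the idea concrete, here is a minimal Python sketch of the simplest
 possible goat: a COM image that starts with INT 20h (bytes CDh 20h, "return
 to DOS") followed by a filler. This is not the layout produced by GOAT.COM,
 which adds a JMP, a selfcheck and a signature; the function names and the
 output file name are invented for the example.

```python
INT_20H = b"\xCD\x20"       # INT 20h: terminate program, return to DOS
COM_LIMIT = 65536 - 256     # one segment minus the PSP


def make_com_goat(size, fill=0x00):
    """Return the bytes of a do-nothing COM goat of exactly `size` bytes."""
    if not 2 <= size < COM_LIMIT:
        raise ValueError("size does not fit a COM file")
    return INT_20H + bytes([fill]) * (size - len(INT_20H))


def write_goat_series(sizes, template="GOAT%03d.COM", fill=0x00):
    """Write one goat per size into the current directory; return names."""
    names = []
    for index, size in enumerate(sizes):
        name = template % index
        with open(name, "wb") as handle:
            handle.write(make_com_goat(size, fill))
        names.append(name)
    return names


zero_goat = make_com_goat(1000)
nop_goat = make_com_goat(2000, fill=0x90)
print(len(zero_goat), len(nop_goat), nop_goat[:4].hex())
```

 Calling write_goat_series(range(1000, 21000, 1000)) would produce the
 1000, 2000, ...20000 series mentioned in section 1.3.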

 FLOCK.COM creates up to 1000000 identical files. You can infect them with
 a polymorphic virus to test its behavior and properties. FLOCK.COM uses
 the same engine as GOAT.COM and GOATS.COM. Thus, all the flexibility of
 GOAT.COM is available too.

 GOATSET.BAT produces a sort of "standard set" of files of the same size.
 These files differ from each other (their internal contents or attributes
 vary). GOATSET.BAT needs GOAT.COM for execution. GOAT.COM should be located
 in the current directory or be accessible via the PATH environment variable.

 A small batch file RUN-ALL.BAT will help you to run (or infect, if you have
 a resident virus) all generated bait files.


	 4.1. Synopsis and switches

 Usage of the main program, GOAT.COM, looks like this (the others are
 similar):

	 GOAT  Size  [Filename]  [/switch]  [/switch] ...

	 Size - decimal, hexadecimal, or in kbytes
		 (Example: 10000, 3E00h, FF00h, 31k, 512K, 2048k)
	 Filename - file to create. If omitted, GOAT000, GOAT001, ... are made
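
 The three size notations are easy to tell apart by their last character.
 The Python sketch below is one way to read them; it assumes that "k" means
 1024 bytes and that the number before "k" is decimal, which the text does
 not state explicitly.

```python
def parse_size(text):
    """Convert '10000', '3E00h' or '31k' style sizes to a byte count."""
    text = text.strip()
    suffix = text[-1:].lower()
    if suffix == "k":
        return int(text[:-1]) * 1024     # assumption: 1k = 1024 bytes
    if suffix == "h":
        return int(text[:-1], 16)
    return int(text)


for example in ("10000", "3E00h", "FF00h", "31k", "512K", "2048k"):
    print(example, parse_size(example))
```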

 A short reference of all available switches is given below in alphabetical
 order:

 /Annnn  - set device Attribute (default=0C853h)
 /B      - place code at bottom of file (default - at start)
 /C[n]   - set selfcheck level (by default equal to 2, the highest)
		 (/C means /C0; i.e., no selfchecking at all)
 /Dnnn   - create maximum 'nnn' subdirectories (default=10)
		 (recognized only by FLOCK.COM, ignored by GOAT and GOATS)
 /E      - create EXE file (if size > 65280 - done automatically)
 /Fnnn   - create maximum 'nnn' files in a subdirectory (default=500)
		 (recognized only by FLOCK.COM, ignored by GOAT and GOATS)
 /H, /?  - Help screen
 /Inn    - use fill byte 'nn' instead of standard zero-fill
		 (ex., decimal /i100 or hexadecimal notation /iE5h)
 /J      - remove JMP at code start (default - JMP present)
 /Knnnn  - add 'nnnn' bytes of STACK segment to the bottom of EXE file
		 (stack segment is filled with 'STACK' by default)
 /Mnnnn  - place code in the middle of the file exactly at nnnn
		 position ('nnnn' is 32-bit value, but see limitations below)
 /N[nnnn]        - fill goat file with pseudorandom bytes. The parameter
		 (if given) is a random number generator seed.
		 RNG uses multiplicative congruential method with 2**32 period
 /O      - do not make long EXE (>256K) with internal overlay structure
 /P      - fill free file space with pattern 00, 01, .. FE, FF, 00, ..
 /R      - make file ReadOnly (default - normal)
 /S      - make short (32 bytes) EXE header (default - 512 bytes)
 /Tnn    - set timestamp seconds field = nn (<63, even: 0, 1Eh, 62, ..)
 /V      - set SS:SP equal to CS:IP
 /W      - make word pattern (0000, 0001, ...FFFF, 0000)
 /X      - suppress signature defined in the INI file using "Motto="
 /Y      - create device driver (SYS file)
 /Z      - make 'ZM' EXE header instead of 'MZ'
 /9      - fill free file space with NOPs (default - with zeroes)

 GOAT.COM, GOATS.COM and FLOCK.COM programs use the same set of command line
 switches. Most switches are self-explanatory.

 The pattern inside the goat file always reflects the current offset in the
 file (i.e., it is "anchored" to the absolute location in the file). For
 example, at the file offset 1A2Bh you will see bytes "2B", "2C", "2D", ...
 (for byte pattern). Word pattern at the same location will look like this --
 "2B", "1A", "2C", "1A", etc. Pattern filling is useful because any fragment
 of the goat shows which part of the file it came from.

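 The byte pattern is simple enough to reproduce in one line of Python. The
 sketch covers only the byte pattern (the /P switch); the word pattern is
 left out because its exact layout cannot be derived from the example above
 alone.

```python
def byte_pattern(start_offset, length):
    """Bytes a /P-style goat holds at file offsets start_offset onwards."""
    return bytes((start_offset + i) & 0xFF for i in range(length))


fragment = byte_pattern(0x1A2B, 3)
print(fragment.hex())
print(byte_pattern(0xFE, 3).hex())    # the pattern wraps from FF to 00
```

 Because each byte equals the low byte of its own offset, a block of the
 original goat that a virus has moved or saved elsewhere still shows where
 it came from (modulo 256).
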
 Switch /Knnnn adds a stack segment at the bottom of the EXE file. The size
 of the stack segment is limited -- 16 < nnnn < 65536. Obviously, SP always
 points to the bottom of the stack segment (i.e., SP=nnnn). Small and odd
 values in the /K switch should be avoided, because they can hang the
 computer or cause "Exception #13" (a frequent QEMM warning) when SP crosses
 the stack segment boundary (i.e., half of a word is written at SS:0000 and
 the other half at SS:FFFF).

 Switches /Fnnn and /Dnnn are recognized only by FLOCK.COM (GOAT.COM and
 GOATS.COM simply ignore them). You can specify the desired number of files
 and subdirectories to create. By default, 10 subdirectories with 500 files
 in each are created.


	 4.2. Size limitations

 By default the GOAT.COM, GOATS.COM and FLOCK.COM programs produce a
 sacrificial file of COM type. This applies to any given size which meets
 the following criterion:
			 2 < Size_of_COM < 65280

 The magic number 65280 is the maximum size of a COM file, which must fit in
 one segment (64K=65536 bytes) together with the PSP (256 bytes):

			  65536 - 256 = 65280.

 When the code is placed at the bottom of a COM file whose size is close to
 64K, the code may lie too close to SS:SP (SS=CS for COM files; SP=FFFE) and
 the program may hang when run, because the stack is likely to overwrite the
 code. Therefore, if the spacing between IP and SP is less than 64 bytes, goat
 generation is aborted and the output file is not created (you will see a
 warning -- "Goat IP will be too close to SP. Abort!").

 When the size specified in the command line is greater than or equal to
 65280, an EXE file is generated automatically (you do not need to write the
 /E or /S switch explicitly). Such a file will have a normal 512-byte EXE
 header at the beginning. When you need to create an EXE file shorter than
 65280 bytes, use the /E (or /S, /Z or /Knnnn) command line switch.
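
 The size rule can be summarized in a short Python sketch. It is only an
 illustration of the rule as described here; the function name and the
 force_exe parameter are assumptions, and the /Y (SYS) case and the IP/SP
 spacing check are not modelled.

```python
MAX_COM_SIZE = 65536 - 256  # one segment minus the PSP = 65280


def default_goat_type(size, force_exe=False):
    """Return 'COM' or 'EXE' for a requested goat size.

    force_exe stands for any of the /E, /S, /Z or /Knnnn switches.
    """
    if force_exe or size >= MAX_COM_SIZE:
        return "EXE"
    if size > 2:
        return "COM"
    raise ValueError("size must be greater than 2 for a COM goat")


for size in (1000, 65279, 65280):
    print(size, default_goat_type(size))
```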




	 4.3. INI file

 You may like to put your preferences (signature, switches, filename
 templates, etc.) into a separate file -- GOAT.INI (common for GOAT.COM,
 GOATS.COM and FLOCK.COM). Use any text editor to create or modify INI file.
 The sample GOAT.INI file is given below:

 GOAT.INI
 Motto="Antivirus test file."   ;all output bait files will carry this string.
 GOATfiles=FPROT                ;files will be FPROT000.COM, FPROT001.COM, ..
				;(default=GOAT)
 GOATSfiles=ESASS               ;files will be ESASS000.COM, ESASS001.COM, ...
				;(default=GOAT)
 FLOCKfiles=S&S                 ;files will be S&S000.COM, S&S001.COM, ...
				;(default=GOAT)
 FLOCKdirs=HEAP                 ;directories created - HEAP000, HEAP001,
				;HEAP002
				;(default=DIR)
 STACKfill="*MYSTACK"           ;fill stack with '*MYSTACK*MYSTACK*MYSTACK'
				;(default=STACK)
 SYSname="DRIVERXX"             ;this string is inserted into SYS header
				;(default=GOATXXXX)
 Switches=/F200/D50             ;make 50 dirs, 200 files in each. 10000 in
				;total
 Switches=/C1                   ;to turn off registers check and avoid
				;warning "Your PC might be infected..."
 Switches=/iF6h                 ;always fill free file space with 0F6h byte
 Switches=/O                    ;never make overlaid EXE files

 GOAT.INI may be located in the current directory or in the path of started
 program. The first location has priority over the second. GOAT.INI may not
 exist. In that case programs use built-in defaults.

 Filename and subdirectory templates are limited to 5 symbols, because the
 programs always add '000' and then start incrementing this number until it
 becomes '999'. Any string exceeding the limit of 5 symbols will result in
 the following error message:

		 "Error in the INI file line #nnn"
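
 A reader for this format can be sketched in Python as follows. This is an
 assumption-laden illustration, not the parser used by GOAT: the function
 names are invented, key names are matched case-sensitively, a line without
 '=' is treated as an error, and the repeated "Switches=" lines of the sample
 are collected into a list.

```python
TEMPLATE_KEYS = {"GOATfiles", "GOATSfiles", "FLOCKfiles", "FLOCKdirs"}


def strip_comment(line):
    """Cut off a ';' comment, ignoring any ';' inside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            return line[:i]
    return line


def parse_goat_ini(text):
    """Return (settings, switches) parsed from GOAT.INI-style text."""
    settings, switches = {}, []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = strip_comment(raw).strip()
        if not line:
            continue  # blank line or comment-only line
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        # Templates get a 3-digit number appended, so 5 symbols is the limit.
        if not sep or (key in TEMPLATE_KEYS and len(value) > 5):
            raise ValueError(f"Error in the INI file line #{number}")
        if key == "Switches":
            switches.append(value)
        else:
            settings[key] = value
    return settings, switches


sample = 'Motto="Antivirus test file."  ;signature\nGOATfiles=FPROT\nSwitches=/C1\n'
print(parse_goat_ini(sample))
```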

	 4.4. Bait file internals

 The bait files created with GOAT.COM, GOATS.COM and FLOCK.COM (if they have
 the same size) are absolutely identical in their internal structure and
 properties.

 A created sacrificial goat file contains a small program, which displays its
 type (COM, EXE or SYS) and its size in hexadecimal and in decimal (only when
 the goat file is large enough, i.e., the space for the code itself is at
 least 70 bytes). A sacrificial goat file consists of two parts: a small
 portion of code (70 bytes or, if space does not allow, just 2 bytes) and a
 block of zeroes, NOPs or pattern of variable size (00..FF, 0000...FFFE or
 random pattern). Zeroes (or NOPs or pattern) take up all the space in the
 file not occupied by the code. EXE files additionally have an EXE header.
 The unused part of the EXE header is always filled with zeroes. SYS files
 additionally have a device header and strategy and interrupt routines.

 The output of a sample goat file (the size of the sample was 100 bytes) is
 the following:

		 "Goat file (COM). Size=00000064h/0000000100d bytes."

 File type (COM/EXE/SYS) and real numbers are inserted into the goat file
 message at the moment of creation.
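
 The layout of that message (8 hexadecimal digits and 10 decimal digits, both
 zero-padded) can be reproduced with a one-line Python sketch. The function
 name is illustrative; the field widths are read off the sample output above.

```python
def goat_message(kind, size):
    """Build the text a goat of the given type and size would display."""
    return f"Goat file ({kind}). Size={size:08X}h/{size:010d}d bytes."


print(goat_message("COM", 100))
```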


	 4.5. Naming of goats

 Usually GOAT.COM, GOATS.COM and FLOCK.COM programs create output sacrificial
 files in the following order: GOAT000.COM, GOAT001.COM, GOAT002.COM, etc.
 Same applies to EXE files: GOAT000.EXE, GOAT001.EXE, GOAT002.EXE, etc. If
 some file in a row (say GOAT050.COM or GOAT050.EXE) already exists -- the
 next file number is selected automatically (it will be GOAT051.COM or
 GOAT051.EXE). Thus, we cannot generate both GOAT050.COM and GOAT050.EXE in
 the same directory. This rule does not apply to SYS files (e.g., GOAT000.COM
 and GOAT000.SYS are allowed). This naming strategy is used to give some
 freedom to companion viruses.

 Note that definitions given in the INI file may change the default file (and
 subdirectory) naming.
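
 The numbering rule can be sketched in Python as follows. The function name
 is hypothetical, and two details are assumptions: that the lowest free
 number is always chosen, and that a SYS name only has to avoid an existing
 SYS file with the same number.

```python
def next_goat_name(existing, extension, prefix="GOAT"):
    """Pick the next free goat name, keeping COM and EXE numbers distinct."""
    taken = {name.upper() for name in existing}
    for number in range(1000):  # 000 .. 999
        stem = f"{prefix}{number:03d}"
        if extension == "SYS":
            clash = f"{stem}.SYS" in taken
        else:
            # A number used by a COM file is never reused for an EXE file
            # (and vice versa), which leaves room for companion viruses.
            clash = f"{stem}.COM" in taken or f"{stem}.EXE" in taken
        if not clash:
            return f"{stem}.{extension}"
    raise RuntimeError("all 1000 numbers are in use")


print(next_goat_name(["GOAT000.COM"], "EXE"))
```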


	 4.6. Bait device drivers

 There are two formats of DOS device drivers -- the old format (a la COM,
 understood by all DOS versions >=2.0) and the new format (a la EXE,
 introduced in MS-DOS 3.0). Drivers of the old type can only be started from
 CONFIG.SYS using the DEVICE statement. The entry point is defined in a
 special SYS header. Drivers of the new (EXE) type can additionally be started
 as normal executables from the DOS command prompt. Drivers of the EXE type
 have two entry points -- one for invocation from CONFIG.SYS/DEVICE (as
 written in the SYS header, which follows the EXE header) and the other
 defined by the CS:IP fields in the EXE header (this one works only when the
 file is started from the command line). The other advantage of an EXE format
 driver is that it is not limited to 64K, like the old type of driver. Such
 new drivers can exceed 64K, but the pointers to the Strategy and Interrupt
 routines must fit into the first 64K (they are limited to 16 bits).

 To create device driver (SYS) file use switch /Y. Goat drivers of the old
 (COM) style will print message "Goat file (SYS). Size=..." when DOS requests
 an initialization of the driver (during CONFIG.SYS processing). Files in new
 format (SYS&EXE) will do the same, but will print this message also when run
 from the DOS command line as a normal EXE file. In both cases this driver
 file prints the same message. Note, that EXE device drivers bear a "(SYS)"
 designator inside, but are always named as EXE files (to enable start from
 the command line as a normal executable).

 Minimal size of the device driver is around 150 bytes (including SYS header).
 This limit increases for SYS&EXE files (it should include additionally the
 size of the EXE header -- 32 bytes for /S; 512 bytes for /E).


 5. "A standard set" of goat files.

 Let's imagine that we know that we have a sample of a virus (e.g., we got
 the sample from a knowledgeable antivirus researcher), but we have no
 information about the properties of the virus. This situation frequently occurs
 in practice. First, we test it against a set of files of different lengths
 (say, 1000, 2000, ...10000 bytes). Now we see that the virus infected 8
 files (3000, ...10000) and conclude that the virus avoids short victims
 (<3000). The "standard set" of goat files may help you to find out which
 files are preferred by the virus (ex.: virus may infect only COM files
 starting with JMP). Checking "a standard set" after virus attack, you can
 easily understand which files are infectable.

 Now we have another question -- does the virus infect all files longer than
 3000 bytes regardless of their contents? We have to test the virus against a
 set of files of fixed size, but different contents. To simplify this task
 the GOAT package has a generator of "a standard set" of baits of a given size
 -- it is called GOATSET.BAT. Yes, this file is really a DOS batch file,
 issuing a series of calls to GOAT.COM with different parameters.
 GOATSET.BAT makes COM, EXE and SYS files. Files are filled with zeroes or
 NOPs (90h), with initial JMP (0E9h) or without it. Some files carry
 ReadOnly attribute. EXE files are with normal (512 bytes) and short (32
 bytes) EXE headers, with MZ and ZM markers.

 GOATSET.BAT needs only one command line parameter -- size of the files in
 the set. After invocation 52 files of the same size are generated -- 12 COM,
 34 EXE, 2 SYS and 4 SYS&EXE files. GOATSET.BAT also writes a report file
 GOATSET.LOG and places there a full description of the generated bait files
 set.

 Being a BAT file, GOATSET.BAT is fully customizable. It can be easily
 changed with any text editor.


 6. Future threats

	 6.1. Anti-goat viruses

 Fortunately, there are only a few viruses that try to avoid infecting goat
 files. One of them is Sarov.1400. It uses a primitive algorithm to avoid
 victims with many repeated bytes.


 Corresponding code is:

 0100 8B161C00   MOV     DX,[001C]       ;LOAD RELATIVE OFFSET IN FILE
 0104 33C9       XOR     CX,CX
 0106 D1EA       SHR     DX,1
 0108 B80042     MOV     AX,4200         ;LSEEK TO CHECKED FILE AREA
 010B E80F01     CALL    021D            ;NEAR CALL, PRESUMABLY ISSUES INT 21
 010E BAD804     MOV     DX,04D8         ;BUFFER LOCATION
 0111 B43F       MOV     AH,3F           ;READ 100 BYTES FROM FILE
 0113 B96400     MOV     CX,0064         ;SIZE OF BLOCK TO CHECK
 0116 8BFA       MOV     DI,DX           ;DI -> BUFFER
 0118 CD21       INT     21
 011A 268A05     MOV     AL,ES:[DI]      ;GET FIRST BYTE (ES=DS)
 011D 47         INC     DI              ;SKIP TO NEXT BYTE
 011E F3AE       REPZ    SCASB           ;COMPARE WITH THE FIRST
 0120 7455       JZ      DON'T_INFECT    ;ALL BYTES ARE THE SAME!
 INFECT_THE_FILE:        ...
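
 When choosing fill options for a bait, a researcher can apply the same kind
 of test to the goat beforehand. The Python sketch below is a simplified
 model of the idea only (a 100-byte block in which every byte equals the
 first); the function name is made up, and the way the virus derives the
 block position from the word at [001C] is not modelled.

```python
def rejected_by_repeat_check(data, offset, block_size=100):
    """Return True if the examined block consists of one repeated byte."""
    block = data[offset:offset + block_size]
    return len(block) > 0 and all(b == block[0] for b in block)


zero_filled = bytes(2000)                              # default filling
pattern_filled = bytes(i & 0xFF for i in range(2000))  # /P-style filling
print(rejected_by_repeat_check(zero_filled, 500))
print(rejected_by_repeat_check(pattern_filled, 500))
```

 A goat filled with zeroes or NOPs fails such a check, while a pattern-filled
 goat passes it.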

 Without any doubt, more and more anti-goat viruses will appear in the future.
 We can also expect the appearance of more viruses which avoid victims placed
 on a virtual disk, or viruses which do not infect files with certain typical
 lengths (divisible by 10**N and 16**N). Fortunately, most virus writers
 have not yet realized that such features are a very strong weapon -- I would
 say comparable with polymorphism, because in most cases full disassembly
 of the virus will be required and that takes time. Moreover, such
 anti-goat tricks are programmed much more easily than any polymorphic
 engine.


	 6.2. Armoring tricks, virus/trojan conversion

 There are a lot of viruses which try to complicate their investigation.
 Viruses that use anti-tracing techniques include SVC.4644, Ieronim, XPEH
 (a family of viruses), Zherkov (also called Loz), Magnitogorsk, HideNowt,
 OneHalf.3544, OneHalf.3577, Cornucopia, etc. A wonderful set of anti-tracing
 capabilities is found in the Compact Polymorphic Engine (CPE 0.11b), which
 is actually a virus creation tool.

 Some viruses, when they detect that they are being traced, switch to
 "trojan" mode and try to damage files, floppies and/or hard disks. That
 looks like the virus writer's revenge for the antivirus researcher's attempt
 to catch the virus. Many viruses behave this way -- for example, the
 recently found RDA.Fighter.5871/5969/7408 (overwrites random sectors on the
 HDD) [Daniloff], the rather old Maltese Amoeba (destroys 4 sectors on each of
 the first 30 cylinders of all drives), CLME.Ming.1952 (overwrites the first
 34 sectors on all drives), DR&ET.1710 (erases the first 128 sectors on HDDs),
 Gambler.288 (destroys the first 10 sectors on drive C:), Kotlas (removes the
 original non-infected copy of the MBR), SumCMOS.6000 (tries to corrupt the
 HDD).

 The nastiest idea is to use destructive capabilities (a la trojan) if the
 virus senses an antivirus environment -- for example, when the virus detects
 goat files.


=============================================================================



=============================================================================


1    HEURISTIC ANTI-VIRUS TECHNOLOGY

Generally speaking, there are two basic methods to detect viruses - specific 
and generic. Specific virus detection requires the anti-virus program to have 
some pre-defined information about a specific virus (like a scan string).  
The anti-virus program must be frequently updated in order to make it detect 
new viruses as they appear.  Generic detection methods however are based on 
generic characteristics of the virus, so theoretically they are able to 
detect every virus, including the new and unknown ones.

Why is generic detection gaining importance?  There are four reasons: 

1)   The number of viruses increases rapidly.  Studies indicate that the 
     total number of viruses doubles roughly every nine months.  The amount 
     of work for the virus researcher increases, and the chances that 
     someone will be hit by one of these unrecognizable new viruses 
     increases too.

2)   The number of virus mutants increases.  Virus source codes are widely 
     spread and many people can't resist the temptation to experiment with 
     them, creating many slightly modified viruses.  These modified viruses 
     may or may not be recognized by the anti-virus product.  Sometimes 
     they are, but unfortunately often they are not.

3)   The development of polymorphic viruses.  Polymorphic viruses like 
     MTE and TPE are more difficult to detect with virus scanners.  It 
     is often months after a polymorphic virus has been discovered before 
     a reliable detection algorithm has been developed.  In the meantime 
     many users have an increased chance of being infected by that virus.

4)   Viruses directed at a specific organization or company.  It is possible 
     for individuals to utilize viruses as weapons.  By creating a virus that 
     only works on machines owned by a specific organization or company it 
     is very unlikely that the virus will spread outside of the organization.  
     Thus it is very unlikely that any virus scanner will be able to detect 
     the virus before the payload of the virus does its destructive work and 
     reveals itself.

Each of these scenarios demonstrates the fact that virus scanners cannot 
recognize a virus until the virus has been discovered and analyzed by an 
anti-virus vendor. These same scenarios do not hold true for generic 
detectors, and therefore many people are becoming more interested in generic 
anti-virus products. Of the many generic detection methods, heuristic 
scanning is currently becoming the most important. 

2    HEURISTIC SCANNING

One of the most time consuming tasks that a virus researcher faces is the 
examination of files.  People often send files to researchers because they 
believe the files are infected by a new virus.  Sometimes these files are 
indeed infected, sometimes not.  Every researcher is able to determine very 
quickly what is going on by loading the suspected file into a debugger.  A 
few seconds is often enough, and many researchers must have asked themselves: 
"How can I determine this so quickly?"

I once demonstrated this effect to the audience at an international 
conference. I showed the first page of the assembly listing of 
an MTE-infected file, and within about a second, Vesselin Bontchev came up 
with the correct answer.  How is this possible?

2.1  ARTIFICIAL INTELLIGENCE

One of the many differences between viruses and normal programs is that 
normal programs typically start by searching the command line for options, 
clearing the screen, etc. Viruses however never search for command line 
options or clear the screen.  Instead they start by searching for other 
executable files, by writing to the disk, or by decrypting themselves.

A researcher who has loaded the suspected file into a debugger can notice 
this difference in only a glance.  Heuristic scanning is an attempt to put 
this experience and knowledge into a virus scanner.  

The word 'heuristic' means (according to a Dutch dictionary) 'the self 
finding' and 'the knowledge to determine something in a methodic way'.

A heuristic scanner is a type of automatic debugger or disassembler.  The 
instructions are disassembled and their purposes are determined. 
If a program starts with the sequence MOV AH,5 INT 13h, which is a disk format 
instruction for the BIOS, this is highly suspicious, especially if the program 
does not process any command line options or interact with the user.

2.2  SUSPECTED ABILITIES

In reality, heuristics is much more complicated.  The heuristic scanners that 
I am familiar with are able to detect suspicious instruction sequences, like 
the ability to format a disk, the ability to search for other executables, 
the ability to remain resident in memory, the ability to issue non-standard 
or undocumented system calls, etc.  Each of these abilities has a value 
assigned to it. The values assigned to the various suspicious abilities 
depend on various factors. A disk format routine doesn't appear in many 
normal programs, but often in viruses, so it gets a high value. The ability 
to remain resident in memory is found in many normal programs, so despite 
the fact that it also appears in many viruses it doesn't get a high 
value. If the total of the values for one program exceeds a predefined 
threshold, the scanner yells "Virus!".  A single suspected ability is never
enough to trigger the alarm.  It is always the combination of the suspected 
abilities which convinces the scanner that the file is a virus.
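
The scoring idea can be illustrated with a small Python sketch. The ability 
names, the weights and the threshold below are invented for the example; no 
real scanner's values are known or implied. The sketch only shows the two 
rules just described: the weights are summed against a threshold, and a 
single ability never raises the alarm.

```python
# Hypothetical weights: rare-in-normal-programs abilities score high.
WEIGHTS = {
    "format_disk": 8,
    "search_executables": 5,
    "undocumented_call": 4,
    "stay_resident": 2,
}
THRESHOLD = 8  # hypothetical; the total must exceed this value


def is_suspicious(abilities):
    """Return True if the combined weight of the found abilities is too high."""
    found = {a for a in abilities if a in WEIGHTS}
    score = sum(WEIGHTS[a] for a in found)
    # One ability alone is never enough, whatever its weight.
    return len(found) >= 2 and score > THRESHOLD


print(is_suspicious(["format_disk"]))
print(is_suspicious(["format_disk", "stay_resident"]))
print(is_suspicious(["stay_resident", "undocumented_call"]))
```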

2.3  HEURISTIC FLAGS

Some scanners set a flag for each suspected ability which has been found in 
the file being analyzed.  This makes it easier to explain to the user what 
has been found.

TbScan for instance recognizes many suspected instruction sequences. Every 
suspected instruction sequence has a flag assigned to it:

2.4  FLAG DESCRIPTION

F  = Suspicious file access.  Might be able to infect a file. 
R  = Relocator.  Program code will be relocated in a suspicious way. 
A  = Suspicious Memory Allocation.  The program uses a non-standard way to 
     search for, and/or allocate memory.
N  = Wrong name extension.  Extension conflicts with program structure. 
S  = Contains a routine to search for executable (.COM or .EXE) files. 
#  = Found an instruction decryption routine.  This is common for viruses but 
     also for some protected software.
E  = Flexible Entry-point.  The code seems to be designed to be linked on any 
     location within an executable file.  Common for viruses. 
L  = The program traps the loading of software.  Might be a virus that 
     intercepts program load to infect the software. 
D  = Disk write access.  The program writes to disk without using DOS. 
M  = Memory resident code.  This program is designed to stay in memory. 
!  = Invalid opcode (non-8088 instructions) or out-of-range branch. 
T  = Incorrect timestamp.  Some viruses use this to mark infected files. 
J  = Suspicious jump construct.  Entry point via chained or indirect jumps.  
     This is unusual for normal software but common for viruses. 
?  = Inconsistent exe-header.  Might be a virus but can also be a bug. 
G  = Garbage instructions.  Contains code that seems to have no purpose 
     other than encryption or avoiding recognition by virus scanners. 
U  = Undocumented interrupt/DOS call.  The program might be just tricky but 
     can also be a virus using a non-standard way to detect itself. 
Z  = EXE/COM determination.  The program tries to check whether a file is a 
     COM or EXE file.  Viruses need to do this to infect a program. 
O  = Found code that can be used to overwrite/move a program in memory. 
B  = Back to entry point.  Contains code to re-start the program after 
     modifications at the entry-point are made.  Very usual for viruses. 
K  = Unusual stack.  The program has a suspicious stack or an odd stack.

TbScan would for instance output the following flags:

     VIRUS                    HEURISTIC FLAGS

     Jerusalem/PLO            FRLMUZ
     Backfont                 FRALDMUZK
     Minsk_Ghost              FELDTGUZB
     Murphy                   FSLDMTUZO
     Ninja                    FEDMTUZOBK
     Tolbuhin                 ASEDMUOB
     Yankee_Doodle            FN#ELMUZB

The more flags that are triggered by a file, the more likely it is that the 
file is infected by a virus.  Normal programs rarely trigger even one flag, 
while at least two flags are required to trigger the alarm.  To make it 
more complicated, not all flags carry the same 'weight'.
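
To read a flag string such as those in the table, each character is looked 
up in the flag list of section 2.4. A minimal Python sketch of that lookup 
follows; the dictionary holds shortened versions of the descriptions given 
above, and the function name is illustrative.

```python
FLAGS = {
    "F": "Suspicious file access",
    "R": "Relocator",
    "A": "Suspicious memory allocation",
    "N": "Wrong name extension",
    "S": "Search for executable files",
    "#": "Decryption routine",
    "E": "Flexible entry-point",
    "L": "Traps the loading of software",
    "D": "Disk write access without DOS",
    "M": "Memory resident code",
    "!": "Invalid opcode or out-of-range branch",
    "T": "Incorrect timestamp",
    "J": "Suspicious jump construct",
    "?": "Inconsistent exe-header",
    "G": "Garbage instructions",
    "U": "Undocumented interrupt/DOS call",
    "Z": "EXE/COM determination",
    "O": "Overwrite/move program in memory",
    "B": "Back to entry point",
    "K": "Unusual stack",
}


def explain(flag_string):
    """Expand a heuristic flag string into one description per flag."""
    return [f"{flag} = {FLAGS[flag]}" for flag in flag_string]


for line in explain("FRLMUZ"):  # the Jerusalem/PLO row of the table
    print(line)
```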

3    FALSE POSITIVES

Just like all other generic detection techniques, heuristic scanners 
sometimes blame innocent programs for being contaminated by a virus. This 
is called a "False Positive" or "False Alarm".

The reason for this is simple. Some programs happen to have several 
suspected abilities.  For instance, the LOADHI.COM file of QEMM has the 
following suspected abilities (according to an older, now obsolete version 
of TbScan):

 A = Suspicious Memory Allocation.  The program uses a non-standard way to 
     search for, and/or allocate memory.
 M = Memory resident code.  This program may be a TSR but also a virus. 
 U = Undocumented interrupt/DOS call.  The program might be just tricky but 
     can also be a virus using a non-standard way to detect itself. 
 Z = EXE/COM determination.  The program tries to check whether a file is a 
     COM or EXE file.  Viruses need to do this to infect a program. 
 O = Found code that can be used to overwrite/move a program in memory.

All of these abilities are available in LoadHi, and the flags are enough to 
trigger the heuristic alarm.  As LoadHi is supposed to allocate upper memory, 
load resident programs in memory, move them to upper memory, etc., all these 
suspected abilities can easily be explained and verified.  However, the 
scanner is not able to know the intended purpose of the program, and as 
most of these suspected abilities are often found in viruses, it just 
describes the LoadHi program as "a possible virus". 

3.1  HOW SERIOUS IS THE ISSUE OF FALSE ALARMS?

If a heuristic scanner pops up with a message saying: "This program is able 
to format a disk and it stays resident in memory", and the program is a 
resident disk format utility, is this really a false alarm?  Actually, the 
scanner is right. A resident format utility obviously contains code to 
format a disk, and it contains code to stay resident in memory. The heuristic 
scanner is therefore completely right! You could name it a false suspicion,
but not a false positive. The only problem here is that the scanner says that 
it might be a virus. If you think the scanner tells you it has found a virus, 
it turns out to be a false alarm. However, if you take this information as 
is, saying 'ok, the facts you reported are true for this program, I can 
verify this so it is not a virus', I wouldn't count it as a false alarm. The 
scanner just tells the truth. The main problem here is the person who has to
make decisions with the information supplied by the scanner. If it is a 
novice user, it is a problem. More about that later.




3.2  AVOIDING FALSE POSITIVES

Whether we call it a false positive or a false suspicion doesn't matter. We 
do not like the scanner to yell every time we scan. So we need to avoid this 
situation. How do we achieve this?

  1) Definition of (combinations of) suspicious abilities

     The scanner does not issue an alarm unless at least two separate 
     suspected program abilities have been found.

  2) Recognition of common program codes

     Some known compiler codes or run time compression or decryption routines 
     can cause false alarms.  These specific compression or decryption codes 
     can be recognized by the scanner to avoid false alarms.

  3) Recognition of specific programs

     Some programs which normally cause a problem (like the LoadHi program 
     used in the example) can be recognized by the heuristic scanner.

  4) Assumption that the machine is initially not infected

     Some heuristic scanners have a 'learn' mode, i.e. they are able to 
     learn that a file causing a false alarm is not a virus.

3.3  DEALING WITH FALSE POSITIVES

Some false positives are not easily avoided.  So, the user has to deal with a 
certain amount of false alarms, and must make the final decision as to 
whether a file is infected or not.

Ok, you may say, how do we know whether a suspicious program is a virus or 
innocent? Most people believe there is no way to find out. 
Actually there is a way to find out, but this depends on the scanner.


The scanner has to explain to the user the reasons why the program is 
suspect. 'This file might contain a virus' actually doesn't say much to the 
user. It is always right. Every file MIGHT contain a virus, but MAY also be 
clean. We actually use a scanner to find out! What is the user supposed to 
do with this information? However, if the scanner says that some program is 
able to remain resident in memory and able to format a disk, the user can 
more easily figure out what is going on.  If a word processor gives such an 
alarm, it is extremely likely that the program carries a virus, because word 
processors generally are not able to format disks and remain resident in 
memory.  However, if the suspected file is a resident disk formatting 
utility, then all of the suspected abilities can be explained by the intended 
purpose of the program.

     Reason for suspicion: memory resident and disk formatting abilities.

     PROGRAM                  PROBABLY

     Resident disk formatter  No Virus (innocent)
     Word processor           Malicious (virus)

     Both programs cause the same heuristic alarms, but the final conclusion 
     is different. 

Naturally, it requires an advanced user to draw a conclusion for the question 
"infected or not?".  However, my opinion is that judging the results of any 
scanner (also conventional scanners) is a task for an advanced user only.  
If the scanner has a 'learn' mode, i.e. is able to remember which programs 
cause a false alarm, the initial scan should be performed by an advanced 
user, but the subsequent scans (when the possible false positives have been 
eliminated) can be performed by a novice user.  This is already common 
practice in most organizations. 

Anyway, it isn't as bad as it seems, as all other detection methods 
(including signature scanning) are known to cause some false alarms as well.  
Heuristics however has the advantage that it is able to supply you with 
enough information to establish for yourself whether a suspected file is 
likely a virus or not.

4    HOW DOES HEURISTIC SCANNING PERFORM?

Heuristics is a relatively new technique and still under development. It is 
however gaining importance rapidly.  This is not surprising as heuristic 
scanners are able to detect over 90% of the viruses without using any 
predefined information like signatures or checksum values. The number of 
false positives depends on the scanner, but a figure as low as 0.1% can be 
reached easily. TbScan 6.02 used on the large virus collection of Vesselin 
Bontchev showed the following results:

     SCANNING            7210           DETECTION
     METHOD              FILES          PERCENTAGE

     Conventional        7056           97.86%
     Heuristics          6465           89.67%

A false positive test however is more difficult to perform so there are no 
independent results available.


5    COMBINATION OF CONVENTIONAL AND HEURISTIC SCANNING

Some people think heuristic scanning is a replacement for conventional 
scanning.  In my opinion it is not.  Heuristic scanning serves a very useful 
purpose when used in combination with conventional scanning.  The results of 
both scanning methods can be validated by each other, thereby reducing false 
positives and also false negatives. Combined result of analysis:

     HEURISTICS       CONVENTIONAL      PROBABILITY

     clean            clean             very probably clean
     clean            virus             might be a false positive
     virus            clean             might be a false negative
     virus            virus             very probably infected


     fn: 10%          fn: 1%            combined false negatives: 0.1%   
     fp: 0.1%         fp: 0.001%        combined false positives: 0.000001%
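
The combined figures follow from multiplying the per-scanner rates, which 
assumes that the two scanners fail independently of each other (an 
assumption made for this sketch; real scanners may share blind spots). In 
Python, with an illustrative function name:

```python
def combined_rate(rate_a, rate_b):
    """Probability that two independent scanners both make the same error."""
    return rate_a * rate_b


# Rates as fractions: heuristic 10% and conventional 1% false negatives,
# heuristic 0.1% and conventional 0.001% false positives.
fn = combined_rate(0.10, 0.01)
fp = combined_rate(0.001, 0.00001)
print(f"combined false negatives: {fn:.1%}")
print(f"combined false positives: {fp * 100:.6f}%")
```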


The chance of both the heuristic scanner and the conventional scanner 
failing is minimal.  If both scanning methods have the same results, the 
result is almost certain.  In the few cases that the results don't agree 
with each other additional analysis is required. TbScan 6.02 used on the 
large virus collection of Vesselin Bontchev showed the following results:

     SCANNING         7210              DETECTION
     METHOD           FILES             PERCENTAGE

     Conventional     7056              97.86%
     Heuristics       6465              89.67%
     Combined         7194              99.78%

6    WHAT CAN BE EXPECTED FROM IT IN THE FUTURE?

->   THE DEVELOPMENT CONTINUES

Most anti-virus developers still do not supply a ready-to-use heuristic 
analyzer.  Those who have heuristics already available are still improving 
it.  It is however unlikely that the detection rate will ever reach 100% 
without a certain amount of false positives.  On the other hand it is 
unlikely that the amount of false positives will ever reach 0%. Maybe you 
wonder why it isn't possible to achieve 100% correct results. There is 
a large grey area between viruses and non-viruses. Even for humans it is hard 
to describe what is a virus and what is not. An often used definition of a 
computer virus is this: "A virus is a program that is able to copy itself". 
According to this definition the DiskCopy.Com program is a virus...

->   REACTION OF VIRUS WRITERS

An important issue is the effect on virus writers. It is likely that they 
will try to avoid detection by heuristic scanners.  Until now the goal was 
to avoid detection by signature scanners, and this was very easy to do, as it 
was sufficient to modify only a small part of an existing virus.  Teenagers 
with some basic understanding of programming could do so easily. Avoiding 
heuristic scanners however requires a lot more knowledge, if it is possible 
at all. 

Fortunately, this detection-avoiding method of programming makes detection by
conventional anti-virus products easier because it means that the programmer 
cannot use very tight and straight code. The virus writer will be forced to 
write more complex viruses.

7    THE PROS AND CONS OF HEURISTIC SCANNING

     ADVANTAGES       Can detect 'future' viruses
                      User is less dependent on product updates

     DISADVANTAGES    False positives are possible
                      Judgement of the result requires some basic knowledge

8    HEURISTIC CLEANING

Before we can discuss heuristic cleaning, it is important to know how a virus
infects a program. The basic principle is not difficult.  A virus - a program
by itself - adds itself to the end of the program.  The size of the program 
increases due to this addition of the viral code.  Appending a virus program 
to another program is however not enough, the virus code should also be 
executed.  To make this happen, the virus overwrites the first bytes of the 
file with a 'jump' instruction, which makes the processor jump to the viral
code.  The virus now gains control when the program is invoked, and it will 
finally pass control to the original program. Since the first bytes of the 
file are overwritten by the jump instruction, the virus has to 'repair' 
these bytes first.  After that the virus just jumps to the beginning of the 
original program, and most often this program works as usual. 

ORIGINAL PROGRAM                        INFECTED PROGRAM

100:    original code                   100:    jump to 2487
                                                original code

                                        2487:   Virus!
                                                repair first bytes
                                                jmp 100

To clean an infected program, it is of vital importance to restore the bytes 
that were overwritten by the jump to the virus code.  The virus has to restore 
these bytes too, so somewhere in the virus code these original bytes are 
stored.  The cleaner searches for those bytes, puts them back in their 
original location, and truncates the file to the original size.


8.1  HOW DOES A CONVENTIONAL CLEANER WORK?

A conventional cleaner has to know which virus to remove.  Suppose your 
system is infected with the Jerusalem/PLO virus.  You invoke your cleaner and 
it proceeds like this: 

"Hey, this file is infected with the Jerusalem/PLO virus.  OK, this virus is 
1873 bytes in size, and it overwrites the first three bytes of the original 
program with a jump to itself. The original bytes are located at offset 483 
in the viral code.  So, I have to take those bytes, copy them to the 
beginning of the file, and I have to remove 1873 bytes of the file. 
That's it!"
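
The procedure in this quote can be sketched in a few lines of Python.  This
is only an illustration of the bookkeeping for a virus that appends itself
to its host: the function name, the parameter names and the toy data are
assumptions made for the example, while the numbers 1873, 483 and 3 come
from the text above.

```python
def clean_known_virus(infected: bytes, virus_size: int,
                      saved_offset: int, saved_count: int) -> bytes:
    """Undo an appending infection whose layout is known in advance."""
    if len(infected) < virus_size:
        raise ValueError("file is smaller than the virus itself")
    host_size = len(infected) - virus_size      # original program length
    virus = infected[host_size:]                # the appended viral code
    # The virus keeps the overwritten host bytes somewhere in its body.
    saved = virus[saved_offset:saved_offset + saved_count]
    # Put them back at the start of the file and cut the virus off.
    return saved + infected[saved_count:host_size]


# Toy data: a 20-byte "host" and a dummy 1873-byte blob standing in for
# the virus (it holds no code, only the three saved host bytes).
host = bytes(range(20))
blob = bytearray(1873)
blob[483:486] = host[:3]
infected = b"\xe9\x00\x00" + host[3:] + bytes(blob)

cleaned = clean_known_virus(infected, 1873, 483, 3)
```

Everything depends on the three numbers being right for the exact variant
at hand, which is the weakness discussed in section 8.2.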

8.2  SHORTCOMINGS OF CONVENTIONAL CLEANERS

The cleaner has to know the virus it has to remove.  It is impossible to 
remove an unknown virus.

The virus must be exactly the same as the virus known to the cleaner. Imagine 
what would happen if the virus used in the example was modified and is now 
1869 bytes in size instead of 1873... the cleaner would remove too much!  
This is no exception; it happens quite often since there are so many mutants.  
For instance, the Jerusalem/PLO family now contains more than 100 mutants!

Many polymorphic viruses have variable lengths and keep the original 
instructions encrypted.  Most conventional cleaners are therefore unable 
to clean MTE-infected programs.

8.3  THE VIRUS WILL REMOVE ITSELF BEFORE ACTUAL EXECUTION

We have seen above how a virus works. The interesting part is that when the 
virus passes control to the original program it restores the original bytes 
at the beginning of the program and jumps back to start the program. Every 
virus is able to repair the original program in order to keep it functional 
(except for overwriting viruses, but these can't be cleaned anyway).

8.4  LET THE VIRUS DO THE DIRTY WORK

The idea is: why not let the virus do the dirty work?  The basic principle of 
heuristic cleaning is simple.  The heuristic cleaner loads the infected file 
and starts emulating the program code.  It uses a combination of disassembly, 
emulation and sometimes execution to trace the flow of the virus, and to 
emulate what the virus is normally doing.  When the virus restores the 
original instructions and jumps back to the original program code, the 
cleaner stops the emulation process, and says 'thank you' to the virus for 
its cooperation in restoring the original bytes.  The now repaired start of 
the program is copied back to the program file on disk, and the part of the 
program that gained 'execution' will be chopped off.  An additional analysis 
of the cleaned program file will be performed to be on the safe side.

Note that the cleaner is actually removing the unknown from the unknown.  No 
predefined information about the virus or infected file is necessary.

The process of emulation is just like hitchhiking. The emulator convinces 
the viral code that it is actually executing, and it hitchhikes to the point 
where the virus passes control to the original program.

However, the actual process is very complicated.  As with hitchhiking, many 
things can go wrong:

->   Driver takes you to the wrong place
     The virus does not intend to execute the original program, but it 
     starts doing completely different things.  As the purpose of the 
     emulation is to restore the original program, we never reach our goal.

->   Driver won't let you out
     If the viral code performs an endless loop, the original program will 
     never be restored so the cleaner might wait forever.

->   Driver leaves the car
     A potentially dangerous situation is that the cleaner is too ambitious 
     in its task to emulate everything, and that the virus gets control 
     inside the emulated environment and finally escapes from it.

->   Driver hits a tree and kills you too
     Many viruses are badly programmed.  If they crash inside the emulator, 
     chances are that the emulator crashes too.
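
The hitchhiking idea, including the case of the driver who won't let you
out, can be illustrated with a toy model in Python.  This is a sketch under
heavy assumptions and not the author's implementation: a real cleaner
emulates x86 instructions, whereas here the "virus" is a table of three
made-up operations ("poke", "jmp", "nop"), and all names are hypothetical.

```python
from typing import Optional

ENTRY = 0x100  # COM programs are loaded at offset 100h


def trace_until_return(code: dict, start: int, memory: bytearray,
                       max_steps: int = 10_000) -> bool:
    """Interpret toy operations until control jumps back to ENTRY.

    code maps an address to ("poke", address, value), ("jmp", address)
    or ("nop",).  Returns True once the host entry point is reached and
    False if the step budget runs out or unknown code is hit.
    """
    pc = start
    for _ in range(max_steps):
        op = code.get(pc)
        if op is None:
            return False                     # nothing we can follow
        if op[0] == "poke":                  # virus repairs a host byte
            memory[op[1] - ENTRY] = op[2]
            pc += 1
        elif op[0] == "jmp":
            if op[1] == ENTRY:
                return True                  # control goes to the host
            pc = op[1]
        elif op[0] == "nop":
            pc += 1
        else:
            return False
    return False                             # driver won't let you out


def heuristic_clean(image: bytes, code: dict,
                    virus_start: int) -> Optional[bytes]:
    """Let the toy virus repair the host, then chop the virus off."""
    memory = bytearray(image)
    if not trace_until_return(code, virus_start, memory):
        return None
    return bytes(memory[:virus_start - ENTRY])


# Toy data: an 11-byte host whose first three bytes were overwritten.
host = b"HOSTPROGRAM"
image = b"\xe9\x08\x00" + host[3:] + bytes(4)
virus_start = ENTRY + len(host)
code = {
    virus_start + 0: ("poke", ENTRY + 0, host[0]),
    virus_start + 1: ("poke", ENTRY + 1, host[1]),
    virus_start + 2: ("poke", ENTRY + 2, host[2]),
    virus_start + 3: ("jmp", ENTRY),
}
restored = heuristic_clean(image, code, virus_start)
stuck = heuristic_clean(image, {virus_start: ("jmp", virus_start)},
                        virus_start)
```

The step budget is the simplest possible answer to an endless loop: the
cleaner gives up instead of waiting forever.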

Heuristic cleaners are so complicated that there is only one available right 
now.  However, the great potential of heuristic cleaning makes it likely that 
there will be more heuristic cleaners soon.

8.5  THE PROS AND CONS (OF HITCHHIKING)

     ADVANTAGES       No need to recognize mutants
                      No problems with polymorphic viruses
                      Can clean 'future' viruses
                      User is less dependent on product updates

     DISADVANTAGES    No exact copy of the original
                      It cleans everything: even clean files!

Being the author of the first heuristic cleaner I have received many 
reactions to it.  Most people were surprised that my cleaner was able to 
remove MTE viruses before my scanner was even able to recognize them.  This 
is especially interesting as most anti-virus products are still not able to 
remove MTE infections. 

Of course everybody wants to know how many viruses can be removed this way.  
I can't show a reliable figure, as testing a cleaner is an extremely tedious 
and time-consuming task.  However, a figure of 80% is a rough estimate.  
Many conventional cleaners do not even come close to this percentage.

8.6  WHAT CAN BE EXPECTED FROM IT IN THE FUTURE?

Heuristic cleaning needs additional improvements.  Some viruses use 
anti-debugger features that also make an emulator fail.  It is also still 
possible that a virus detects that it is being emulated, and it can simply 
refuse to cooperate.  The better the emulator performs, the less likely this 
is.  Major improvements however are more likely to show up after multiple 
heuristic cleaners are available and some competition occurs. 


This text is copyright Dr. Frans Veldman, ESaSS B.V. 1994-1995 and is
reproduced with permission.

[* Yeah.. look who's caring :P - sep *]  

Requests for reproduction in part, or whole may be addressed on the 
Internet to: veldman@esass.iaf.nl, or by telephone in 
North America to: 1-800-667-8228 x331, Europe: +31 8894 22282.


=============================================================================



=============================================================================


This is the text of the lecture presented in Boston (USA) 
on 21st September 1995 at the 5th international conference VB-95
      (Virus Bulletin - 95), 20-22 September 1995
------------------------------------------------------------


           MODERN METHODS OF DETECTING AND ERADICATING
                    KNOWN AND UNKNOWN VIRUSES

                       Dr. Dmitry Mostovoy

DialogueScience, Inc.
Computing Center of the Russian Academy of Sciences,
40 Vavilova Street, Moscow, 117967, Russia
E-mail: dmost@dials.msk.su


                            Abstract

    Viruses are  growing in  number from  day to  day, so it  is
    obvious that soon anti-virus programs like NAV or MSAV  will
    not be quite efficacious.   Therefore, we started  designing
    a program  that would  annihilate not  individual infectors,
    but viruses  in general,  regardless of  whether a  virus is
    known or not, or whether it is old or new.

    The first outcome  of our efforts  in this direction,  ADinf
    (Advanced  Diskinfoscope),  is  a  forecasting  center which
    alerts the user in advance with great reliability about  the
    intrusion of viruses, even  HITHERTO unknown infectors.   As
    distinct  from  all  other  data  integrity  checkers, ADinf
    inspects  a  disk  by  scanning  the  sectors one by one via
    direct  addressing  of  BIOS  without  the assistance of the
    operating system, and it checks all vital parts of the hard
    disk.  To evade such detection tactics is almost
    impossible.

    ADinf  alerts  the  user  in  time about virus intrusion and
    restores infected boot sectors.  But how can infected files
    be restored automatically?  Our next step was to produce
    a  curing  companion  to  ADinf.  The  new  tool, ADinf Cure
    Module, deploys a novel strategy.  Paradoxically,
    ninety-seven percent of the viruses in our collection fall
    into a few standard groups according to their infection
    methods.  New viruses are as a rule designed on one of these
    common infection principles, and therefore ADinf Cure Module
    will be about 97% efficient in its performance in the future
    as well.

    ADinf and  ADinf Cure  Module are  parts of  DialogueScience
    anti-virus kit - the most popular anti-virus in Russia.



                       INTEGRITY CHECKING

  The basic classes of anti-virus programs are well known.  They are
scanners/removers, monitors, and vaccines.  I would like to discuss the
development of programs to which, in my opinion, anti-virus designers
pay undeservedly little attention.  This class of anti-virus programs
is known as ``integrity checkers'', though the name does not fully
characterize the programs' policy, which we describe below.  This is the
only class of purely software means of anti-virus protection which
permits the detection of known and unknown viruses with reliability
approaching 100% and the eradication of up to 97% of file infectors,
even new, hitherto unknown viruses.

  The operation of integrity checkers is based on a simple fact: even
though it is impossible to know everything about a potentially
infinite number of viruses, it is quite possible to store a finite
volume of information about each logical drive on the disk and to
detect virus infection from the changes that have taken place in files
and system areas of the disk.  As already mentioned, the name
``integrity checker'' does not fully reflect the essence of these
programs.  Infection techniques are not restricted to a simple
modification of the program code.  Other paths for infection either
already exist or are possible; for example, companion viruses [1].  A
disk can be corrupted by restructuring the directory tree, say, by
renaming directories and creating new directories, and by other such
manipulations.  Consequently, to provide reliable protection, integrity
checkers must take care of far more parameters than the mere changes in
the size and CRC of files, as is done by most programs of this class.
Thus, the master boot record (MBR) and boot sectors of logical drives,
the list of bad clusters, the directory tree structure, free memory
size, the CRC of the Int 13h handler in BIOS and even the Hard Disk
Parameter Tables must all be under the control of integrity checkers.
Changes in the size and CRC of files, creation of new files and
directories and removal of old files and directories are obviously
objects for strict control.  The designer of an integrity checker must
be one step ahead of virus designers and block every possible loophole
for parasite intrusion.

  Despite the large amount of controlled information, an integrity
checker must nonetheless be user-friendly, simple to use, and quick
in checking disks.  It must at the same time be user-customizable as
regards the level of messages displayed about changes that have occurred
on the disk, and be capable of conducting a preliminary analysis of the
changes, particularly suspicious modifications such as

  - changes in size and CRC of files without any change in datestamp,

  - illegal values of hours, minutes or seconds in the datestamp of
    infected files (for example, 62 seconds),

  - year greater than the current year (certain viruses mark infected
    files by increasing the year of creation by 100 years, which cannot be
    detected visually because the ``dir'' command only displays the last
    two figures of the year),

  - any changes in files specified in the ``stable'' list,

  - change in master boot record or boot sector,

  - appearance of new bad clusters on the disk and others.
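
  The first three checks in this list lend themselves to a short sketch
in Python.  It is an illustration only, not ADinf's code: the record
layout, the function name and the sample values are assumptions, and the
checks on the ``stable'' list, boot sectors and bad clusters are left out.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    size: int
    crc: int
    stamp: tuple  # (year, month, day, hour, minute, second)


def suspicious_changes(old: FileRecord, new: FileRecord,
                       current_year: int) -> list:
    """Return the reasons why the change from old to new looks viral."""
    reasons = []
    if (old.size, old.crc) != (new.size, new.crc) and old.stamp == new.stamp:
        reasons.append("size/CRC changed, datestamp unchanged")
    year, _month, _day, hour, minute, second = new.stamp
    # DOS stores seconds in two-second units, so a directory entry can
    # encode values such as 60 or 62 that no clock ever shows.
    if hour > 23 or minute > 59 or second > 59:
        reasons.append("illegal time in datestamp")
    if year > current_year:
        reasons.append("year later than current year")
    return reasons


before = FileRecord(1000, 0x1A2B, (1995, 3, 1, 12, 0, 0))
patched = FileRecord(2701, 0x77C4, (1995, 3, 1, 12, 0, 0))
marked = FileRecord(1000, 0x1A2B, (2095, 3, 1, 12, 0, 62))

print(suspicious_changes(before, patched, 1995))
print(suspicious_changes(before, marked, 1995))
```

  Returning a list of reasons rather than a yes/no answer fits the
requirement that the level of messages be user-customizable.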

  Let us now discuss the main problems faced by the designers of
``integrity checkers''.  The first is the dodging ability of viruses
based on stealth mechanisms.  Integrity checkers that rely on operating
system tools in their scanning mission are absolutely helpless against
this class of viruses.  These viruses have stimulated the development of
integrity checkers that check disks by reading the sectors via direct
addressing through BIOS.  Stealth viruses cannot then hide the changes in
an infected file's size; on the contrary, under such a scanning technique
the stealth mechanism betrays the presence of known and hitherto
unknown stealth viruses through the discrepancy between the information
given out by DOS and the information obtained by reading via BIOS.  Such
algorithms have been created and successfully detect the appearance of
stealth viruses.
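
  The discrepancy test itself is simple once both views of the disk are
available.  The following Python sketch assumes that the two views have
already been collected as tables mapping file names to sizes (one through
DOS calls, one by parsing directory sectors read through BIOS); collecting
them is the hard part and is not shown.  The function name and the sample
data are hypothetical.

```python
def stealth_suspects(dos_sizes: dict, bios_sizes: dict) -> list:
    """Names whose size differs between the DOS view and the raw view."""
    return sorted(name for name, size in bios_sizes.items()
                  if name in dos_sizes and dos_sizes[name] != size)


# Toy data: with a stealth virus active, DOS reports the old size of
# A.COM while the directory entry read via BIOS shows the real one.
dos_sizes = {"A.COM": 1000, "B.EXE": 5000}
bios_sizes = {"A.COM": 2701, "B.EXE": 5000}
print(stealth_suspects(dos_sizes, bios_sizes))
```

  No knowledge of any particular virus is involved: any file for which
DOS and BIOS disagree is reported.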

  Scanning a disk by reading the sectors by direct addressing of BIOS
has one more important merit which is often overlooked.  If a computer
is infected by a so-called ``fast infector'' [1], i.e., a virus that
infects files not only when they are started, but also when they are
opened, such an integrity checker will not spread the infection to all
files on the disk, because it does not call the operating system at
all: it reads the disk sector by sector and uses its own independent
file access routines, so the virus never gets control.

  Finally, an integrity checker utilizing direct reading of sectors is
twice as fast in checking a disk as any program that relies on
the operating system tools, because a disk scan algorithm can be
created that reads each sector only once and optimizes the head
movements.

  Disk handling via BIOS has its own hurdles.  The foremost problem is
compatibility with a vast range of diverse hardware and
software, including disk compressors (Stacker, DoubleSpace), specialized
drivers for accessing large disks (Disk Manager), SCSI disk drivers,
etc.  Furthermore, there are many MS-DOS compatible operating systems
that have subtle but quite important peculiarities in partitioning
logical drives.  Integrity checkers must pay due attention to these
fine points.


                    VIRUS REMOVAL TECHNIQUES

    Modern integrity checkers are useful not only in detecting
infection, but are also capable of removing viruses immediately with
the help of the information they retrieve from an uninfected machine at
the time of installation.  An integrity checker can kill known viruses
as well as viruses which were unknown at the time the integrity checker
was created.

  How is this done?  The methods for removing viruses from
the master boot record and boot sectors are obvious.  The integrity
checker stores images of uninfected boot sectors in its tables and in
case of damage can instantly restore them.  The only restriction is that
the restoration must also be effected via direct addressing of BIOS, and
after restoration the system must be rebooted immediately in order to
prevent the active virus from reinfecting the disk while accessing it
via INT 13h.

  Removal of file viruses is based on a surprising fact, namely,
despite the vast number of diverse viruses, there are only a few
techniques by which a virus is injected into a file.  Here we only
briefly outline the file restoration strategy.  Figure 1 shows a
schematic diagram of a typical EXE file.

  For each file the integrity checker keeps the header (area 1), the
relocation table (area 2) and the code at the entry point (area 4).
Strings (area 3 and area 5) are vital because they are the keys to
identifying the mutual locations of various areas in an infected file
when a virus writes its tail, not at the file end, but at the file
beginning or in the file body (after the relocation table or at the
entry point).  In an infected file, after determining the areas that
coincide with the imaged areas in the table, the displacement of a block
(for example, the block for area 3 begins at the end of area 2 and ends
at the beginning of area 4) can be identified from the position of
string 3, and the block can thus be moved back to its original location.

               ┌───────────────────────────────┐═╗
               │          EXE-header           │ ║ 1
               ├───────────────────────────────┤═╣
               │                               │ ║
               │       Relocation table        │ ║ 2
               ├───────────────────────────────┤═╝
               │                               │
               │            Code               │═╗
               │                               │ ║ 3
               │                               │═╝
               │                               │
               │         Entry point   ───────>│═╗
               │                               │ ║ 4
               │                               │═╝
               │                               │═╗
               │                               │ ║ 5
               │                               │═╝
               ├───────────────────────────────┤═╗
               │     Debug information or      │ ║ 6
               │     overlays                  │═╝
               └───────────────────────────────┘

                           Fig.1


  The image of area 6 takes about 3-4 Kb and is essential in recovering a
file corrupted by viruses which damage the debug information and
overlays in the course of defective infection.

  Thus, a file is recovered by writing the image of its structure stored
in the integrity checker's tables over the infected file, reinstating
its original state.  Consequently, knowledge of which virus infected
the file is not required.
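
  The role of the key strings can be shown with a much simplified model in
Python.  This sketch is not the ADinf algorithm: it treats a file as a flat
run of bytes instead of a real EXE structure, keeps only one key string,
and uses hypothetical names throughout.  It shows how the position of a
stored string reveals how far the original contents were displaced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredImage:
    length: int      # size of the clean file
    head: bytes      # first bytes of the clean file
    key_offset: int  # where the key string sat in the clean file
    key: bytes       # a string copied from the file body


def snapshot(clean: bytes, head_len: int = 8,
             key_offset: int = 16, key_len: int = 8) -> StoredImage:
    """Record what the checker keeps while the file is still clean."""
    return StoredImage(len(clean), clean[:head_len], key_offset,
                       clean[key_offset:key_offset + key_len])


def recover(infected: bytes, img: StoredImage) -> Optional[bytes]:
    """Rebuild the clean file from an infected one and its image."""
    pos = infected.find(img.key)          # first match only (toy model)
    if pos < 0:
        return None                       # key string destroyed: give up
    shift = pos - img.key_offset          # displacement of the body
    if shift < 0 or shift + img.length > len(infected):
        return None
    body = bytearray(infected[shift:shift + img.length])
    body[:len(img.head)] = img.head       # undo any patch at the start
    return bytes(body)


clean = bytes(range(64))
img = snapshot(clean)
appended = b"\xe9\x3d\x00" + clean[3:] + b"V" * 100   # tail at file end
prepended = b"V" * 100 + clean                        # tail at file start

print(recover(appended, img) == clean, recover(prepended, img) == clean)
```

  The same two stored pieces handle both layouts, and at no point does the
code need to know which virus produced the damage.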

  Tables containing the information necessary for recovering files take
about 200-450 Kb for one logical drive.  The table size can be cut down
to 90 Kb if the user decides not to save the relocation information;
in most cases this has no perceptible influence on the quality of
recovery.


                           CONCLUSION

  Integrity checkers undoubtedly do not provide a panacea against
computer viruses.  Unfortunately, there is no such panacea, nor can
there be one.  But they are quite reliable protection utilities which
must be used jointly with other classes of anti-virus tools.  The
highlights of integrity checkers described above are all implemented in
the ADinf program, the most popular integrity checker in Russia.  It is
also known in Germany, where it is distributed on CD-ROM as a component
of the DialogueScience Anti-Virus Kit.  It checks a disk by reading its
sectors one by one, directly addressing BIOS, and easily traps active
stealth viruses by comparing the information obtained through BIOS and
DOS.  It instantly restores up to 97% of files corrupted by known and
unknown viruses.


                           REFERENCES

  1. Vesselin  Bontchev,  Possible  Virus Attacks  Against Integrity
                          Programs And How  To Prevent Them,  Proc.
                          2nd Int. Virus Bulletin Conf.,  September
                          1992, pp. 131-141.

  2. Mostovoy D. Yu.,    A  Method of Detecting and Eradicating
                         Known   and    Unknown    Viruses,    IFIP
                         Transactions,  A-43,  Security&Control  of
                         Information   Technology    in    Society,
                         February, 1994, pp.  109-111.


=============================================================================



=============================================================================


                     ╔═══════════════════════════════╗
                     ║ A brief history of PC viruses ║
                     ║      by Dr Alan Solomon       ║
                     ╚═══════════════════════════════╝

1986-1987 - the prologue

It all started in 1986. Basit and Amjad realised that the boot
sector of a floppy diskette contained executable code, and this
code is run whenever you start up the computer with a diskette in
drive A. They realised that they could replace this code with
their own program, that this could be a memory resident program,
and that it could install a copy of itself on each floppy
diskette that is accessed in any drive. The program copied itself
- they called it a virus. But it only infected 360 kb floppy
disks.

In 1987, the University of Delaware realised that they had this
virus, when they started seeing the label " (c) Brain" on floppy
diskettes. That's all it did - copy itself, and put a volume
label on diskettes.

Meanwhile, also in 1986, a programmer called Ralf Burger realised
that a file could be made to copy itself, by attaching a copy of
itself to other files. He wrote a demonstration of this effect,
which he called VIRDEM. He distributed it at the Chaos Computer
Club conference that December, where the theme was viruses.
VIRDEM would infect any COM file; again the payload was pretty
harmless.

This attracted so much interest, that he was asked to write a
book. Ralf hadn't thought of boot sector viruses like Brain, so
his book doesn't even mention them. But by then, someone had
started spreading a virus, in Vienna.

In 1987, Franz Swoboda became aware that a virus was being spread
in a program called Charlie. He called it the Charlie virus. He
made lots of noise about the virus (and got badly bitten as a
result). At this point, there are two versions of the story -
Burger claims that he got a copy of this virus from Swoboda, but
Swoboda denies this. In any case, Burger obtained a copy, and
gave it to Bernd Fix, who disassembled it (this was the first
time anyone had disassembled a virus). Burger included the
disassembly in his book, after patching out a couple of areas to
make it less infectious and changing the payload. The normal
payload of Vienna is to cause one file in eight to reboot the
computer (the virus patches the first five bytes of the code);
Burger (or maybe Fix) replaced this reboot code with five spaces.
The effect was that patched files hung the computer, instead of
rebooting. This isn't really an improvement.

Meanwhile, in the US, Fred Cohen had completed his doctoral
dissertation, which was on computer viruses. Dr Cohen proved that
you cannot write a program that can, with 100% certainty, look at
a file and decide whether it is a virus. Of course, no one ever
thought that you could, but Cohen made good use of an existing
mathematical theorem and earned a doctorate. He also did some
experiments; he released a virus on a system, and discovered that
it travelled further and faster than anyone had expected.

In 1987, Cohen was at Lehigh, as was Ken van Wyk. So was the
author of the Lehigh virus. Lehigh was an extremely unsuccessful
virus - it never managed to spread outside its home university,
because it could only infect COMMAND.COM and did a lot of damage
to its host after only four replications. One of the rules of the
virus is that a virus that quickly damages its host cannot
survive. However, the Lehigh virus got a lot of publicity, and
led to van Wyk setting up the Virus-L newsgroup on Usenet. Lehigh
was nasty. After four replications, it did an overwrite on the
disk, hitting most of the File Allocation Table. But a virus that
only infects COMMAND.COM, isn't very infectious.

Meanwhile, in Tel Aviv, Israel (some say in Italy), another
programmer was experimenting. His first virus was called Suriv-01
(virus spelled backwards). It was a memory resident virus, but it
could infect any COM file, whereas Lehigh could only infect
COMMAND.COM. This is a much better infection strategy than the
non-TSR strategy used by Vienna, as it leads to files on all
drives and all directories being infected. His second virus was
called Suriv-02, and that could infect only EXE files, but it was
the first EXE infector in the world. His third attempt was called
Suriv-03, and it could handle COM and EXE files. His fourth effort
escaped into the world, and became known as Jerusalem virus. Every
Friday 13th, instead of infecting files that are run, it deletes
them. But Friday 13ths are not common, so the virus is pretty
inconspicuous, most of the time. It avoids infecting COMMAND.COM,
because in those days, many people believed that this was the
file to watch (see Lehigh).

It looks as if it escaped rather than was released, because it
plainly was not ready for release. The author decided to change
the way that the virus detected itself in EXE files, and had made
part of that change. There is redundant code from the Suriv
viruses still in place, and also what looks like debugging code.
It was found in the Hebrew University of Jerusalem (hence the
name) by Yisrael Radai.

While all this was going on, a young student at the University of
Wellington, New Zealand, had found a very simple way to create a
very effective virus. One time in eight, when booting from an
infected floppy, it also displayed the message 'Your PC is now
Stoned', hence the name of the virus.

The virus itself was just a few hundred bytes long, but because
of its self-restraint, and memory-resident replication, it has
become the most widespread virus in the world, accounting for
over a quarter of outbreaks. It is very unlikely that Stoned
virus will ever become rare. The virus spread rapidly, because of
its inconspicuousness (and because in those days, people were
keeping a careful eye on COMMAND.COM, because of Lehigh).

In Italy, at the University of Turin, a programmer was writing
another boot sector virus. This one put a bouncing ball up on the
screen, if the disk was accessed exactly on the half hour. It
became known as Italian virus, Ping pong, or Bouncing Ball. But
this virus had a major defect - it couldn't work on anything
except an 8088 or 8086 computer, because it uses an instruction
that doesn't work on more advanced chips. As a result, this virus
has almost died out (as has Brain, which can only infect 360 kb
floppies, and which foolishly announces its presence via the
volume label).

Back in the US, an American was demonstrating a problem that has
continued to dog US virus writers ever since - incompetence. The
Lehigh didn't make it outside a small circle - neither did the
Yale virus. This was another boot sector virus, but it only
copied itself when you booted from an infected floppy, then put
another floppy in to continue the boot process.  No subsequent
diskette was infected, and if the boot-up continued from a hard
disk, there was no infection at all. Yale never spread at all
widely, either.

But also in 1987, a German programmer was writing a very
competent virus, the Cascade, so called after the falling letters
display that it gave. Cascade used a new idea - most of the virus
was encrypted, leaving only a small stub of code in clear for
decrypting the rest of the virus. The reason for this was not
clear, but it certainly made it more difficult to repair infected
files, and it restricted the choice of search string to the first
couple of dozen bytes. This idea was later extended by Mark
Washburn when he wrote the first polymorphic virus, 1260
(Chameleon). Washburn based Chameleon on a virus that he found in
a book - the Vienna, published by Burger.

Cascade was supposed to look at the Bios, and if it found an IBM
copyright, it would refrain from infecting. This part of the code
didn't work. The author soon released another version of the
virus, 1704 bytes long instead of 1701, in order to correct this
bug. But the corrected version had a bug that meant that it still
didn't detect IBM Bioses.

Of these early viruses, only Stoned, Cascade and Jerusalem are
common today, but those three are very common.

1988 - the game begins

1988 was fairly quiet, as far as virus writing went. Mostly, it
was the year that anti-virus vendors started appearing, making
a fuss about what was at that time only a potential problem,
and not selling very much anti-virus software. The vendors were
all small companies, selling their software for very low prices
(£5 or $10 was common). Some of them were shareware, some were
freeware.  Occasionally some larger company tried to pop up, but
no-one was paying serious cash to solve a potential problem.

In some ways, that was a pity, because 1988 was a very virus-
friendly year. It gave Stoned, Cascade and Jerusalem a chance to
spread undetected, and to establish a pool of infected objects
that will ensure that they never become rare.

It was in 1988 that IBM realised that it had to take viruses
seriously. This was not because of the well-known Christmas tree
worm, which was pretty easy to deal with. It was because IBM had
an outbreak of Cascade at the La Hulpe site, and found itself in
the embarrassing position of having to inform its customers that
they might have become infected there. In fact, there was no real
problem, but from this point on, IBM took viruses very seriously
indeed, and the High Integrity Computing Laboratory in Yorktown
was given responsibility for the IBM research effort in this
field.

1988 saw a few scattered, sporadic outbreaks of Brain, Italian,
Stoned, Cascade and Jerusalem. It also saw the final arguments
about whether viruses existed or not. Peter Norton, in an
interview, said that they were an urban legend, like the
crocodiles in the New York sewers, and one UK expert claimed that
he had a proof that viruses were a figment of the imagination. In
1988, the real virus experts would debate with such people -
after that year, real virus experts would simply walk away from
anyone who had such absurd beliefs.

Each outbreak of a virus was dealt with on a case-by-case basis.
One American claimed that he had a fully equipped mobile home for
dealing with virus outbreaks (and another one extrapolated to the
notion that soon there would be many such mobile units). Existing
software was used to detect boot sector viruses (by inspecting
the boot sector), and one-off software was written for dealing
with outbreaks of Cascade and Jerusalem.

In 1988, a virus that is called "Virus-B" was written. This is
another virus that doesn't go memory resident, and it is a
modification of another virus that deletes files on Friday 13th.
When this virus is run, it displays "WARNING!!!! THIS PROGRAM IS
INFECTED WITH VIRUS-B! IT WILL INFECT EVERY .COM FILE IN THE CURRENT
SUBDIRECTORY!". A virus that is as obvious as that, was clearly
not written to spread. It was obviously written as a
demonstration virus. Virus researchers are often asked for
"harmless viruses" or "viruses for demonstration"; most
researchers offer some alternative, such as an overhead foil, or
a non-virus program that does a falling letters display. But it
looks as if VIRUS-B was written with the intention of giving it
away as a demonstration virus - hence the warning. And, indeed,
we find that an American company was offering it to "large
corporations, universities and research organizations" on a
special access basis.

At the end of 1988, a few things happened almost at once. The
first was a big outbreak of Jerusalem at a large financial
institution, which meant that dozens of people were tied up in
doing a big clean-up for several days. The second was that a
company called S&S did the first ever Virus Seminar that actually
explained what a virus was and how it worked. The third was
Friday 13th.

It was clear that we couldn't go out and help everyone with a
virus, even if we bought a mobile home and equipped it (with
what?). It was also clear that the financial institution, and the
academic site, could easily handle a virus outbreak, but they
didn't have the tools to do the job. All they needed was a decent
virus detector, which was not available. So we wrote one, added
some other tools that experience said might be useful, and
created the first Anti Virus Toolkit.

In 1989, the first Friday 13th was in January. At the end of
1988, it was clear that Jerusalem was in Spain and the UK, at
least, and was in academic as well as commercial sites. Because
of the destructive payload in the virus, we felt that if we failed
to send out some sort of warning, we would be negligent. But the
media grabbed the ball and ran with it; the predictability of the
trigger day, together with the feature of it being Friday 13th,
caught their imagination, and the first virus media circus was
under way.

On the 13th of January, we had dozens of phone calls, mostly from
the media wanting to know if the world had ended yet. But we also
had calls from a large corporate site, a small vendor of PC
hardware, and a couple of single users. We were invaded by TV
cameras in droves, and had to schedule them carefully to avoid
them tripping over each other. In the middle of all this, the PC
Support person from the infected corporate arrived. The TV people
wanted nothing better than a victim to film, but the corporate
wanted anonymity. We pretended that he was just one of our staff.
Also, at that time, British Rail contacted us - they also had an
outbreak of Jerusalem, and they went public on it. Later, they
regretted that decision, because for a long time afterwards,
their PC Support person was badgered by the media seeking
interviews.

1989 - Datacrime

1989 was the year that things really started to move. The Fu
Manchu virus (a modification of Jerusalem) was sent anonymously
to a virus researcher in the UK, and the 405 virus (a
modification of the overwriting virus in the Burger book) was
sent to another UK researcher. A third UK researcher wrote
a virus and sent it to another UK researcher - in 1989, the UK
was where it was all happening. But not quite all. In 1989, the
Bulgarians started getting interested in viruses, and Russia was
beginning to awaken.

In March of 1989, a minor event happened that was to trigger an
avalanche. A new virus was written in Holland. A Dutchman calling
himself Fred Vogel (a very common Dutch name) contacted a UK
virus researcher, and said that he had found this virus all over
his hard disk. He also said that it was called Datacrime, and
that he was worried that it would trigger on the 13th of the next
month.

When the virus was disassembled, it was found that on any day
after October 12th, it would trigger a low level format of
cylinder zero of the hard disk, which would, on most hard disks,
wipe out the File Allocation Table, and leave the user
effectively without any data. It would also display the virus's
name, Datacrime virus. A straightforward write-up of the effect
of this virus was published, but it was another non-memory-
resident virus, and so highly unlikely to spread.

However, the write-up was reprinted by a magazine, another
magazine repeated the story, a third party embellished it a bit,
and by June it was becoming an established fact that it would
trigger on October 12th (not true, it triggers on any day *after*
the 12th, up till December 31st) and that it would low level
format the whole hard disk. In America, the press started calling
it "Columbus Day virus" (October 12th) and it was suggested that
it had been written by Norwegian terrorists, angry at the fact
that Eric the Red had discovered America, not Columbus.

Meanwhile, in Holland, the Dutch police were doing one of the
things that falls within those things that police are supposed to
do - crime prevention. Datacrime virus was obviously a crime, and
the way to prevent it was to run a detector for it. So they
commissioned a programmer to write a Datacrime detector, and
offered it at Dutch police stations for $1. It sold really well.
But it gave a number of false alarms, and it had to be recalled,
and replaced with version 2. There were long queues outside the
Dutch police stations, lots of confusion about whether anyone
actually had this virus (hardly anyone did, but the false alarms
muddied the waters).

If the police take something seriously, it must be serious,
right? So in July, large Dutch companies started asking IBM
if viruses were a serious threat. Datacrime isn't, but there
is a distinct possibility that a company could get Jerusalem,
Cascade or Stoned (or Italian, in those days before 8088
computers became a rarity). So what is IBM doing about this
threat, they asked?

IBM had internal-use-only anti-virus software. They used this to
check incoming media, and to make sure that an accident like
Lehulpe could never happen again. IBM had a problem - if they
didn't offer this software to their customers, they could look
very bad if on October 13th a lot of computers went down. The
technical people knew that this wouldn't happen, but obviously
they knew that someone, somewhere, might have important data on a
computer that would get hit by Datacrime.  IBM had to make a
decision about whether to release their software, and they had a
very strict deadline to work to - October the 13th would be too
late.

In September of 1989, IBM sent out version 1.0 of the IBM
scanning software, together with a letter telling their customers
what it was, and why they were sending it out. When you get a
letter like that from IBM, and a disk, you would be pretty brave
to take no notice, so a lot of large companies scanned a lot of
computers, for the first time. Hardly anyone found Datacrime, but
there were instances of the usual viruses.

October 13th fell on a Friday, so there was a double event -
Jerusalem and Datacrime. In the US, Datacrime (Columbus Day) had
been hyped out of all proportion for a virus that is as
uninfective as this one, and it is highly likely that not a
single user had the virus. In Europe (especially in Holland)
there might have been a few, but not many.

In London, the Royal National Institute for the Blind announced
that they'd had a hit, and had lost large amounts of valuable
research data, and months of work. We investigated this
particular incident, and the truth was that they had a very minor
outbreak of Jerusalem, and a few easily-replaced program files
had been deleted. Four computers were infected. But the RNIB
outbreak has passed into legend as a Great Disaster. Actually,
the RNIB took more damage from the invasion of the television and
print media than from the virus.

By the end of 1989, there were a couple of dozen viruses that we
knew about, but we didn't know that in Bulgaria and Russia, big
things were brewing.

1990 - the game gets more complex

By 1990, it was no longer a matter of running a couple of dozen
search strings down each file. Mark Washburn had taken the Vienna
virus, and created the first polymorphic virus from it. We didn't
use that word at first, but the idea of his viruses (1260, V2P1,
V2P2 and V2P6) was that the whole virus would be variably
encrypted, and there would be a decryptor at the start of the
virus. But the decryptor could take a very wide number of forms,
and in the first few viruses, the longest possible search string
was just two bytes long (V2P6 got this down to one byte). To
detect this virus, it was necessary to write an algorithm that
would apply logical tests to the file, and decide whether the
bytes it was looking at were one of the possible decryptors.
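
As a rough illustration of what was being replaced, the older
search-string approach can be sketched in a few lines of Python.
This is not any vendor's code; the byte values, the wildcard
convention (None) and the names find_pattern, SIGNATURE and scan
are all invented for the example.

```python
from typing import Optional, Sequence

# A pattern is a sequence of byte values; None is a wildcard
# that matches any single byte.
Pattern = Sequence[Optional[int]]


def find_pattern(data: bytes, pattern: Pattern) -> int:
    """Return the first offset where pattern matches data, or -1."""
    n, m = len(data), len(pattern)
    for start in range(n - m + 1):
        if all(p is None or data[start + i] == p
               for i, p in enumerate(pattern)):
            return start
    return -1


# Hypothetical search string: made-up bytes, not from a real virus.
SIGNATURE: Pattern = [0xDE, 0xAD, None, 0xBE, 0xEF]


def scan(data: bytes) -> bool:
    """True if the (made-up) search string occurs anywhere in data."""
    return find_pattern(data, SIGNATURE) != -1
```

With a variable decryptor, no fixed run of bytes like SIGNATURE
survives from one infection to the next, which is why the scanner
has to apply logical tests instead of a simple search.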

One consequence of this, was that some vendors couldn't do this.
It isn't easy to write such an algorithm, and many vendors were,
by this time, relying on search strings extracted by someone
else. The three main sources of search strings were a newsletter
called Virus Bulletin, the IBM scanner, and reverse engineering a
competitor's product. But you can't detect a polymorphic virus
this way (indeed, two years after these viruses were published,
many products are still incapable of detecting these viruses).
Washburn also published his source code, which is now widely
available. At the time, we thought that this would bring out a
number of imitators; in practice, no-one seems to be using
Washburn's code. However, plenty of virus authors are using his
idea.

Another consequence of polymorphic viruses, was an increase in
the false alarm rate. If you write code to detect something that
has as many possibilities as V2P6, then there is a chance that
you will flag an innocent file, and that chance is much greater
than with the sort of virus that you can find with a 24-byte scan
string. A false alarm can be as much hassle to the user as a real
virus, as he will put all his anti-virus procedures into action.
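
A back-of-envelope calculation shows why short search strings
raise the false alarm rate. The sketch below assumes that file
bytes are uniformly random and that each offset is an independent
trial. Real programs are not like that, so treat the result as an
order-of-magnitude guide only. The function name is ours.

```python
import math


def false_alarm_probability(file_size: int, string_len: int) -> float:
    """Chance that a random file contains one fixed byte string.

    Assumes uniformly random bytes and independent offsets,
    which is only an approximation.
    """
    positions = max(file_size - string_len + 1, 0)
    p_single = 256.0 ** -string_len
    # 1 - (1 - p)^positions, computed in a way that stays accurate
    # when p is far too small for ordinary floating-point subtraction.
    return -math.expm1(positions * math.log1p(-p_single))


two_byte = false_alarm_probability(10_000, 2)
long_string = false_alarm_probability(10_000, 24)
```

Under those assumptions, a two-byte string turns up by chance in
about 14% of 10,000-byte files, while a 24-byte string essentially
never does.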

Also, in 1990, we saw a number of viruses coming out of Bulgaria,
especially from someone who called himself "Dark Avenger". The
Dark Avenger viruses introduced two new ideas. The first idea was
the "Fast infector"; with these viruses, if the virus is in
memory, then simply opening a file for reading, triggers the
virus infection. The entire hard disk is very soon infected.  The
second idea in this virus, was that of subtle damage. Dark
Avenger-1800 occasionally overwrites a sector on the hard disk.
If this isn't noticed for a period of time, the corrupted files
are backed up, and when the backup is restored, the data is still
no good. Dark Avenger targets backups, not just data. Other
viruses came from the same source, such as the Number-of-the-
Beast (stealth in a file virus) and Nomenklatura (with an even
nastier payload than Dark Avenger).

Also, Dark Avenger was more creative about distributing his
viruses. He would upload them to BBSes, infecting shareware anti-
virus programs, together with a documentation file that gave
reassurance to anyone who checked the file size and checksums. He
uploaded his source code also, so that people could learn how to
write viruses.

In 1990, another event happened in Bulgaria - the first virus
exchange BBS. The idea was that if you uploaded a virus, you could
download a virus, and if you uploaded a new virus, you were given
full access. This, of course, encourages the creation of new
viruses, and gets viruses into wider circulation. Also, the VX BBS
offered source code, which makes the technology of writing a virus
more widely available.

In the second half of 1990, The Whale appeared. Whale was a very
large, and very complex virus. It didn't do very much; mostly, it
crashed the computer when you tried to run it. But it was an
exercise in complexity and obfuscation, and it arrived in virus
researchers' hands like a crossword puzzle to be solved. Some virus
researchers wasted weeks unravelling Whale, although in practice
you could detect it with a couple of dozen search strings, and you
didn't really need to do any more, as the thing was too clumsy to
work anyway. But because it was so large and complex, it achieved
fame.

At the end of 1990, the anti-virus people saw that they had to get
more organised - they had to be at least as organised as the virus
authors. So EICAR (European Institute for Computer Antivirus
Research) was born in Hamburg, in December 1990. This gave a very
useful forum for the anti-virus researchers and vendors to meet
and exchange ideas (and specimens), and to encourage the
authorities to try to prosecute virus authors more vigorously. At
the time that EICAR was founded, there were about 150 viruses, and
the Bulgarian "Virus factory" was in full swing.


1991 - product launches and polymorphism

In 1991, the virus problem was sufficiently interesting to attract
the large marketing companies. Symantec launched Norton Anti Virus
in December 1990, and Central Point launched CPAV in April 1991.
This was soon followed by Xtree, Fifth Generation and a couple of
others. Most of these companies were rebadging other companies'
programs (nearly all Israeli). The other big problem of 1991 was
"glut". In December 1990, there were about 200-300 viruses; by
December 1991 there were 1000 (there may have been even more
written that year, because by February, we were counting 1300).

Glut means lots of viruses, and this causes a number of unpleasant
problems. In every program, there must be various limitations. In
particular, a scanner has to store search strings in memory, and
under Dos, there is only 640 KB to use (and Dos, the network shell
and the program's user interface might take half of that).

Another Glut problem, is that some scanners slow down in
proportion to the number of viruses scanned for. Not many
scanners work this way, but it certainly poses a problem for
those that do.

A third Glut problem, comes with the analysis of viruses; this is
necessary if you want to detect the virus reliably, to repair it,
and if you want to know what it does. If it takes one researcher
one day to disassemble one virus, then he can only do 250 per
year. If it takes one hour, that figure becomes 2000 per year,
but whatever the figure, more viruses means more work.

Glut also means a lot of viruses that are similar to each other.
This then can lead to mis-identification, and therefore a wrong
repair. Very few scanners attempt a complete virus
identification, so this confusion about exactly which virus is
being found, is very common.

Most of these viruses came from Eastern Europe and Russia - the
Russian virus production was in full swing. But another major
source of new viruses was the virus exchange BBSes.

Bulgaria pioneered the VX BBS, but a number of other countries
quickly followed. Some shut down not long after they started up,
but the Milan "Italian Virus Research Laboratory" was where a
virus author called Cracker Jack uploaded his viruses (which were
plagiarised versions of the Bulgarian viruses). Germany had
Gonorrhea, Sweden had Demoralised Youth, America had Hellpit, UK
had Dead On Arrival and Semaj. Some of these have now either
closed down or gone underground, but they certainly contributed
to the glut problem. With a VX BBS, all a virus author has to do,
is download some source code, make a few simple changes, then
upload a new virus, which gives him access to all the other
viruses on the board.

1991 was also the year that polymorphic viruses first made a
major impact on users. Washburn had written 1260 and the V2
series long before, but because these were based on Vienna, they
weren't infectious enough to spread. But in April of 1991,
Tequila burst upon the world like a comet. It was written in
Switzerland, and was not intended to spread. But it was stolen
from the author by a friend, who planted it on his father's
master disks. Father was a shareware vendor, and soon Tequila was
very widespread.

Tequila used full stealth when it installed itself on the
partition sector, and in files it used partial stealth, and was
fully polymorphic. A fully polymorphic virus is one for which no
search string can be written down, even if you allow the use of
wild cards. Tequila was the first polymorphic virus that was
widespread. By May, the first few scanners were detecting it, but
it was not until September that all the major scanners could
detect it reliably. If you don't detect it reliably, then you
miss, say, 1% of infected files. The virus starts another
outbreak from these overlooked instances, and has to be put down
again, but now there is that old 1%, plus another 1% of files
that are infected but not detected. This can continue for as long
as the user has patience, until eventually the hard disk contains
nothing but files that the scanner cannot detect. The user
thinks that after the virus coming back a number of times, it
gradually infected fewer and fewer files, until now he has gotten
rid of it completely.
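
The arithmetic behind this can be sketched as follows. The model
is our own simplification: after every clean-up the virus
reinfects all the files that were cleaned, and each fresh
infection independently has the same chance (the miss rate) of
being a form the scanner cannot see.

```python
def undetected_fraction(miss_rate: float, outbreaks: int) -> float:
    """Share of files carrying an infection the scanner misses."""
    undetected = 0.0
    for _ in range(outbreaks):
        # The scanner cleans every infection it can see; the virus
        # then reinfects those files, and miss_rate of the new
        # infections join the pool the scanner never finds.
        undetected += (1.0 - undetected) * miss_rate
    return undetected


after_one = undetected_fraction(0.01, 1)
after_two = undetected_fraction(0.01, 2)
after_hundred = undetected_fraction(0.01, 100)
```

With a 1% miss rate the hidden share is 1% after one outbreak,
just under 2% after two, and about 63% after a hundred, while the
number of files the scanner reports keeps shrinking.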

In September 1991, Maltese Amoeba spread through Europe - another
polymorphic virus. By the end of the year, there were a few dozen
polymorphic viruses. Each of these is classified as "difficult",
meaning it takes a virus researcher more than a few hours to do
everything that needs to be done. Also, most products need some
form of hard coding in order to detect the virus, which means
program development, which means bugs, debugging, beta testing
and quality control. Furthermore, although a normal virus won't
slow down most scanners, a polymorphic virus might.

It was also in 1991, that Dark Avenger announced the first virus
vapourware. He threatened a virus that had 4,000,000,000
different forms. In January 1992, this virus appeared, but it
wasn't a virus.

1992 - Michelangelo

January 1992 saw the Self Mutating Engine (MtE) from Dark
Avenger. At first, all we saw was a virus that we named
Dedicated, but shortly after that, we saw the MtE. This came as
an OBJ file, plus the source code for a simple virus, and
instructions on how to link the OBJ file to a virus to give you a
fully polymorphic virus. Immediately, virus researchers set to
work on detectors for it. Most companies did this in two stages.
In some outfits, stage one was look at it and shudder, stage two
was ignore it and hope it goes away. But at the better R&D sites,
stage one was usually a detector that found between 90 and 99% of
instances, and was shipped very quickly, and stage two was a
detector that found 100%. At first, it was expected that there
would be lots and lots of viruses using the MtE, because it was
fairly easy to use this to make your virus hard to find. But the
virus authors quickly realised that a scanner that detected one
MtE virus, would detect all MtE viruses fairly easily. So very
few virus authors have taken advantage of the engine (there are
about a dozen or two viruses that use it).

This was followed by Dark Avenger's Commander Bomber. Before CB,
you could very easily predict where in the file the virus would
be. Many products take advantage of this predictability to run
fast; some only scan the top and tail of the file, and some just
scan the one place in the file that the virus must occupy if it
is there at all. Bomber transforms this, and so products either
have to scan the entire file, or else they have to be more
sophisticated about locating the virus.

Another virus that came out at about that time, was Starship.
Starship is a fully polymorphic virus (to defeat scanners), with
a few neat anti-debugging tricks, and it also aims to defeat
checksummers with a very simple trick. Checksumming programs aim
to detect a virus by the fact that it has to change executable
code in order to replicate. Starship only infects files as they
are copied from the hard disk to the floppy. So files on the hard
disk never change. But the copy on the floppy disk is infected,
and if you then copy that onto a new hard disk, and tell the
checksummer on the new machine about this new file, the
checksummer will happily accept it, and never report any changes.
Starship also installs itself on the hard disk, but without
changing executable code. It changes the partition data, making a
new partition as the boot partition. No code is changed, but the
new partition contains the virus code, and this is run before it
passes control on to the original boot partition.
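
The checksumming idea, and the gap that Starship slips through,
can be sketched like this. It is a toy model, not how any
particular product works: the Checksummer class, the file name
and the use of SHA-256 are all our own assumptions, and file
contents are plain byte strings held in memory.

```python
import hashlib
from typing import Dict, List


def checksum(content: bytes) -> str:
    """Fingerprint of a file's contents (SHA-256 is our choice)."""
    return hashlib.sha256(content).hexdigest()


class Checksummer:
    """Remembers a fingerprint per file and reports later changes."""

    def __init__(self) -> None:
        self.baseline: Dict[str, str] = {}

    def register(self, name: str, content: bytes) -> None:
        # Whatever the file looks like now is recorded as known good.
        self.baseline[name] = checksum(content)

    def changed(self, files: Dict[str, bytes]) -> List[str]:
        # Report registered files whose contents no longer match.
        return sorted(
            name
            for name, content in files.items()
            if name in self.baseline
            and self.baseline[name] != checksum(content)
        )


original = b"clean program"
modified = b"clean program + extra code"  # stand-in for an infected copy

# On the first machine the file on the hard disk never changes.
old_pc = Checksummer()
old_pc.register("TOOL.COM", original)
old_pc_report = old_pc.changed({"TOOL.COM": original})

# On the second machine the file arrives already modified and is
# registered in that state, so it never differs from its baseline.
new_pc = Checksummer()
new_pc.register("TOOL.COM", modified)
new_pc_report = new_pc.changed({"TOOL.COM": modified})
```

Both reports come back empty. The checksummer only notices a file
that changes after it was registered, which is exactly the event
that never happens on either machine.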

Probably the greatest event of 1992 was the great Michelangelo
scare. One of the American anti-virus vendors forecast that five
million computers would go down on March the 6th, and many other
US vendors climbed on to the band wagon. PC users went into a
purchasing frenzy, as the media whipped up the hype. On March the
6th, between 5,000 and 10,000 machines went down, and naturally
the US vendors that had been hyping the problem put this down to
their timely and accurate warning. We'll probably never know how
many people had Michelangelo, but certainly in the days leading
up to March the 6th, a lot of computers were checked for viruses.
After March 6th, there were a lot of discredited experts around.

The reaction to the Michelangelo hype did a lot of damage to the
credibility of people advocating sensible antivirus strategies,
and outweighed any possible benefits from the gains in awareness.

In August 1992, we saw the first serious virus authoring
packages. First the VCL (Virus Creation Laboratory) from Nowhere
Man, and then Dark Angel's Phalcon/Skism Mass-Produced Code
Generator. These packages made it possible for anyone who could
use a computer, to write a virus. Within twelve months, dozens of
viruses had been created using these tools.

Towards the end of 1992, a new virus writing group called ARCV
(Association of Really Cruel Viruses) had appeared in England -
within a couple of months, the Computer Crime Unit of New
Scotland Yard had tracked them down and arrested them. ARCV
flourished for about three months, during which they wrote a few
dozen viruses and attracted a few members.

Another happening of 1992, was the appearance of people selling
(or trying to sell) virus collections. To be more precise, these
were collections of files, some of which were viruses, and many
of which were assorted harmless files. In America, John Buchanan
offered his collection of a few thousand files for $100 per
copy, and in Europe, The Virus Clinic offered various options
from 25 pounds. The Virus Clinic was raided by the Computer Crime
Unit; John Buchanan is still offering viruses for sale.

Towards the end of 1992, the US Government was offering viruses
to people who called the relevant BBS.

1993 - Polymorphics and Engines

Early in 1993, XTREE announced that they were quitting the
antivirus business. This was the first time that a major company
had given up the struggle.

Early in 1993, a new virus writing group appeared, in Holland,
called Trident. The main Trident author, Masouf Khafir, wrote a
polymorphic engine called the Trident Polymorphic Engine, and
released a virus that used it, called GIRAFE. This was followed by
updated versions of the TPE. The TPE is much more difficult to
detect reliably than the MtE, and very difficult to avoid false
alarming on.

Khafir also released the first virus that worked according to a
principle first described by Fred Cohen. The Cruncher virus was a
data compression virus, that automatically added itself to files
in order to auto-install on as many computers as possible.

Meanwhile, Nowhere Man, of the Nuke group, had been busy. Early
in 1993, he released the Nuke Encryption Device (NED). This was
another mutator that was more tricky than MtE. A virus called
Itshard soon followed.

Phalcon/Skism was not to be left out. Dark Angel released DAME
(Dark Angel's Multiple Encryptor) in an issue of 40hex; a virus
called Trigger uses this. Trident released version 1.4 of TPE
(again, this is more complex and difficult than previous
versions) and released a virus called Bosnia that uses it.

Soon after that, Lucifer Messiah, of Anarkick Systems had taken
version 1.4 of the TPE and written a virus POETCODE, using a
modified version of this engine (1.4b).

Early in 1993, another highly polymorphic virus appeared, called
Tremor. This rocketed to stardom when it got included in a TV
broadcast of software (received via a decoder).

In the middle of 1993, Trident got a boost when Dark Ray and John
Tardy joined the group. Tardy has released a fully polymorphic
virus in 444 bytes, and we can expect more difficult things from
Trident.

The main events of 1993, were the emergence of an increasing
number of polymorphic engines, which will make it easier and
easier to write viruses that scanners find difficult to detect.


The future

There will be more viruses - that's an easy prediction. How
many more is a difficult call, but over the last five years,
the number of viruses has been doubling every year or so. This
surely must slow down. If we say 1500 viruses in mid-1992, and
3000 in mid-1993, then we could imagine 5000 in mid-1994 and we
could expect to reach the 8,000 mark some time in 1995. Or
perhaps we are being optimistic?
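
For comparison, here is what strict doubling would give from the
same starting point. The function name and the assumption of a
constant yearly growth factor are ours.

```python
from typing import List


def project(start: int, factor: float, years: int) -> List[int]:
    """Virus counts for each year under a constant growth factor."""
    return [round(start * factor ** y) for y in range(years + 1)]


# Mid-1992 through mid-1995, doubling every year.
doubling = project(1500, 2, 3)
```

Strict doubling from 1500 in mid-1992 gives 6000 by mid-1994 and
12000 by mid-1995, so the figures of 5000 and 8000 above already
assume a noticeable slowdown.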

The glut problem will continue, and could get sharply worse.
Whenever a group of serious anti-virus researchers meet, we find
an empty room, hang "Closed for cleaning" on the door, and
frighten each other with "nightmare scenarios". Some of the older
nightmare scenarios have already come true, others have not, but
remain possibilities. The biggest nightmare for all anti-virus
people is glut. There are only about 10-15 first class anti-virus
people in the world, and most of the anti-virus companies have
just one of these people (some have none). It would be difficult
to create more, as the learning curve is very steep. The first
time you disassemble something like Jerusalem virus, it takes a
week. After you've done a few hundred viruses, you could whip
through something as simple as Jerusalem in 15 minutes.

The polymorphic viruses will get more numerous. It turns out that
they are a much bigger problem than the stealth viruses, because
stealth is aimed at checksummers, but polymorphism is aimed at
scanners, which is what most people are using. And each
polymorphic virus will be a source of false alarms, and will
cause the researchers much more work than the normal viruses.

The polymorphic viruses will also continue to get more complex,
as virus authors learn the technique, and increasingly try to
ensure that their viruses cannot be detected.

Scanners will get larger - more code will be needed because more
viruses will need hard coding to scan for them. The databases
that scanners use will get larger; each new virus needs to be
detected, identified and repaired. Loading the databases will
take longer, and some programs will have memory shortage
problems.

As Windows becomes more popular, people will be increasingly
reluctant to run scanners under Dos. But if you are running
Windows, you have run software on the hard disk, and if one of
the things you've run is infected by a virus, you have a virus in
memory. If there is a virus in memory, you cannot trust what the
computer is saying - it could be a stealth virus. Windows will
make antivirus software less secure.

The R&D effort to keep scanners up-to-date will get more and
more. Some companies won't be able to do it, and will decide that
scanning is outdated technology, and try to rely on checksumming.
Other companies will licence scanners from one of the few
companies that still maintains adequate R&D (we've already
started seeing some of this). Some companies will decide that the
anti-virus business isn't as profitable as they had thought, and
will abandon their anti-virus product, and go back to their core
business.

Users will get a lot more relaxed about viruses. We've long since
passed the stage where a virus is regarded as a loathsome
disease, to be kept secret. But we're increasingly seeing people
who regard a virus on their system with about the same degree of
casualness as a bit of fluff on their jacket. Sure, they'll wipe
it off, but there's no real need to worry about it happening
again. This is perhaps a bit too relaxed an attitude, but what
can you expect if a user keeps on getting hit by viruses, and
nothing terrible ever seems to result?

Anti-virus products will mature a lot. Those without any kind of
decent user interface will have a hard time competing against
the pretty ones. Those with a long run time will be rejected
in favour of those that run in seconds. Exactly which viruses
are detected will have far less emphasis (it is very difficult
for users to swallow claims about so many thousands of viruses)
than the ease of use of the product, and the amount of impact it
has on the usability of the computer.

New products will keep arriving, as each company invents the
product that makes all previous products obsolete. Sometimes the
magic ingredient will be software (AI, neural nets, whatever is
the latest buzzword) and sometimes it will be hardware (which can
never be infected, except that that isn't the problem). These
products will burst on a startled world in a blaze of publicity,
and vanish without trace when users find that installing them
makes their computer unusable, or else it doesn't find any
viruses, or both. But new ones will come along to take their
place.

Gradually, people will trade up from Dos to whatever takes its
place - OS/2, Windows-NT or Unix, and the Dos virus will become
as irrelevant as CPM. Except that Dos will still be around 10 or
even 20 years from now, and viruses for the new operating system
will start to appear as soon as it is worth writing them.

Some computers are already being built with ingrained resistance
to viruses. Some brands of computer are already immune to boot
sector viruses, provided you make a simple choice in the CMOS
setup (don't boot from the floppy). Right now, very few users are
being told that these computers can be set up that way, but
people are gradually finding out for themselves. This doesn't
solve the virus problem, but anything that makes the world a
difficult place for viruses must be a help.

The virus problem will be with us for ever. It isn't the
dramatic, worldshaking kind of problem that Michelangelo was made
out to be; nor is it the fluff-on-your-jacket kind of problem.
But as long as people have problems with computers, other people
will be offering solutions for those problems.


=============================================================================