Insane Reality issue #7 - (c)opyright 1995 Immortal Riot
File 005

% Virus-Bulletin %
------------------

Here follow a few articles from the Virus Bulletin. I have no clue how this material got onto my harddrive, but I find it somehow interesting, and decided to publish it. It is, I was told, heavily copyrighted, so if someone gets offended seeing AV material published in a VX-zine, go sue someone, but don't try anything against me. This zine doesn't have any "responsible publisher", so I cannot be found guilty of anything. Imo, information of any kind worth the effort of being typed up should be free, and if possible released to the general public (such as being included in a zine). If you still think I'm responsible for letting this material out, go figure out a way to prove that XX really is me :-), or don't bother. Ok?

I didn't bother to comment and reply upon all the articles included, but ah well, I don't care if you don't. There is, however, one article I added a few comments to. It was written by Jim Bates and it's about how to catch a virus writer ;), and why we should be caught. Hmm, maybe this is a good reason? Hehhe, well, don't try it son, you're not that good :-).

- XX

==================================================================

VB95_209.TXT
------------

HEURISTIC SCANNERS: ARTIFICIAL INTELLIGENCE?

Righard Zwienenberg
Computer Security Engineers, Postbus 85 502, NL-2508 CE Den Haag, The Netherlands
Tel +31 70 362 2269 Fax +31 70 365 2286 Email rizwi@csehost.knoware.nl

Though not explicitly stated, heuristic anti-virus methods have been in use for almost as long as the virus threat has existed. In the 'old days', FluShot(+) was a very popular monitor, alerting the user when it detected 'strange and dangerous' actions. This can be regarded as simple heuristic analysis, because FluShot did not know whether the action was legitimate or not. It just warned the user.

During the last couple of years, several resident behaviour-blockers have been developed, used, and dismissed again. In most cases, the user finds the warnings irritating, aggravating and incomprehensible. The only resident protection users normally employ - if any - is a resident scanner. This makes life easier for them, because the resident scanner clearly indicates that a file or disk is infected by a certain virus when it pops up its box. The disadvantage, which the user doesn't see, is that it does not detect new viruses.

Also, the less popular (but very important) integrity checkers may be regarded as heuristic tools. They warn the user when the contents of files have changed, when files have grown in size, received new time and date stamps, etc. They often display a warning such as 'file might be infected by an unknown virus' in the case of a changed executable. Especially in a development environment, integrity checkers can be really irritating. The user already knows that his executable has changed, because he just changed and recompiled the source code. But how is the integrity checker to know that? Using a list of executables to skip is not safe, because a virus may indeed have infected an executable on the list. In that case, the change was not caused by a recompilation. However, the integrity checker can't tell the difference!

Based on these early attempts, the first generation of scanners with minor heuristic capabilities was developed.
The heuristics they used were very basic, and usually generated warnings about peculiar file date and file time stamps, changes to file lengths, strange headers, etc. Some examples:

  EXAMPLE1.COM  12345  01-01-1995  12:02:62
  EXAMPLE2.COM  12345  01-01-2095  12:01:36
  EXAMPLE3.EXE  Entry point at 0000:0001

The heuristics of the current, second, generation of scanners are much better. All the capabilities of the first-generation scanners have obviously been retained, but many new heuristic principles have been added: code analysis, code tracing, strange opcodes, etc. For example:

  0F            POP CS              ; Strange opcode - an 8086-only instruction!
  C70600019090  MOV WORD PTR [100],9090
  C606020190    MOV BYTE PTR [102],90
  E9....        JMP 0100

Tracing through the code shows that it jumps back to the entry point:

  B9....  MOV CX,....
  BE....  MOV SI,....
  89F7    MOV DI,SI
  AC      LODSB
  34A5    XOR AL,A5
  AA      STOSB
  E2..    LOOP ....

This is obviously decryption code.

A (third-generation?) scanner type based exclusively on heuristics exists, performing no signature, algorithmic or other checks. Maybe this is the future, but the risk of a false alarm (a false positive) is quite high at the moment. In large corporations, false alarms can cost a lot of time and thus money. We are not going to examine this scanner type, except to note that it may lead us into a new generation, or area, of system examination and protection: rule-based examination systems.

RULE-BASED EXAMINATION SYSTEMS

Rule-based systems are as such not a novelty. They already exist, also in the security field, where they are often characterised by applying very few, but very broad, rules. What we are going to look at here are rule-based examination systems seen as large heuristic analysers. Looking at this sequence of opcodes:

  B8DCFE  MOV AX,FEDC
  CD21    INT 21
  3D98BA  CMP AX,BA98
  75..    JNE getint21
  E9....  JMP wherever
getint21:
  B82135  MOV AX,3521
  CD21    INT 21

everyone in the field of computer security can see that we may have a virus here (or at least suspicious or badly programmed code). The problem is how to convert something we see in a split second into one or more specific and relevant behaviour characteristics, which we can feed into an examination system which in turn is able to tell us whether or not we are looking at a virus.

With most of the rules used by the first generation of heuristic scanners, this was not at all difficult. Most were simple comparisons (<, >, ==, !=) of the type: 'If a file date exceeds the current date, or is after the year 2000, give an alert'; 'If the seconds field of the file time shows 62 seconds, we can conclude that this is pretty strange and give an alert'. This generation of heuristics, of course, did not have the power to analyse the code in the example shown above.

The second generation of heuristic scanners has more possibilities. Bearing those in mind, defining a rule to cover the example above is not difficult, but imagine a complex decryption routine preceding the actual (virus/Trojan/suspicious) code or - most likely - legitimate code. For example:

  re-vector int 3
  re-vector int 1
  disable keyboard
  get int 1 offset into di
  get int 1 offset into si
  add counter-1 to si to point to encrypted data
  add counter-2 to di to point to encrypted data
  get word into ax
  perform some calculations with ax to decrypt word
  store word
  increase counter-1
  increase counter-2
  look if end of encrypted code has been reached
  jmp back if more code to decrypt
  enable keyboard
  ...
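As an aside, the simple decryption loop shown earlier in this section is exactly the kind of pattern a second-generation scanner can spot with a plain opcode match. A minimal sketch in Python (the byte values come from the example above; this is not any particular scanner's logic, and a real product would trace the code rather than match raw bytes):

    # Sketch: flag a probable decryption loop near the entry point.
    # We only look for the LODSB / XOR AL,imm8 / STOSB / LOOP shape
    # from the example above; real analysers trace the code instead.

    def looks_like_decryptor(code: bytes) -> bool:
        i = 0
        while i < len(code) - 4:
            if (code[i] == 0xAC            # LODSB
                    and code[i + 1] == 0x34    # XOR AL,imm8
                    and code[i + 3] == 0xAA    # STOSB (i+2 is the key byte)
                    and code[i + 4] == 0xE2):  # LOOP rel8
                return True
            i += 1
        return False

    # Tiny self-test using the byte sequence from the example above:
    sample = bytes.fromhex("B91000BE0001" "89F7" "AC" "34A5" "AA" "E2FA")
    print(looks_like_decryptor(sample))   # True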
If a decryption routine like the one above is just one of the instances generated by a complex mutation engine, however, it will be hard to derive a heuristic rule which directly detects a virus using this engine. One of the solutions, maybe the best one, is to include a code emulator in the analysing system, as illustrated in the figure above (missing --Ed.), which shows part of a working network security system. The file to be checked is first given to a checksummer. If the file is already known to the system, a hash code is generated across the file and compared to a stored value. If these are identical, no further action is taken and the file is declared clean. If not, the file is fed to the emulator, and the results of the code emulation are given to an analyser, as described below.

Including a code emulator is possible and, as a matter of fact, has already been done. It should have special knowledge of a variety of possible tricks used in malicious code; it should know when to stop emulating (e.g. at the end of a decryption routine); it should be able to realise when anti-debug tricks are used, etc. Both in order to obtain portability and to avoid obvious pitfalls, it must adhere to one basic and important rule: never actually execute an instruction, only emulate it. In short, the task of the emulator is first to make sure that the code is decrypted (in case it was encrypted), and then to derive and combine relevant behaviour characteristics to pass on to the analyser, which analyses and organises these behaviour characteristics and compares the results of the analysis with a set of rules.

ARTIFICIAL INTELLIGENCE

From the point of view of the developer, it would be nice if such a system were able to learn about behaviour characteristics and generate new rules automatically. If the system bypasses an instance of virus/Trojan/suspicious code because the current rules are no longer sufficient, special examination tools should be able to extract the necessary information from the code in question and create new rules, enabling the system to detect this virus/Trojan/suspicious code - and, hopefully, every other form derived from it. In other words: artificial intelligence.

For security reasons, these additional tools with their special functionality should not be given to users. Evil-minded, knowledgeable persons could use them to do an in-depth disassembly and research the possibilities of bypassing the rules generated by the system. Security through obscurity may not be safe, but it does help...

EMULATOR DESIGN ISSUES

When designing a code emulator for forensic purposes, a number of special requirements must be met. One problem to tackle is the multiple opcodes and multiple instructions issue:

  87 C3  XCHG AX,BX
  93     XCHG BX,AX
  87 D8  XCHG BX,AX

The result is the same, but different opcodes are used.

  PUSH AX     PUSH AX
  PUSH BX     MOV AX,BX
  POP AX      POP BX
  POP BX

These give the same result. More than the five different code sequences shown above exist to exchange the contents of registers AX and BX. The technique of expressing the same functionality using many different sets of opcode sequences is used by the encryptors generated by polymorphic engines. Some, being over 200 bytes in size, contain only the functionality of a cleanly coded decryptor of 25 bytes. Most of the remaining code is redundant, but sometimes seemingly redundant code is used to initialise registers for further processing. It is the job of the emulator to make sure that the rule-based analyser gets the correct information, i.e.
that the behaviour characteristics passed to the analyser reflect the actual facts. No matter which series of instructions/opcodes is used to perform function 3D02h of INT 21h, the analyser only has to know that the behaviour of that piece of code is: open a file for (both reading and) writing.

On the one hand, this may not seem that difficult. Most viruses do perform interrupt calls, and when they do, we just have to evaluate the contents of the registers to derive the behaviour characteristic. On the other hand, this is only correct if we talk about simple, straightforward viruses. For viruses using different techniques (hooking different interrupts, using call/jmp far constructions), it may be very difficult for the emulator to keep track of the instruction flow. In any case, the emulator must be capable of reducing instruction sequences to their bare functionality in a well-defined manner. We call the result of this reduction a behaviour characteristic, if it can be found in a pre-compiled list of characteristics to which we attach particular importance.

Another problem is that the emulator must be capable of making important decisions, normally based on incomplete evidence (we obviously want to emulate as little code as possible before reaching a conclusion regarding the potential maliciousness of the software in question). Let us illustrate this with a small example:

  MOV AX,4567
  INT 21
  CMP AX,7654
  JNE jmp-1
  JMP jmp-2

This is an example of an 'Are you there?' call used by a virus. When tracing through the code, the emulator obviously doesn't know whether jmp-1 or jmp-2 leads to the code which installs the virus in case it is not already there. So, should the emulator continue with the jmp-1 flow or the jmp-2 flow? A simple execution of the code will result in just one of these flows being relevant, whereas a forensic emulator must be able to follow all possible program flows simultaneously, until either a flow leads to a number of relevant behaviour characteristics being detected, at which time the information is passed to the analyser, or a flow has been followed to a point where one of the stop criteria built into the emulator is met. The strategy used in this part of the emulator is a determining factor when it comes to obtaining an acceptable scanning speed.

Hopefully, this has illustrated some of the problems associated with designing a forensic emulator. It is a very difficult and complex part of this set-up. Once the emulator has finished its job, it passes its information - a list of the behaviour characteristics which it has found in the code - on to the analyser.

BEHAVIOUR RULES

Before the analyser is able to compare the behaviour characteristics found by the emulator to the information in its behaviour database, this database needs to be defined. Assume that we have a COM and EXE file infecting virus with the following behaviour:

  !  MODIFY FILE ATTRIBUTE REMOVING READ-ONLY FLAG
  !  OPEN A FILE FOR (BOTH READING AND) WRITING
  !* WRITE DATA TO END OF FILE
  !* MODIFY ENTRY POINT IN HEADER or WRITE TO BEGINNING OF FILE
  -  MODIFY FILE DATE AND FILE TIME
  -  CLOSE FILE
  -  MODIFY FILE ATTRIBUTE

If we want to develop a behaviour rule for this virus, it will look like this:

  1. MODIFY_FILE_ATTRIBUTE + OPEN_FILE + WRITE_DATA_TO_EOF + MODIFY_EP_IN_HEADER
  2. MODIFY_FILE_ATTRIBUTE + OPEN_FILE + WRITE_DATA_TO_EOF + WRITE_DATA_TO_BOF

where rule 1 is a rule for the EXE file, and rule 2 for the COM file.
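To make the analyser side concrete, here is a minimal sketch of how such rules might be stored and matched against the characteristic list delivered by the emulator. The rule table, names and matching policy are invented for illustration; the article does not prescribe an implementation:

    # Sketch: match emulator-reported behaviour characteristics against rules.
    # A rule fires if all of its characteristics occur, in order, in the trace.

    RULES = {
        "EXE infector": ["MODIFY_FILE_ATTRIBUTE", "OPEN_FILE",
                         "WRITE_DATA_TO_EOF", "MODIFY_EP_IN_HEADER"],
        "COM infector": ["MODIFY_FILE_ATTRIBUTE", "OPEN_FILE",
                         "WRITE_DATA_TO_EOF", "WRITE_DATA_TO_BOF"],
    }

    def matches(rule, trace):
        pos = 0
        for wanted in rule:
            try:
                pos = trace.index(wanted, pos) + 1  # must appear after the previous hit
            except ValueError:
                return False
        return True

    # Example trace, as an emulator might report it for the virus above:
    trace = ["MODIFY_FILE_ATTRIBUTE", "OPEN_FILE", "WRITE_DATA_TO_EOF",
             "MODIFY_EP_IN_HEADER", "CLOSE_FILE"]
    for name, rule in RULES.items():
        if matches(rule, trace):
            print("Rule fired:", name)   # prints: Rule fired: EXE infector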
Since a lot of viruses and virus source codes are widely available, a number of different instruction sequences resulting in this functionality will probably show up. Normally, derived viruses contain minor changes made to bypass a single scanner, by just changing the order of two or more instructions, but sometimes larger code sequences can be changed without changing the functionality of the virus. It is trivial to change the code so that it will first modify the entry point in the header or change the start-up code, and afterwards write the virus code. In order to detect these changes (variants), the next rules may be added:

  3. MODIFY_FILE_ATTRIBUTE + OPEN_FILE + MODIFY_EP_IN_HEADER + WRITE_DATA_TO_EOF
  4. MODIFY_FILE_ATTRIBUTE + OPEN_FILE + WRITE_DATA_TO_BOF + WRITE_DATA_TO_EOF

Another example (an MBR infector):

  -  PERFORM SELF CHECK
  !  HOOK INT13
  !  BECOME RESIDENT
  !  INTERCEPT READ/WRITE TO MBR
  !  READ MBR
  -* WRITE MBR TO OTHER LOCATION
  !* WRITE NEW MBR

Rule: HOOK_INT13 + INTERCEPT_READ/WRITE_TO_MBR + WRITE_NEW_MBR

The signs in front of the descriptors in the examples above hint at the weighting procedure used by the analyser to attach significance to the behaviour characteristics supplied by the emulator. A '-' means that the characteristic does not have to be present; an '!' that it must be present (but does not in itself indicate malicious code). A '*' indicates a high weighting value. Thus '-*' means that the characteristic does not have to be present in the sequence of actions, but if it is, this is a highly important fact.

If rules 1-4 above are examined more closely, it can be concluded that they describe behaviour found in a number of viruses from different families. A single behaviour rule may detect an unlimited number of viruses. That is the power behind using behaviour characteristics. While at present we in most cases need a new signature or a new (changed) algorithm to detect a new variant of a virus or a new virus family, the behaviour characteristics will continue to do their work. This is extremely important, because it removes the necessity for the virus researcher and the anti-virus developer to react to a new virus unless it is technologically innovative - and those are few and far between. Of course, some viruses will be developed which will not be caught by any of the rules in the behaviour database. These must be taken care of just as we do right now with any new virus; but instead of creating a signature, we create a new rule. With a little luck, a new virus behaves like a virus already covered by a rule.

If we attach a level of importance to each part of a behaviour characteristic, we can use this in the analyser to arrive at a conclusion. Depending on the level of importance of each individual component of a behaviour characteristic detected, the system may decide to give a message to the user, such as 'may be infected by an unknown virus' or 'suspicious code'. The reason for attaching a level of importance to each individual part of a behaviour characteristic is that it makes it easier to sort out cases where combinations of individually innocent behaviour characteristics put together constitute malicious code - or vice versa. Filedate, from the Norton Utilities, is able to change file date and time; as a matter of fact, this is the purpose of the utility. The ATTRIB command was developed to change file attributes. Evidently, changing file attributes is in itself insufficient evidence of malicious behaviour. A virus needs to write to a file as well.
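How such importance levels could be combined into the messages mentioned above can be sketched as follows; the weights and thresholds are invented for illustration only:

    # Sketch: weight individual characteristics and grade the combined score.
    # A file write carries a high weight; an attribute or date change, being
    # innocent on its own (think ATTRIB or Filedate), carries a low one.

    WEIGHTS = {
        "MODIFY_FILE_ATTRIBUTE": 1,
        "MODIFY_FILE_DATE_TIME": 1,
        "OPEN_FILE": 2,
        "WRITE_DATA_TO_EOF": 5,
        "MODIFY_EP_IN_HEADER": 5,
    }

    def verdict(trace):
        score = sum(WEIGHTS.get(c, 0) for c in trace)
        if score >= 12:
            return "may be infected by an unknown virus"
        if score >= 8:
            return "suspicious code"
        return "no warning"

    print(verdict(["MODIFY_FILE_ATTRIBUTE", "MODIFY_FILE_DATE_TIME"]))
    # -> no warning
    print(verdict(["MODIFY_FILE_ATTRIBUTE", "OPEN_FILE",
                   "WRITE_DATA_TO_EOF", "MODIFY_EP_IN_HEADER"]))
    # -> may be infected by an unknown virus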
So a file write is mandatory for code to be considered suspicious, and is heavily weighted. A change of attributes is not that important, and is thus given a lower weighting. If the user so wishes, the file, or the part of the (decryptor) code on which the analysing system triggered, can be checked by a signature scanner to see if a known virus can be identified.

CREATING RULES AUTOMATICALLY

An important part of the system is a Rule Building Utility. Whenever a new virus or Trojan emerges, it may be processed by this utility, which is similar to the emulator, albeit with some important differences. The emulator only collects behaviour information, without knowing anything about the importance of a particular type of behaviour, or whether the behaviour is suspicious. The Rule Building Utility has to learn the level of importance of behaviour characteristics; it has to know which behaviour is mandatory for a virus or Trojan, which behaviour is used by a virus but may be omitted, etc. Because research and development time is very expensive, the utility must be able to remember this for similar behaviour characteristics, and only ask for additional unknown information when needed, saving the researcher valuable time.

  Behaviour A:          Behaviour B:
  SEARCH FIRST FILE     SEARCH FIRST FILE
  DELETE FILE           DELETE FILE
  SEARCH NEXT FILE      CREATE FILE
                        WRITE CODE INTO FILE
                        SEARCH NEXT FILE

When rules have been defined for behaviour B, and a file showing behaviour A (which was reported as being suspicious) is processed, the utility must be able to realise that this behaviour is not as indicative of potential maliciousness as behaviour B. As a matter of fact, if behaviour A is taken on its own, it might well be a DEL *.* command. At first, the utility will ask for input frequently, because it needs to build up its database. However, over a period of time, this type of utility should make life easier for the researcher.

CONCLUSION

The number of viruses is increasing rapidly: this is a known fact. The time will soon arrive when scanning using signatures and dedicated algorithms will either use too much memory or just become too slow. With storage media prices dropping fast, lots of systems now come equipped with very large hard disks, which will take more and more time, and thus money, to scan using traditional techniques. A properly designed rule-based analysing system, feeding suspicious code into a scanner which can identify it as a known virus or Trojan, or perhaps as dangerous code needing further investigation, is bound to save a lot of time. Although it is impossible to prove that code is not malicious without analysing it from one end to the other, we at Computer Security Engineers Ltd believe it possible to reduce significantly the time used to check files, by using all the available system knowledge instead of only small bits of it, as is done today. Using virus scanning as the primary, or in many cases the only, anti-virus defence is an absurd waste of time and money, and furthermore blatantly insecure!

ABOUT CSE

Computer Security Engineers Ltd is one of the pioneers of anti-virus system development. The anti-virus system PC Vaccine Professional was first published in 1987, and since the start of 1988 a new version has been published each and every month. From 1988, cryptographic checksumming was introduced as the primary line of defence, scanning as the second.
In 1992, the emphasis shifted, and behaviour blocking was introduced as the first line of defence, followed by checksumming and - in the case of an alarm from one of these countermeasures, or to examine incoming diskettes - scanning for known viruses. Most recently, the basic philosophies underlying PC Vaccine Professional, or PCVP as the system is also known, were expanded into a powerful and easily-maintained network perimeter and an in-depth defence based on the well-known military tenets of: (1) keep them out, and (2) if you can't keep them out, find and destroy them as fast as possible.

==================================================================

VB_210.TXT
----------

VIRUS DETECTION - 'THE BRAINY WAY'

Glenn Coates & David Leigh
Staffordshire University, School of Computing, PO Box 334, Beaconside, Stafford, ST18 0DG, UK
Tel +44 1782 294000 Fax +44 1782 353497

ABSTRACT

This paper explores the potential opportunities for the use of neural networks in the detection of computer viruses. Neural computing aims to model the guiding principles used by the brain for problem solving, and to apply them to a computer domain. It is not known how the brain solves problems at a high level; however, it is widely known that the brain uses many small, highly interconnected units called 'neurons'. Like the brain, a neural network can be trained to solve a particular problem or recognise a pattern by example. The outcome is an algorithm-driven recogniser which does not exhibit the same behaviour as a deterministic algorithm. Depending on the way in which it has been trained, it may make 'mistakes': that is, it may declare a positive result for a sample which is actually negative, and vice versa. The ratio of correct results to incorrect results can usually be improved by more and better training. Can such pattern recognition be harnessed for virus detection? It could be argued that the characteristics of virus patterns, no matter how they are expressed, are suitable subjects for detection by neural networks.

INTRODUCTION

The received wisdom is that neural computing is an interesting 'academic toy' of little use, apart from modelling the animal brain. If this is true, then it is surprising that 7 out of 10 of the UK's leading blue chip companies are either investigating the potential of neural computing technology or actually developing neural applications [Con94]. If leading edge companies are prepared to spend money on this 'academic toy', then maybe there are advantages to be gained from its use.

Without investigating new techniques (for example, heuristic scanning), one must accept that the rapid rise in new viruses will exact a heavy speed penalty on existing virus scanners. As a result of this rise in virus numbers and sophistication, there will be an increasing conflict between acceptable speed and acceptable accuracy. It is easy to become complacent and rely on increasing processor power to bail us out of this problem, but processor design is increasingly becoming a mature technology. What follows are the results of a feasibility study into the utilisation of neural networks within the field of virus detection.

WHAT IS A NEURAL NETWORK?

The workings of the brain are only known at a very basic level. It contains approximately ten thousand million processing units called neurons, and each of these neurons is connected to approximately ten thousand others. This network of neurons forms a highly complex pattern recognition tool, capable of conditional learning.
Figure 1 illustrates a model of the biological neuron alongside its corresponding mathematical model. The individual neuron is stimulated by one or more inputs. In the biological neuron, some inputs will tend to excite the neuron, whilst others may be inhibitory. That is to say, some carry more 'weight' than others. This is mirrored in the mathematical model via the use of a 'weighting mechanism'. The neuron accumulates the total value of its inputs before passing it through a threshold function to determine its final output. This output is then fired as an input to another neuron (or a number of neurons), and so on. In the biological neuron, the axon performs the threshold function. The mathematical model would typically use a sigmoid function or a simple binary 'yes/no' threshold function. The reader is referred to [Mar93] for further discussion.

NEURAL NETWORK DEVELOPMENT

When approaching a problem using a neural network, it is not always necessary to know in detail what is to be done before planning its use. In this sense, neural networks are quite unlike procedurally-based computer programs, which must have been written with a distinct goal in mind if they are to work properly. It is not even like a declarative program, for which the same rule applies. It is, perhaps, more like an expert system, where the outcome depends on the way in which an expert has answered a pre-defined series of questions.

In this approach, a 'standard' three-layer neural network is constructed using the 'back propagation' learning algorithm. The architecture consists of an input layer, a hidden layer, and an output layer. Training is carried out by submitting a 'training set' of data to the network's input, observing what output is given, and adjusting the variable weights accordingly. Each neuron in the network processes its inputs, with the resultant values steadily percolating through the network until a result is given by the output layer. This output result is then compared to the actual result required for the given input, giving an error value. On the basis of this error value, the weights in the network are gradually adjusted, working backwards from the output layer. This process is repeated until the network has learnt the correct response for the given input [DTI95]. Figure 2 illustrates this.

In this instance, the inputs represent the virus information, or other data concerning a virus-infected file. There are only two possible outputs, corresponding to 'possible virus found' and 'file appears to be OK'. The training data is divided into two classes, one containing the data for infected files, and the other for uninfected files. When a suitable output is generated for the training data, the network is checked with a separate 'validation set'. If the output for the validation set is not acceptable, it is merged with the original training set and the entire process is repeated. This process is described schematically in Figure 3. The result should be a very robust fuzzy recogniser, capable of coping with unseen data. Because neural networks can process deeply hidden patterns, some have provided decisions superior to those made by trained humans.

EXISTING SYSTEMS

In 1990, a neural network was developed which acted as a 'communications link' between the mass of virus information available and end-user observations. By answering a set of standard questions regarding virus symptoms, the virus could be classified, and a set of remedies given.
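As an aside, the weighted-sum-and-threshold neuron and the three-layer arrangement described above are compact enough to sketch in code. A toy forward pass, with arbitrary layer sizes and random placeholder weights (a real network would learn its weights by back propagation):

    import math, random

    # Sketch: one forward pass through a tiny 3-2-1 network with sigmoid
    # thresholding, as in the architecture described above.

    random.seed(1)

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def layer(inputs, weights):
        # one output per weight row: weighted sum of inputs, then threshold
        return [sigmoid(sum(w * i for w, i in zip(row, inputs)))
                for row in weights]

    hidden_w = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    output_w = [[random.uniform(-1, 1) for _ in range(2)]]

    inputs = [0.2, 0.7, 0.1]       # e.g. features extracted from a file
    output = layer(layer(inputs, hidden_w), output_w)
    print(output)                  # a value near 1 would mean 'possible virus'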
Due to the nature of neural networks, the 1990 system described above could cope with incomplete and erroneous data provided by the end user. Even when faced with a new mutation, the system still gave suitable counter-measures and information. See [Gui91] for a full discussion.

IDENTIFICATION OF VIRUS CODE PATTERNS VIA NEURAL NETWORKS

A neural network could be constructed to learn the actual machine code patterns of a specific virus. However, as most viruses are mutations of existing viruses, a network could instead be made to identify a virus family. This carries the advantage of being capable of identifying future variants. The result would be a set of sub-networks linked together to provide the end solution.

At the lowest level, this could be done at the bit level. Figure 4 illustrates this. Although recognition at this level would be very difficult (if not impossible) for a human, a neural network would be capable of it. The only limiting factors would be the volume and quality of the training data. The number of input neurons for a 1/2K virus code segment with a one-neuron output would be 4096. Given this, according to the 'geometric pyramid rule', the number of neurons in the hidden layer would be 64. The number of virus samples needed for effective recognition would be in the region of at least 525,000. This figure should then be trebled for the number of non-infected files; others would argue for far more, due to the problems associated with false positives.

At a higher level, the input data could be represented at the byte level, where each byte would correspond to a single input neuron. In this context, the number of hidden neurons would be reduced to 22, and the number of virus samples needed would be at least 23,000. Again, the same applies to the number of non-infected files. This figure could be reduced further by pre-processing the code segment to extract operand information, which could also increase accuracy and reduce training time. The British Technology Group, with the involvement of Oxford University, conducted research into such a solution. Although no formal documentation was produced, the results are believed to be negative.

From this, it can be seen that the use of neural networks in virus detection only seems practical at a high level. After all, a virus expert armed with a 'Virus Detection Language' and a 'Generic Decryption Engine' can provide a 100% accurate scanning result for advanced polymorphic viruses such as Pathogen in a relatively short period of time.

A NEURAL NETWORK POST-PROCESSOR

Rather than utilising a neural network to solve the virus problem alone, one could be used to process high-level information: for example, that generated by a heuristic scanner. Currently, most heuristic scanners use a form of emulation in order to determine the behaviour of a program file. Should that program appear to execute a suspicious activity, a 'flag' is set indicating this. However, some of these flags indicate more virus-like activity than others. In order to solve this problem, the flags are weighted via a score. Therefore, a flag indicating a 'suspicious memory reference' may be given more weight than a flag indicating an 'inconsistent EXE header'. The total weight of the set flags is computed, and if a set threshold value is met, the heuristic scanner issues a suitable warning. In the example of one well-known heuristic scanner, 35 such flags are used. The weights are arrived at experimentally. Initially, they are applied using a 'best-guess' approach, based on the virus expert's knowledge.
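In miniature, the flag-weighting scheme being described might look like the following sketch; the flag names, weights and threshold are invented stand-ins for the 35 real flags:

    # Sketch: a heuristic scanner's weighted flags, with 'best-guess'
    # weights that the expert (or, as proposed below, a neural network)
    # would tune against virus and clean-file collections.

    FLAG_WEIGHTS = {
        "suspicious_memory_reference": 8,   # strongly virus-like
        "inconsistent_exe_header":     2,   # weakly virus-like
        "odd_entry_point":             4,
        "self_modifying_code":         6,
    }

    THRESHOLD = 10

    def warn(set_flags):
        score = sum(FLAG_WEIGHTS[f] for f in set_flags)
        return score >= THRESHOLD

    print(warn({"inconsistent_exe_header"}))                             # False
    print(warn({"suspicious_memory_reference", "self_modifying_code"}))  # True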
The results of this are then tested on a virus collection and on a clean set of files. The results are analysed, and the weights adjusted accordingly. This cycle continues until satisfactory results are obtained. Figure 5 illustrates this cycle.

This process will probably increase in complexity over the next few years. In the above example, the number of flags could literally double, due to the increase in knowledge, new techniques employed by the virus writers, and the further development of heuristic scanners. It is inevitable that the adjust/re-adjust cycle will become far more complex and time-consuming. For example, why should flag-x be given a weight of 8, and not 7 or 9, and flag-y a weight of 1, and not 2?

Already, one can see that the illustrated cycle is very similar in nature to that used in neural network training. Indeed, a neural network could be used in place of the weighting mechanism and the bias imposed by the virus expert. Based on the results of other neural network applications, the results should be very accurate, because the neural network will 'learn' the 'optimum' weights. The human element is removed, and the entire learning process is automated. In terms of network size, the number of input neurons would be 35, with 6 hidden neurons and 1 output neuron. In theory, the minimum number of infected samples required for training would be at least 432. However, there would be no detrimental effect in training the network with more samples, in order to reflect current virus numbers.

CONCLUSIONS

Neural computing is no longer seen as a purely academic subject. Indeed, many companies are now looking towards the use of neural networks as serious tools. Many systems are currently in use, with very high success rates. It has been found that it may be feasible to use neural computing technology in the virus detection field. However, at a low level the results are unclear; there seems to be greater accuracy using deterministic techniques. Using a neural network as a pre-/post-processing tool could offer a powerful addition to the virus expert's toolbag. Just one example is the heuristic scanner; the authors believe other uses will also exist.

ACKNOWLEDGEMENTS

As Bernard of Chartres said, echoed by Sir Isaac Newton: 'If [we] have seen further, it is by standing on the shoulders of giants'. The assistance given by the following people is gratefully acknowledged: Jan Hruska, Frans Veldman, Alan Solomon, Martin Slade, Robert Mortimer and Michael Twist. The continuing support of the staff at Staffordshire University and at Visionsoft is also gratefully appreciated.

REFERENCES

[Con94] 'Adopting The Neural Approach', Control Magazine, Issue 5, March/April 1994.
[DTI95] UK Department of Trade and Industry, Neural Computing Technology Programme, 1995.
[Gui91] Dr Daniel Guinier, 'Computer "virus" identification by neural networks', SIGSAC, 1991.
[Mar93] Timothy Masters, 'Practical Neural Networks in C++', Academic Press, 1993. ISBN 0-12-479040-2.

Further reading:
R Beale and T Jackson, 'Neural Computing - an Introduction', IOP Publishing, 1990.
Vesselin Bontchev, 'Future Trends in Virus Writing', Proceedings of the Fourth International Virus Bulletin Conference, 1994.
Glenn Coates and David J. Leigh, 'Virus Detection Using a Generalised Virus Description Language', Proceedings of the Fourth International Virus Bulletin Conference, 1994.

==================================================================

VB95_212.TXT
------------

SCANNERS OF THE YEAR 2000: HEURISTICS

Dmitry O. Gryaznov
S&S International Plc, Alton House, Gatehouse Way, Aylesbury, Bucks, HP13 3XU, UK
Tel +44 1296 318700 Fax +44 1296 318777 Email grdo@sands.co.uk

INTRODUCTION

At the beginning of 1994, the number of known MS-DOS viruses was estimated at around 3,000. One year later, in January 1995, the number of viruses was estimated at about 6,000. By the time this paper was written (July 1995), the number of known viruses exceeded 7,000. Several anti-virus experts expect this number to reach 10,000 by the end of 1995. This large and fast-growing number of viruses is known as the glut, and it does cause problems for anti-virus software - especially for scanners.

Today, scanners are the most frequently used type of anti-virus software. The fast-growing number of viruses means that scanners must be updated frequently enough to cover new viruses. Also, as the number of viruses grows, so does the size of the scanner or its database, and in some implementations the scanning speed suffers.

It has always been very tempting to find a final solution to the problem: to create a generic scanner which can detect new viruses automatically, without the need to update its code and/or database. Unfortunately, as proven by Fred Cohen, the problem of distinguishing a virus from a non-virus program is algorithmically unsolvable in the general case. Nevertheless, some generic detection is still possible, based on analysing a program for features typical or not typical of viruses. Such a set of features, possibly together with a set of rules, is known as heuristics. Today, more and more anti-virus software developers are looking towards heuristic analysis as at least a partial solution to the problem. Working at the Virus Lab of S&S International Plc, the author is also carrying out a research project on heuristic analysis.

This article explains what heuristics are. Positive and negative heuristics are introduced, and some practical heuristics are presented. Different approaches to heuristic program analysis are discussed, and the problem of false alarms is explained. Several well-known scanners employing heuristics are compared (without naming the scanners) on both virus detection and false alarm rates.

1 WHY SCANNERS?

If you follow computer virus-related publications, such as the proceedings of anti-virus conferences, magazine reviews, and anti-virus software manufacturers' press releases, you read and hear mainly 'scanners, scanners, scanners'. The average user might even get the impression that there is no anti-virus software other than scanners. This is not true. There are other methods of fighting computer viruses - but they are not as popular or as well known as scanners, and anti-virus packages based on non-scanner technology do not sell well. Sometimes, people trying to promote non-scanner-based anti-virus software even come to the conclusion that there must be some kind of international plot of popular anti-virus scanner producers. Why is this? Let us briefly discuss the existing types of anti-virus software. Those interested in a more detailed discussion and comparison of the different types of anti-virus software can find it in [Bontchev1], for example.

1.1 SCANNERS

So, what is a scanner? Simply put, a scanner is a program which searches files and disk sectors for byte sequences specific to this or that known virus. Those byte sequences are often called virus signatures.
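A toy illustration of the idea, of the 'dumb' whole-file kind discussed below (the 'signatures' here are made-up byte strings, not real virus signatures):

    import os

    # Sketch: 'grunt' signature scanning - search whole files for known
    # byte sequences. The signatures below are invented placeholders.

    SIGNATURES = {
        "Demo.1024":  bytes.fromhex("B44EBA0001CD21"),
        "Sample.COM": bytes.fromhex("E800005E81EE0301"),
    }

    def scan_file(path):
        with open(path, "rb") as f:
            data = f.read()
        return [name for name, sig in SIGNATURES.items() if sig in data]

    for entry in os.scandir("."):
        if entry.is_file() and entry.name.upper().endswith((".COM", ".EXE")):
            for name in scan_file(entry.path):
                print(f"{entry.name}: infected with {name}")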
There are many different ways to implement scanning, from so-called 'dumb' or 'grunt' scanning of the whole file, to sophisticated virus-specific methods of deciding which particular part of the file should be compared to a virus signature. Nevertheless, one thing is common to all scanners: they detect only known viruses; that is, viruses which have been disassembled or analysed, and from which virus signatures unique to a specific virus have been selected. In most cases, a scanner cannot detect a brand-new virus until the virus is passed to the scanner's developer, who then extracts an appropriate virus signature and updates the scanner. This all takes time - and new viruses appear virtually every day. This means that scanners have to be updated frequently to provide adequate anti-virus protection. A version of a scanner which was very good six months ago might be no good today, if you have been hit by just one of the several thousand new viruses which have appeared since that version was released.

So, are there other ways to detect viruses? Are there anti-virus programs which do not depend so heavily on virus signatures, and thus might be able to detect even new viruses? The answer is yes, there are: integrity checkers and behaviour blockers (monitors). These types of anti-virus software are almost as old as scanners, and have been known to specialists for ages. Why then are they not used as widely as scanners?

1.2 BEHAVIOUR BLOCKERS

A behaviour blocker (or monitor) is a memory-resident (TSR) program which monitors system activity and looks for virus-like behaviour. In order to replicate, a virus needs to create a copy of itself, and most often viruses modify existing executable files to achieve this. So, in most cases, behaviour blockers try to intercept system requests which lead to modifying executable files. When such a suspicious request is intercepted, a behaviour blocker typically alerts the user and, based on the user's decision, can prohibit the request from being executed. This way, a behaviour blocker does not depend on detailed analysis of a particular virus: unlike a scanner, it does not need to know what a new virus looks like in order to catch it.

Unfortunately, it is not that easy to block all virus activity. Some viruses use very effective and sophisticated techniques, such as tunnelling, to bypass behaviour blockers. Even worse, some legitimate programs use virus-like methods which trigger a behaviour blocker. For example, an install or setup utility often modifies executable files. So, when a behaviour blocker is triggered by such a utility, it is up to the user to decide whether it is a virus or not - and this is often a tough choice: you would not assume that all users are anti-virus experts, would you?

But even an ideal behaviour blocker (there is no such thing in the real world, mind you!), which never triggers on a legitimate program and never misses a real virus, still has a major flaw. For a behaviour blocker to detect a virus, the virus must be run on the computer. Aside from the fact that virtually any user would reject the very idea of running a virus on his/her computer, by the time the behaviour blocker catches the virus attempting to modify executable files, the virus could already have triggered and destroyed some of your valuable data files.

1.3 INTEGRITY CHECKERS

An integrity checker is a program which should be run periodically (say, once a day) to detect all the changes made to your files and disks.
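The core of such a tool fits in a few lines. A bare-bones sketch, with arbitrary choices of hash, database format and file selection:

    import hashlib, json, os

    # Sketch: a minimal integrity checker. The first run records a checksum
    # database; later runs report files whose contents have changed.

    DB = "integrity.json"

    def checksum(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def snapshot(root="."):
        db = {}
        for d, _, files in os.walk(root):
            for name in files:
                if name == DB:
                    continue          # don't checksum our own database
                path = os.path.join(d, name)
                db[path] = checksum(path)
        return db

    if not os.path.exists(DB):
        with open(DB, "w") as f:
            json.dump(snapshot(), f)
        print("Checksum database created.")
    else:
        with open(DB) as f:
            old = json.load(f)
        for path, digest in snapshot().items():
            if path not in old:
                print("New file:", path)
            elif old[path] != digest:
                print("CHANGED (might be infected by an unknown virus):", path)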
This means that, when an integrity checker is first installed on your system, you need to run it to create a database of all the files on the system. During subsequent runs, the integrity checker compares the files on your system to the data stored in the database, and detects any changes made to them. Since all viruses modify either files or system areas of disks in order to replicate, a good integrity checker should be able to spot such changes and alert the user. Unlike with a behaviour blocker, it is much more difficult for a virus to bypass an integrity checker, provided you run your integrity checker in a virus-clean environment - e.g. having booted your PC from a known virus-free system diskette.

But again, as in the case of behaviour blockers, there are many situations in which the user's expertise is necessary to decide whether the changes detected are the result of virus activity. Again, if you run an install or setup utility, this normally results in modifications to your files which can trigger an integrity checker. That is, every time you install new software on your system, you have to tell your integrity checker to register the new files in its database.

Also, there is a special type of virus aimed specifically at integrity checkers: so-called slow infectors. A slow infector infects only objects which are about to be modified anyway, e.g. a new file being created by a compiler. An integrity checker will add this new file to its database, to watch for its further changes. But in the case of a slow infector, the file added to the database is infected already!

Even if integrity checkers were free of the above drawbacks, there would still be a major flaw: an integrity checker can alert you only after a virus has run and modified your files. As in the example given while discussing behaviour blockers, this might well be too late.

1.4 THAT'S WHY SCANNERS!

So, the main drawbacks of both behaviour blockers and integrity checkers, which prevent them from being widely used by the average user, are these:

1. Both behaviour blockers and integrity checkers, by their very nature, can detect a virus only after you have run an infected program on your computer and the virus has started its replication routine. By this time it might be too late - many viruses can trigger and switch to destructive mode before they make any attempt to replicate. It's somewhat like deciding to find out whether these beautiful yet unknown berries are poisonous by eating them and watching the results. Gosh! You would be lucky to get away with just dyspepsia!

2. Often enough, the burden of deciding whether it is a virus or not is transferred to the user. It's as if your doctor left you to decide whether your dyspepsia is simply because the berries were not ripe enough, or is the first sign of deadly poisoning, and you'll be dead in a few hours if you don't take an antidote immediately. Tough choice!

On the contrary, a scanner can and should be used to detect viruses before an infected program has a chance to be executed. That is, by scanning incoming software prior to installing it on your system, a scanner tells you whether it is safe to proceed with the installation. Continuing our berries analogy, it's like having a portable automated poisonous-plants detector, which quickly checks the berries against its database of known plants and tells you whether or not it's safe to eat them. But what if the berries are not in the database of your portable detector? What if it is a brand-new species?
What if a software package you are about to install is infected with a new, very dangerous virus unknown to your scanner? Relying on your scanner alone, you might find yourself in big trouble. This is where behaviour blockers and integrity checkers might be helpful: it's still better to detect the virus while it's trying to infect your system, or even after it has infected your system but before it destroys your valuable data. So, the best anti-virus strategy would include all three types of anti-virus software:

- a scanner, to ensure the new software is free of at least the known viruses before you run it
- a behaviour blocker, to catch the virus while it is trying to infect your system
- an integrity checker, to detect infected files after the virus has propagated to your system but not yet triggered.

As you can see, scanners are the first and most simply implemented line of anti-virus defence. Moreover, most people have a scanner as their only line of defence.

2 WHY HEURISTICS?

2.1 THE GLUT PROBLEM

As mentioned above, the main drawback of scanners is that they can detect only known computer viruses. Six or seven years ago, this was not a big deal. New viruses appeared rarely. Anti-virus researchers were literally hunting for new viruses, spending weeks and months tracking down rumours and random reports about a new virus in order to include its detection in their scanners. It was probably during these times that one of the nastiest computer virus-related myths was born: that anti-virus people develop the viruses themselves, to force users to buy their products and profit this way. Some people believe this myth even today. Whenever I hear it, I can't help laughing hysterically. Nowadays, with two to three hundred new viruses arriving monthly, it would be a total waste of time and money for anti-virus manufacturers to develop viruses. Why should they bother, if new viruses arrive in dozens virtually daily, completely free of charge?

There were about 3,000 known DOS viruses at the beginning of 1994. A year later, in January 1995, the number of viruses was estimated at at least 5,000. Another six months later, in July 1995, the number exceeded 7,000. Many anti-virus experts expect the number of known DOS viruses to reach the 10,000 mark by the end of 1995. With this tremendous and still fast-growing number of viruses to fight, traditional virus-signature scanning software is being pushed to its limits [Skulason, Bontchev2]. While several years ago a scanner was often developed, updated and supported by a single person, today a team of a dozen skilled employees is only barely sufficient. With the increasing number of viruses, R&D and quality control time and resource requirements grow. Even monthly scanner updates are often late - by one month at least! Many formerly successful anti-virus vendors are giving up and leaving the anti-virus battleground and market.

The fast-growing number of viruses heavily affects the scanners themselves. They become bigger, and sometimes slower. Just a few years ago, a 360Kb floppy was enough to hold half a dozen popular scanners, leaving plenty of room for the system files needed to make the diskette bootable. Today, an average good signature-based scanner alone would occupy at least a 720Kb floppy, leaving virtually no room for anything else.

So, are we losing the war? I would say: not yet - but if we stick with just virus-signature scanning, we will lose it sooner or later. Having realised this some time ago, anti-virus researchers started to look for more generic scanning techniques, known as heuristics.
2.2 WHAT ARE HEURISTICS?

In the anti-virus area, heuristics are a set of rules which are applied to a program to decide whether it is likely to contain a virus or not. From the very beginning of the history of computer viruses, different people have looked for an ultimate generic solution to the problem. Really, how does an anti-virus expert know that a program is a virus? It usually involves some kind of reverse engineering (most often disassembly), and reconstructing and understanding the virus's algorithm: what it does and how it does it. Having analysed hundreds and hundreds of computer viruses, it takes an experienced anti-virus researcher just a few seconds to recognise a virus, even if it is a new one, never seen before. It is an almost subconscious, automated process. Automated? Wait a minute! If it is an automated process, let's write a program to do it!

Unfortunately (or rather, fortunately), the analytic capabilities of the human brain are far beyond those of a computer. As was proven by Fred Cohen [Cohen], it is impossible to construct an algorithm (e.g. a program) to distinguish a virus from a non-virus with 100 per cent reliability. Fortunately, this does not rule out the possibility of 90 or even 99 per cent reliability. The remaining one per cent we hope to be able to solve using our traditional virus-signature scanning technique. Anyway, it's worth trying.

2.3 SIMPLE HEURISTICS

So, how do they do it? How does an anti-virus expert recognise a virus? Let us consider the simplest case: a parasitic, non-resident, appending COM file infector. Something like Vienna, but even more primitive. Such a virus appends its code to the end of an infected program, stores the first few (usually just three) bytes of the victim file in the virus body, and replaces those bytes with code to pass control to the virus. When the infected program is executed, the virus takes control. First, it restores the original victim's bytes in its memory image. It then starts looking for other COM files. When one is found, the file is opened in Read_and_Write mode; then the virus reads the first few bytes of the file and writes itself to the end of the file.

So, a primitive set of heuristic rules for a virus of this kind would be:

1. The program immediately passes control close to the end of itself.
2. It modifies some bytes at the beginning of its copy in memory.
3. Then it starts looking for executable files on a disk.
4. When found, a file is opened.
5. Some data is read from the file.
6. Some data is written to the end of the file.

Each of the above rules has a corresponding sequence in binary machine code or assembler language.
In general, if you look at such a virus under DEBUG, the favourite tool of anti-virus researchers, it is usually represented by code similar to this:

START:                        ; Start of the infected program
        JMP VIRUSCODE         ; Rule 1: control is passed
                              ; to the virus body
VIRUS:                        ; Virus body starts here
SAVED:                        ; Saved original bytes of the victim's code
MASK:   DB '*.COM',0          ; Search mask

VIRUSCODE:                    ; Start of the virus code
        MOV DI,OFFSET START   ; Rule 2: the virus restores
        MOV SI,OFFSET SAVED   ; the victim's code
        MOVSW                 ; in memory
        MOVSB
        MOV DX,OFFSET MASK    ; Rule 3: the virus
        MOV AH,4EH            ; looks for other
        INT 21H               ; programs to infect
        MOV AX,3D02H          ; Rule 4: the virus opens a file
        INT 21H
        MOV DX,OFFSET SAVED   ; Rule 5: the first bytes of the file
        MOV AH,3FH            ; are read into the virus
        INT 21H               ; body
        MOV DX,OFFSET VIRUS   ; Rule 6: the virus writes itself
        MOV AH,40H            ; to the file
        INT 21H

When an anti-virus expert sees such code, it is immediately obvious that this is a virus. So, our heuristic program should be able to disassemble binary machine code in a manner similar to DEBUG, and to analyse it and look for particular code patterns in a manner similar to an anti-virus expert. In the simplest cases, such as the one above, a set of simple wildcard signature-string matches would do for the analysis. In this case, the analysis is simply checking whether the program in question satisfies rules 1 through 6; in other words, whether the program contains pieces of code corresponding to each of the rules.

In the more general case, there are many different ways to represent one and the same algorithm in machine code. Polymorphic viruses, for example, do this all the time. So, a heuristic scanner must use many clever methods, rather than simple pattern-matching techniques. Those methods may involve statistical code analysis, partial code interpretation, and even CPU emulation, especially to decrypt self-encrypted viruses; but you would be surprised to know how many real-life viruses would be detected by the above six simple heuristics alone! Unfortunately, some non-virus programs would be 'detected' too.

2.4 THE FALSE ALARMS PROBLEM

Strictly speaking, heuristics do not detect viruses. Like behaviour blockers, heuristics look for virus-like behaviour. Moreover, unlike behaviour blockers, heuristics detect not the behaviour itself, but just the potential ability to perform this or that action. Indeed, the fact that a program contains a certain piece of code does not necessarily mean that this piece of code is ever executed. The problem of discovering whether this or that code in a program ever gets control is known in the theory of algorithms as the Halting Problem, and is in general unsolvable. This issue was the basis of Fred Cohen's proof of the impossibility of writing a perfect virus detector.

For example, some scanners contain pieces of virus code as the signatures for which they scan. Those pieces might correspond to each and every one of the above rules, but they are never executed - the scanner uses them just as its static data. Since, in general, there is no way for heuristics to decide whether these code pieces are ever executed or not, this can (and sometimes does) cause false alarms. A false alarm is when an anti-virus product reports a virus in a program which in fact does not contain any virus at all. The different types of false alarms, as well as the most widespread causes of false alarms, are described in [Solomon], for example. A false alarm might be even more costly than an actual virus infection.
We all keep saying to users: 'The main thing to remember when you think you've got a virus: do not panic!' Unfortunately, this does not work well. The average user will panic. And the user panics even more if the anti-virus software itself is unsure whether something is a virus or not. In the case where a scanner definitely detects a virus, it is usually able to detect all the infected programs and to remove the virus. At this point, the panic is usually over; but if it is a false alarm, the scanner will not be able to remove the virus, and will most likely report something like 'This file seems to have a virus', naming just a single file as infected. This is when the user really starts to panic. 'It must be a new virus!', the user thinks. 'What do I do?!' As a result, the user might well format his/her hard disk, causing himself a far worse disaster than a virus could. Formatting the hard disk is an unnecessary and unjustified act, by the way; even more so as there are many viruses which would survive it, unlike the legitimate software and data stored on the disk.

Another problem a false alarm can cause (and has caused) is a negative impact on a software manufacturing company. If anti-virus software falsely detects a virus in a new software package, users will stop buying the package, and the software developer will suffer not only profit losses but also a loss of reputation. Even if it is later made known that it was a false alarm, too many people will think 'there is no smoke without fire' and treat the software with suspicion. This affects the anti-virus vendor as well: there has already been a case where an anti-virus vendor was sued by a software company whose product the anti-virus software mistakenly reported as infected.

In a corporate environment, when a virus is reported by anti-virus software, whether it is a false alarm or not, the normal flow of operation is interrupted. It takes at best several hours to contact the anti-virus vendor's technical support and to make sure it was a false alarm before normal operation is resumed - and, as we all know, time is money. In the case of a big company, time is big money. So, it is not at all surprising that, when asked what level of false alarms is acceptable (10 per cent? 1 per cent? 0.1 per cent?), corporate customers answer: 'Zero per cent! We do not want any false alarms!'

As previously explained, by its very nature heuristic analysis is more prone to false alarms than traditional scanning methods. Indeed, not only viruses but many scanners too would satisfy the six rules we used as an example: a scanner does look for executable files, opens them, reads some data, and even writes something back when removing a virus from a file. Can anything be done to avoid triggering a false positive on a scanner? Let's again turn to the experience of a human anti-virus expert. How does one know that this is a scanner, and not a virus? Well, this is more complicated than the above example of a primitive virus, but there are some general rules here too. For example, if a program relies heavily on its parameters, or involves an extensive dialogue with the user, it is highly unlikely that the program is a virus. This leads us to the idea of negative heuristics; that is, a set of rules which are true for a non-virus program. Then, while analysing a program, our heuristics should estimate the probability of the program being a virus using both positive heuristics, such as the above six rules, and negative heuristics, typical of non-virus programs and rarely used by real viruses.
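One way to picture the interplay of positive and negative heuristics - including the 'normal' and 'paranoid' sensitivity modes discussed below - is the following sketch; the rule names, weights and thresholds are all invented for illustration:

    # Sketch: combine positive and negative heuristics into a verdict.
    # In 'paranoid' mode, negative heuristics are discarded and the
    # trigger threshold is lowered - more detections, more false alarms.

    POSITIVE = {                       # virus-like traits
        "jumps_near_end_of_itself": 2,
        "modifies_own_start_in_memory": 2,
        "searches_for_executables": 2,
        "opens_file_read_write": 1,
        "reads_file_start": 1,
        "writes_to_end_of_file": 3,
    }
    NEGATIVE = {                       # traits typical of legitimate programs
        "uses_command_line_parameters": 3,
        "extensive_user_dialogue": 3,
    }

    def analyse(traits, paranoid=False):
        score = sum(w for t, w in POSITIVE.items() if t in traits)
        if not paranoid:
            score -= sum(w for t, w in NEGATIVE.items() if t in traits)
        return score >= (2 if paranoid else 8)

    scanner_like = set(POSITIVE) | set(NEGATIVE)   # a scanner triggers them all
    print(analyse(scanner_like))                   # False: negatives outweigh
    print(analyse(scanner_like, paranoid=True))    # True: a false alarm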
If a program satisfies all our six positive rules, but also expects some command-line parameters and uses an extensive user dialogue as well, we would not call it a virus. So far so good. Looks like we have found a solution to the virus glut problem, right? Not really! Unfortunately, not all virus writers are stupid. Some are also well aware of heuristic analysis, and some of their viruses are written in a way which avoids the most obvious positive heuristics. On the other hand, these viruses include otherwise useless pieces of code, the only aim of which is to trigger the most obvious negative heuristics, so that such a virus does not draw the attention of a heuristic analyser.

2.4 VIRUS DETECTION VS. FALSE ALARMS TRADE-OFF

Sooner or later, each developer of a heuristic scanner reaches the point where it is necessary to make a decision: 'Do I detect more viruses, or do I cause fewer false alarms?' The best way to decide would be to ask users what they prefer. Unfortunately, the users' answer is: 'I want it all! 100 per cent detection rate and no false alarms!' As mentioned above, this cannot be achieved, so the virus detection versus false alarms trade-off must be decided by the developer.

It is very tempting to build the heuristic analyser to detect almost all viruses, despite the false alarms. After all, the reviewers and evaluators who publish their test results in magazines read by thousands of users world-wide are testing just the detection rate. It is much more difficult to run a good false alarms test: there are gigabytes and gigabytes of non-virus software in the world, far more than there are viruses, and it is much harder to get hold of all this software and to keep it for your tests. 'Not enough disk space' is only one of the problems. So, let's forget false alarms and negative heuristics, and call a virus each and every program which happens to satisfy just some of our positive heuristics. This way we shall score top marks in the reviews. But what about the users? They normally run scanners not on a virus collection, but on clean disks. Thus, they won't notice our almost perfect detection rate, but they are very likely to notice our not-that-perfect false alarms rate. Tough choice.

That's why some developers provide at least two modes of operation for their heuristic scanners. The default is the so-called 'normal' or 'low sensitivity' mode, in which both positive and negative heuristics are used, and a program needs to trigger enough positive heuristics to be reported as a virus. In this mode, a scanner is less prone to false alarms, but its detection rate might be far below what is claimed in its documentation or advertisements. The often-used (in advertising) figure of 'more than 90 per cent' virus detection by the heuristic analyser refers to the second mode of operation, which is often called 'high sensitivity' or 'paranoid' mode. It really is a paranoid mode: negative heuristics are usually discarded, and the scanner reports as a possible virus any program which happens to trigger just one or two positive heuristics. In this mode, a scanner can indeed detect 90 per cent of viruses, but it also produces hundreds and hundreds of false alarms, making the 'paranoid' mode useless and even harmful for real-life everyday use - but still very helpful when it comes to a comparative virus detection test.
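Expressed as code, the difference between the two modes might look like the following sketch; the thresholds, and the policy of discarding negative heuristics in paranoid mode, are assumptions made purely for illustration.

    /* Minimal sketch of 'normal' vs 'paranoid' sensitivity modes.
       The thresholds are invented for illustration. */
    #include <stdio.h>

    static int report_as_virus(int positive, int negative, int paranoid)
    {
        if (paranoid)
            return positive >= 1;           /* negatives discarded */
        return positive - negative >= 4;    /* negatives count, higher bar */
    }

    int main(void)
    {
        /* A program triggering 2 positive and 3 negative points... */
        printf("normal:   %s\n", report_as_virus(2, 3, 0) ? "virus?" : "clean");
        printf("paranoid: %s\n", report_as_virus(2, 3, 1) ? "virus?" : "clean");
        return 0;
    }

The same file is 'clean' in one mode and 'suspicious' in the other - which is exactly the effect described above.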
Some scanners have a special command-line option to switch the paranoid mode on; some others switch to it automatically whenever they detect a virus in the normal, low sensitivity mode. Although the latter approach seems to be a smart one, it takes just a single false alarm out of the many thousands of programs on a network file server to produce an avalanche of false virus reports.

2.5 HOW IT ALL WORKS IN PRACTICE: DIFFERENT SCANNERS COMPARED

Being myself an anti-virus researcher and working for a leading anti-virus manufacturer, I have developed a heuristic analyser of my own. And of course, I could not resist comparing it to other existing heuristic scanners. We believe the results will be interesting to other people, as they underscore what was said above about both virus detection and false alarm rates.

As the products tested are our competitors, we decided not to publish their names in the test results. So, only FindVirus of Dr Solomon's AntiVirus Toolkit is called by its real name. All the other scanners are referred to by letters: Scanner_A, Scanner_B, Scanner_C and Scanner_D. The latest versions of the scanners available at the time of the test were used. For FindVirus, it was version 7.50 - the first version to employ a heuristic analyser. Each scanner tested was run in heuristics-only mode, with normal virus signature scanning disabled. This was achieved either by using a special command-line option, where available, or by using a special empty virus signature database in the other cases.

The test consisted of two parts: virus detection rate and false alarm rate. For the virus detection rate, the S&S International Plc ONE OF EACH virus collection was used, containing more than 7,000 samples of about 6,500 different known DOS viruses. For the false alarm test, the shareware and freeware software collection of the SIMTEL20 CD-ROM (fully unpacked), all utilities from different versions of MS-DOS, IBM DOS, PC-DOS and other known files were used (the current basic S&S false alarm test set). When measuring false alarm and virus detection rates, all files reported were counted, whether reported as 'Infected' or as 'Suspicious'. Separate figures for the two categories are given where applicable. In both parts of the test, the products were run in two heuristic sensitivity modes, where applicable: normal (low sensitivity) mode and paranoid (high sensitivity) mode. Automatic heuristic sensitivity adjustment was prohibited, where applicable. The results of the tests are as follows:

Virus Detection Test

            Files     Files triggered (infected+suspicious)
            scanned   Normal                  Paranoid
FindVirus   7375      5902 (N/A)      80.02%  N/A
Scanner_D   7375      5743 (0+5743)   77.87%  6182 (0+6182)    83.54%
Scanner_C   7375      5692 (0+5692)   77.18%  N/A
Scanner_A   7375      4250 (N/A)      57.63%  6491 (N/A)       87.74%
Scanner_B   7392(*)   3863 (2995+868) 52.38%  6124 (2992+3112) 82.68%

(*) Scanner_B was tested a couple of days later, when 17 more infected files had been added to the collection.

False Alarms Test

            Files     Files triggered (infected+suspicious)
            scanned   Normal                  Paranoid
FindVirus   13603     0 (N/A)         0.000%  N/A
Scanner_A   13428     11 (N/A)        0.082%  371 (N/A)        2.746%
Scanner_B   13471     17 (0+17)       0.126%  382 (0+382)      2.836%
Scanner_D   13840     24 (0+24)       0.173%  254 (0+254)      1.824%
Scanner_C   13603     28 (0+28)       0.206%  N/A

3 WHY 'OF THE YEAR 2000'?

Well, first of all simply because I could not resist the temptation of splitting the name of the paper into three questions and using them as the titles of the main sections of this presentation. I thought it was funny. Maybe I have a weird sense of humour. Who knows...
On the other hand, the year 2000 is very attractive by itself. Most people consider it a distinctive milestone in all aspects of human civilisation. This usually happens with years ending in double zero; still more with the end of a millennium, with its triple zero at the end. The anti-virus arena is no exception. For example, during the EICAR'94 conference there were two panel sessions discussing 'Viruses of the year 2000' and 'Scanners of the year 2000' respectively. The general conclusion reached by a panel of well-known anti-virus researchers was that, at the current pace of new virus creation, by the year 2000 we might well face dozens (if not hundreds) of thousands of known DOS viruses.

As I tried to explain in the second section of this paper (and as other authors have explained elsewhere [Skulason, Bontchev2]), this might be far too much for the current standard scanning technique, based on known-virus signatures. More generic anti-virus tools, such as behaviour blockers and integrity checkers, while being less vulnerable to the growing number of viruses and the rate at which new viruses appear, can detect a virus only when it is already running on a computer, or even only after the virus has run and infected other programs. In many cases, the risk of allowing a virus to run on your computer is just not affordable. Using a heuristic scanner, on the other hand, allows detection of most new viruses in the regular, scanner-safe manner: before an infected program is copied to your system and executed. And, very much like behaviour blockers and integrity checkers, a heuristic scanner is much more generic than a signature scanner, requires much rarer updates, and provides an instant response to a new virus. The 15-20 per cent of viruses which a heuristic scanner cannot detect could be dealt with using the current, well-developed signature scanning techniques. This would effectively decrease the virus glut problem at least fivefold.

Yet another reason for choosing the year 2000 and not, say, 2005 is that I have strong doubts whether the current computer virus situation will survive the year 2000 by more than a couple of years. With new operating systems and environments appearing (Windows NT, Windows'95, etc.), I believe DOS is doomed. So are DOS viruses. So is the modern anti-virus industry. This does not mean viruses are not possible for the new operating systems and platforms - they are possible in virtually any operating environment. We are aware of viruses for Windows, OS/2, Apple DOS and even UNIX. But creating viruses for these operating systems, as well as for Windows NT and Windows'95, requires much more skill, knowledge, effort and time than for the virus-friendly DOS. Moreover, it will be much more difficult for a virus to replicate under these operating systems. They are far more secure than DOS, if it is possible to talk about DOS security at all. Thus, there will be far fewer virus writers, and they will be capable of writing far fewer viruses. The viruses will not propagate fast and far enough to represent a major problem. Consequently, there will be no virus glut problem. Regrettably, there will also be a much smaller anti-virus market, and most of today's anti-virus experts will have to find another occupation...

But until then, DOS lives, and anti-virus developers still have a lot of work to do!

REFERENCES

[Bontchev1] Vesselin Bontchev, 'Possible Virus Attacks Against Integrity Programs And How To Prevent Them', Proc. 2nd Int. Virus Bulletin Conf., September 1992, pp. 131-141.
[Skulason] Fridrik Skulason, 'The Virus Glut. The Impact Of The Virus Flood', Proc. 4th EICAR Conf., November 1994, pp. 143-147.

[Bontchev2] Vesselin Bontchev, 'Future Trends In Virus Writing', Proc. 4th Int. Virus Bulletin Conf., September 1994, pp. 65-81.

[Cohen] Fred Cohen, 'Computer Viruses - Theory and Experiments', Computer Security: A Global Challenge, Elsevier Science Publishers B.V. (North Holland), 1984, pp. 143-158.

[Solomon] Alan Solomon, 'False Alarms', Virus News International, February 1993, pp. 50-52.

==================================================================

VB95_LT1.TXT
ÄÄÄÄÄÄÄÄÄÄÄÄ

"Hey, Frisk. Be easy on me, please!"
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
 - an IR-member

THE EVOLUTION OF POLYMORPHIC VIRUSES

Fridrik Skulason
Frisk Software International, PO BOX 7180, 127 Reykjavik, Iceland
Tel +354 5 617273 Fax +354 5 617274 Email frisk@complex.is

The most interesting recent development in the area of polymorphic viruses is how limited their development actually is. This does not mean that there are no new polymorphic viruses - far from it, new ones are appearing constantly - but there is nothing 'new' about them: they are just variations on old and well-known themes. However, looking at the evolution of polymorphic viruses alone shows only half of the picture; it is necessary to consider the development of polymorphic virus detection as well. More complex polymorphic viruses have driven the development of more advanced detection methods, which in turn have resulted in the development of new polymorphic techniques.

Before looking at those developments that can be seen, it is perhaps proper to consider some basic issues regarding polymorphic viruses, starting with the question of why they are written. That question is easy to answer: they are written primarily for the purpose of defeating one particular class of anti-virus product - the scanners. Considering that virus scanners are the most popular type of anti-virus program, it is not surprising that they are the subject of attacks.

At this point it is worth noting that polymorphic viruses pose no special problems to a different class of anti-virus product, namely integrity checkers. This does not mean that integrity checkers should be considered superior to scanners - after all, there is another class of viruses, the 'slow' viruses, which are easily detected by scanners, but which are a real problem for integrity checkers. Fortunately, polymorphic slow viruses are not common at the moment. As a side note, 'slow polymorphic' viruses also exist, and should not be confused with 'polymorphic slow' viruses. This category will be described at the end of this paper, together with some other 'nasty' tricks.

Considering how virus scanners work, a virus author can in principle attack them in two different ways: either by infecting an object the scanner does not scan, or by making the detection of the virus so difficult that the scanner - or rather the producers of the scanner - may not be able to cope with it. Polymorphic viruses attempt to make detection difficult: either too time-consuming to be feasible, or beyond the technical capabilities of the anti-virus authors. The success of virus authors depends not only on their programming skills, but also on the detection techniques used. Before describing the current techniques, however, a brief classification of polymorphic viruses is in order. Polymorphic viruses are currently divided into three groups:

1) Encrypted, with variable decryptors. This is the largest and currently the most important group.
Several methods of implementing the variability are discussed below, but most of them should be familiar to readers of this paper.

2) 'Block-swapping' viruses. Only a handful of viruses currently belong to this group, but they demonstrate that a polymorphic virus does not have to be encrypted. These viruses are composed of multiple blocks of code, theoretically as small as two instructions each, that can be swapped around in any order, making the use of normal search strings nearly impossible.

3) Self-modifying viruses using instruction replacement techniques. This is where the virus may modify itself by replacing one or more of its instructions with one or more functionally equivalent instructions when it replicates. So far this category is only a theoretical possibility, as no viruses have yet been written that use this technique. It is possible that some such viruses will appear in the future, perhaps only written to demonstrate that it can indeed be done.

Considering that the viruses which currently fall into the second group are easy to detect using ordinary search strings, and that the third group is non-existent, the only polymorphic viruses currently of interest are the encrypted ones. For that reason the term 'polymorphic viruses' should, in the rest of this paper, really be understood to mean only viruses of the first group; that is, encrypted viruses with variable decryptors.

So, how are those viruses detected? Basically, the detection methods fall into two classes: those that detect and identify only the decryptor, and those that look 'below' the decryptor, detecting the actual virus. This is not a strict 'either-or' classification - a scanner may analyse the decryption loop to determine that it might have been generated by a particular virus, before spending time decrypting the code.

DECRYPTION-LOOP DETECTORS

There are several different methods that have been used to detect and identify decryption loops - this used to be the standard way of detecting polymorphic viruses - but there are several significant problems with these methods. The most common methods are described later, but if they are only used as the first step, and the virus is then properly decrypted, some of the following problems disappear:

- Virus-specific. Basically, the detection of one polymorphic virus does not make it any easier to detect another.

- More likely to cause false positives. As we get more and more polymorphic viruses, capable of producing an ever-increasing variety of decryptors, the chances of generating a false positive increase, as some innocent code may happen to look just like a possible decryptor.

- Identification is difficult. Many polymorphic viruses will generate similar decryptors, and it is entirely possible that a scanner will mis-identify a decryptor generated by one polymorphic virus as having been produced by another, unrelated virus. Also, in the case of variants of the same polymorphic virus, it may be possible to determine the family, but not the variant.

- No disinfection. Virus disinfection requires the retrieval of a few critical bytes from the original host file, which are usually stored within the encrypted part of a polymorphic virus. This means that virus-specific disinfection is generally not possible, as it would require decrypting the virus.

On the positive side, detection of a particular decryptor may be quite easy to add, although that depends on the design of the scanner and the complexity of the virus.
These decryptor-detection techniques are old, and several anti-virus producers have abandoned them in favour of more advanced methods. The most common detection methods in this group are:

- Search strings containing simple wildcards
- Search strings containing variable-length wildcards
- Multiple search strings
- Instruction usage recognition
- Statistical analysis
- Various algorithmic detection methods

SEARCH STRINGS CONTAINING SIMPLE WILDCARDS

The limitations of this method are obvious, as it can only handle a few 'not very polymorphic' viruses, which are sometimes called 'oligomorphic'. They may, for example, make use of a simple decryption loop with a single variable instruction. The least variable polymorphic viruses use only two different instructions there, such as NEG and NOT, which differ by only one bit. Defeating this detection method is easy: just insert a random number of 'junk' instructions at variable places in the code. 'Junk' does not have to mean 'invalid', but rather any instruction that can be inserted into the decryption loop without having an effect. Typical examples include NOP, JMP $+2, MOV AX,AX and other similar 'do nothing' instructions.

SEARCH STRINGS CONTAINING VARIABLE-LENGTH WILDCARDS

This method takes care of decryptors that contain those junk instructions. However, there are two problems with this approach. Some scanners cannot use this method, as their design does not allow variable-length wildcards; but that really does not matter, as the technique is very easy to defeat: just make the decryptor slightly more variable, so that no single search string, even one using a variable-length wildcard, will match all instances of the decryptor. This can be done in several ways:

- Changing register usage: for example, the DI register might be used for indexing instead of SI, or the decryption key might be stored in BX instead of AX.

- Changing the order of instructions: if the order of instructions does not matter, they can be freely swapped around.

- Changing the encryption method: instead of using XOR, the virus author could just as well use ADD or SUB.

MULTIPLE SEARCH STRINGS

This is generally considered an obsolete technique, but many anti-virus producers used it back in 1990 when the Whale virus appeared. This virus could be reliably detected with a fairly large set of simple search strings. Today, however, most of them would probably use a different method. This detection method can easily be defeated by increasing the variability of the decryptor past the point where the number of search strings required becomes unreasonably large.

There are other cases where the multiple search string technique has been used. One anti-virus company had access to the actual samples of a particular polymorphic virus that were to be used in a comparative product review. Rather than admitting that they were not able to detect the virus, they seem to have added a few search strings to detect those particular samples - and they did indeed score 100% in that test, although later examination revealed that they detected only 5% of samples of the virus in question.

INSTRUCTION USAGE RECOGNITION

This method was developed to deal with Dark Avenger's Mutation Engine. It basically involved assuming initially that all files are infected, then tracing through the decryptor, one instruction at a time. If an instruction is found that could not have been generated by a particular virus as part of its decryptor, then the file is not infected by that virus.
If one reaches the end of the decryptor, still assuming that the file is infected, it is reported as such. There are two major ways to attack this technique. The more obvious is to increase the number of possible instructions used in the decryptor: if a virus used every possible instruction in its decryptor, it simply could not be detected with this method without modifying the method itself. The second way is more subtle: it involves making it more difficult to determine when the end of the decryption loop has been reached.

STATISTICAL ANALYSIS

This method is generally not used, due to the unacceptably large risk of false positives. It basically involves statistical analysis of the frequency of certain instructions in the decryptor. It works best with viruses that generate large decryptors which use few and uncommon 'do-nothing' instructions.

Other algorithmic detection methods are possible, and are frequently used. Sometimes they are only used to quickly eliminate the possibility of a particular file being infected with a particular virus, for example:

IF   the file is an EXE-structure file
AND  the initial CS:IP value equals 0000:0000
THEN the file is not infected by Virus-X

In other cases the algorithm provides detection, instead of negative detection:

IF   the file is a COM-structure file
AND  it is at least 5623 bytes long
AND  it starts with a JMP FAR to a location at least 1623 bytes from the end of the file
AND  the first 10 instructions contain at least 5 instructions from the following set {AAD,NOP,CLI,CLD,STC}
AND  within the first 100 bytes from the entry point there is an XOR [SI/DI/BX],AX instruction
AND  within the first 200 bytes from the entry point there is a branch instruction that transfers control back to the XOR instruction described above
THEN the file is infected with Virus-Y

It should be obvious from this example that the rules can get complex - perhaps unreasonably complex - and obviously require significant work to implement. Also, in some instances it is just not possible to put together a sufficient number of rules like this to ensure accurate detection (not even by considering the rules the virus itself may use to determine whether a file has already been infected), as the number of false positives would be too high. At this point it is very important to bear in mind that, while false positives are a very serious problem for the anti-virus author, they do not matter at all to the virus author. A false positive just means that the virus will not infect one particular file it might otherwise have infected... so what - after all, it has plenty of other files to infect.

Having looked at the detectors that only detect the decryption loop, we must look at the more advanced detectors, which detect the actual virus instead of just the decryption loop. Compared to the decryptor-detecting methods, the following differences are obvious:

- More generic. These methods require significantly more initial work, but the extra effort required to add detection of a new polymorphic virus is far less than with some of the other methods described above.

- Less chance of false positives. Having decrypted the virus, it should be possible to reduce the chance of false positives almost to zero, as the entire virus body should be available.

- Identification is easy. When the virus has been decrypted, identification is no more difficult than in the case of non-encrypted viruses.

- Easy disinfection. The same applies to disinfection: it should not be any more difficult than if the virus had not been encrypted to begin with.
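As a concrete point of reference before moving on to those more advanced detectors, here is a minimal sketch in C of the simplest loop-detection method described above - a search string with single-byte wildcards. The pattern and the scanned bytes are invented for illustration; real scanners use far more elaborate string formats.

    /* Minimal sketch: matching a search string that contains
       single-byte wildcards (-1 stands for a '??' position). */
    #include <stdio.h>
    #include <stddef.h>

    static int match_at(const unsigned char *buf, const int *pat, size_t patlen)
    {
        size_t j;
        for (j = 0; j < patlen; j++)
            if (pat[j] != -1 && buf[j] != (unsigned char)pat[j])
                return 0;
        return 1;
    }

    static int scan(const unsigned char *buf, size_t len,
                    const int *pat, size_t patlen)
    {
        size_t i;
        for (i = 0; i + patlen <= len; i++)
            if (match_at(buf + i, pat, patlen))
                return 1;
        return 0;
    }

    int main(void)
    {
        /* Invented 'decryptor' bytes: ... LODSB / XOR AL,A5 / STOSB */
        unsigned char code[] = { 0xBE, 0x34, 0x12, 0x90,
                                 0xAC, 0x34, 0xA5, 0xAA };
        /* Pattern: LODSB / XOR AL,?? / STOSB - the key byte wildcarded */
        int pat[] = { 0xAC, 0x34, -1, 0xAA };

        printf(scan(code, sizeof(code), pat, 4)
               ? "decryptor fragment found\n" : "no match\n");
        return 0;
    }

A single junk instruction inserted between LODSB and XOR would already defeat this pattern - which is exactly why the variable-length wildcards and the more elaborate methods described above were needed.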
Two such techniques have been used to detect polymorphic viruses:

- 'X-ray'
- Generic decryption

The X-raying technique was probably only used in two products, both of which have mostly abandoned it by now. It basically involved assuming that a particular block of code had been encrypted with an unknown algorithm, and then deriving the encryption method and the encryption keys from a comparison between the original and the encrypted code. As this sounds somewhat complicated, an example is in order. Assume that manual decryption of one virus sample reveals that a particular block of code should contain the following byte sequence:

B8 63 25 B9 88 01 CD 21

The corresponding encrypted block of code in a different sample looks like this:

18 C4 8B 0C 34 C2 07 F0

Is there any way this sequence could have been obtained from the first one by applying one or two primitive, reversible operations, for example:

- XOR with a constant
- ADD/SUB with a constant
- ROL/ROR by a fixed number of bits

Yes, because XORing the two sequences together generates the sequence:

A0 A7 AE B5 BC C3 CA D1

Calculating the differences between the bytes in that sequence gives the following result:

07 07 07 07 07 07 07 07

which shows that the original sequence, and (presumably) the entire virus body, can be obtained by XORing each byte with a key, and then adding the constant value of 7 to that key before applying it to the next byte. Using this method, it may be possible to deduce the operation of the decryptor without looking at it at all. There is a variant of the X-ray method, developed by Eugene Kaspersky, which works in a different way but produces the same result. The reason 'X-raying' has mostly been abandoned is that it can easily be defeated, for example by using an operation the X-ray procedure may not be able to handle, by using three or more operations on each decrypted byte, or by using multiple layers of encryption.

The last method to be developed - generic decryption - does not suffer from that limitation, and can handle decryptors of almost any complexity. It basically involves using the decryptor of the virus to decrypt the virus body, either by emulating it, or by single-stepping through it in a controlled way, so that the virus does not gain control of the execution. Unfortunately, there are several problems with this method:

- Which processor should be emulated? It is perfectly possible to write a virus that only works properly on one particular processor, such as a Cyrix 486SLC; the decryptor will just generate garbage if executed on any other processor. An intelligent emulator may be able to deal with this, but the 'single-stepping' method cannot.

- Single-stepping is dangerous. What if the virus author is able to exploit some obscure loophole which allows the virus to gain control? In that case, just scanning an infected file would result in the virus activating, spreading and possibly causing damage, which is totally unacceptable. It should be noted that a very similar situation has actually happened once - however, the details will not be discussed here.

- Emulation is slow. If the user has to wait a long time while the scanner emulates harmless programs, the scanner will probably be discarded, and obviously a scanner that is not used will not find any viruses.

- If the virus decryptor goes into an infinite loop and hangs when run, the generic decryptor might do so too. This should not happen, but one product has (or used to have) this problem.
- How does the generic decryptor determine when to stop decrypting code, so as not to waste an unacceptable amount of time attempting to 'decrypt' normal, innocent programs?

- What if the decryptor includes code intended to determine whether it is being emulated or run normally, such as a polymorphic timing loop, and only decrypts itself if it is able to determine that it is running normally?

- What if the decryptor is damaged, so that the virus does not execute normally? A scanner that only attempted to detect the decryptor might be able to find it, but a more advanced scanner that attempts to exploit the decryptor will not find anything. This is, for example, the case with one of the SMEG viruses: it will occasionally generate corrupted samples. They will not spread further, but should a scanner be expected to find them or not?

Finally, it should be noted that there are other ways to make polymorphic viruses difficult to handle than just attacking the various detection techniques described above. 'Slow polymorphic' viruses are one such method. They are polymorphic, but all samples generated on the same machine will seem to have the same decryptor. This may mislead an anti-virus producer into attempting to detect the virus with a single search string, as if it were just a simple encrypted, but not polymorphic, virus. However, virus samples generated on a different machine, or on a different day of the week, or even under a different phase of the moon, will have different decryptors, revealing that the virus is indeed polymorphic.

Another recent phenomenon has been the development of more 'normal-looking' polymorphic code. Placing a large number of 'do-nothing' instructions in the decryptor may be the easiest way to make the code look random, but it also makes it look really suspicious to an 'intelligent' scanner, and worthy of detailed study. If the code looks 'normal', for example by using harmless-looking 'get DOS version number' function calls, it becomes more difficult to find.

So, where does this leave us? Currently, anti-virus producers are able to keep up with the virus developers, but unfortunately the best methods available have certain problems - the one most obvious to users is that scanners are becoming slower. There is no indication that this will get any better, but on the other hand there are no signs that virus authors will be able to come up with new polymorphic techniques which would require the development of a new generation of detectors.

==================================================================

VB95_LT3.TXT
ÄÄÄÄÄÄÄÄÄÄÄÄ

CATCHING THE VIRUS WRITERS

Jim Bates - Computer Forensics Limited - U.K.

JB> = Jim Bates
XX> = Someone ;).

INTRODUCTION

JB> During my work in analysing virus code I have been privileged to be asked to help the Police on a number of occasions, and some of my experiences and observations will be described here.

XX> I shouldn't really see it as a privilege to work for the police to bust virus writers, or to supply them with information they're not worthy of having. If some fucking computer-crime-investigation-agency by any chance gave me such a request because I analyzed virus code and knew a lot about the virus community, I would've turned them down. Of course, that would probably be impossible looking at my situation today (they don't know who I am), but even if I'd been busted, and even if they'd been blackmailing me with all kinds of things, I still wouldn't cooperate with them, and they would get no information whatsoever from me. It's immoral to turn friends in, no matter what.
Yeah, of course Jim wouldn't see viruswriters as friends, but even if he sees us as enemies, it's still very low.

JB> In this presentation I shall try to highlight the problems that computer viruses have caused and how the authorities in the U.K. are dealing with them. I will describe some virus writers and the environment that they work in to produce their programs. I will introduce some of the reasons they have given for writing viruses and, in some cases, why they feel aggrieved at being "persecuted" by the authorities. Without going into too much detail that might help the really malicious virus writers, I will present details of some of the cases I have been involved in and how the Police tackled the problem of locating and identifying those responsible for particular virus incidents.

XX> I don't think Jim's little hints on being *really* malicious would help us very much anyways. I mean, creativity is everything :) when it comes to being a pain in the ass. If you use your creativity in a (generally speaking) negative way, that's a good start :).

JB> WHY DO WE NEED TO CATCH VIRUS WRITERS?

XX> It's simple, we don't! Maybe catching one holds some perverted appeal for Jim, else there's no reason. Why would they waste police resources to catch a harmless individual who writes computer programs? I mean, even a damn dangerous writer won't cause society that much harm. Go bust some rapists, murderers or some other kind of *real* criminals!

JB> There are several reasons why we should want to catch these people - the main reason is quite simply to bring them to book for the loss and disruption that they cause. Another reason is that viruses are a non-productive threat which diverts genuine creative effort from helping to fuel the progress of computing in general. If we accept this, then it becomes obvious that we should use any means at our disposal to stop people from writing them and distributing them. It should also be remembered that at least some of the virus writers display an obvious talent for programming, and it is a sad loss to the industry that such skill is wasted.

XX> Virus writing is for most writers more a question of attitude than programming. Those people [us] want to write viruses rather than useful software. If we look at it that way, there is no skill wasted.. And besides, technically minded people are making, for example, bombs - why wouldn't you rather stop them? It's a waste that highly skilled persons (Stop all Nuclear testing!) -rb are making, for example, nuclear bombs.

JB> There is still a public perception that viruses are just a nuisance and only cause minor annoyance to large companies - "... who can easily afford it." This is just not true - there are documented instances where ordinary people have suffered serious loss and even life-threatening situations as a direct result of virus activity.

XX> Well, viruses aren't discriminating, they hit everyone they can. If this means a large company with a lot of cash or a poor XT owner, well, that doesn't really matter to the virus (and not to virus programmers either..). Oh well, some viruswriters do discriminate, by the way! Manzon only hits 80386 computers and above!

JB> So, another vital reason for catching them is to deter others.

XX> If you catch one Swedish viruswriter, I sure as hell will write a lot more destructive code and blame the one catching him for giving me motivation. Rebellion? Perhaps. Right? Of course. Today you can see a few viruses dedicated to busted viruswriters.
Predator#2 for example was dedicated to the ARCV, hehe, just too bad some IR-members fell victim to it :-). (* Priest rox0rs! *)

JB> VIRUSES BREAK THE LAW

In the U.K. it is illegal to access or modify the contents of a computer without proper authority. It can therefore be argued that a computer virus, since it does not ask for permission to replicate, breaks the law simply by spreading.

XX> Well, press lawsuits against the viruses then :-). However, it's not forbidden to create viruses in most countries, and therefore he can't say that viruses break the law.. And besides, Stormbringer's GoodVirus did ask for permission to replicate! How to forbid that virus? Asshole!

JB> If a virus is executed without the knowledge or permission of the computer owner, and the author of that virus can be identified, he or she can be charged under Section 3 of the Computer Misuse Act (1990). This offence carries a maximum sentence of five years in prison, and it is one of the few laws where the offence can be committed outside the U.K.

XX> Five years? That's fucking overdoing it! To get five years in Sweden, well, even if you kill five persons it's not certain you'll get more than five years in prison! Five years in prison!?! Think about it.. Five years for doing *nothing* wrong! The law is *really* screwed up! BIGTIME!

JB> Thus it is quite possible for a virus writer to break the law in the U.K. without ever having set foot in the country. Extradition treaties will no doubt be updated by the time computers become obsolete.

XX> Let's assume that Manzon infected some people in the U.K. (which it did) - then who is to be held responsible for that? Red-A for writing it? An IR-member for deliberately infecting that Petra-remover? Btk for distributing it to a few Swedish boards? Or none of us? Well, I don't really care, because if they tried to press lawsuits against me (or anyone in IR for that matter), they would hardly succeed in doing that. We couldn't know that the U.K. would become infected!

JB> VIRUSES CAUSE LOSS, DISRUPTION AND DAMAGE

By far the largest loss and disruption suffered by the victims of a virus outbreak is that arising from downtime while installations are checked and disinfected. A recent outbreak of the Pathogen virus in a college in England affected 4 file servers and 90 workstations. The college had to be closed for four days while the systems were checked and cleared. The commercial loss of such a shutdown can be imagined. Official complaints to the Police concerning outbreaks of the Pathogen and Queeg viruses (produced at the trial of the virus author) listed estimated losses approaching 500,000 pounds sterling.

XX> I guess that's pretty much the same story as when a new variant of Digital Death, which TU fixed, closed the computer classes (and more) at a school for over one week. If that cost 500,000 pounds.. Hahahha! Blame their weak security, not TU!

JB> Two other incidents will serve to show just how serious virus outbreaks can be:

1) An old established small family bakery in south-east England was hit by the Casino virus in January. This resulted in the partial loss of both their stock control and their accounts data. Two months later they were hit again, this time by the Michelangelo virus, with the total loss of all data. Backup and protection procedures implemented by a so-called expert after the first incident proved ineffective, and very little data was recoverable. The company, employing nine people, went into receivership, bankrupting the owner.
He lost his business and his home. When I spoke to him, his wife was distraught - not knowing how she was going to care for her two little girls.

XX> A very sad story, indeed. I guess they hate computer viruses now, but I do however think they should hate the 'expert' rather than the virus writer. The wanna-be expert is to be held responsible for this.

JB> 2) A local medical practice in the English Midlands maintained a computer system containing patient records. The system operated by printing a copy of the patient's record in the surgery when the patient arrived in the reception area. While one of the doctors was on holiday, a locum undertook his workload and prescribed a small dose of Penicillin for a regular patient. The computer system was infected with the Nomenklatura virus, and some of the records had been corrupted, with the result that this particular patient's record was shown incorrectly and did not indicate that the patient was strongly allergic to the drug. The database access system did not signal any errors when displaying or printing records, and the locum was unfamiliar with the patient's allergy. These two facts, coupled with the effects of the computer virus, meant that the patient who received Penicillin suffered a nasty and uncomfortable reaction. It does not take a long stretch of imagination to see the life-threatening potential of such an incident.

XX> Jim just listed two very sad stories about what viruses _could_ do. However, if a person died due to a virus, that would hardly be a deliberate attack to kill another person with his creation [the virus]. There are easier ways to kill a man than with a virus. Accidents happen, you know. No further comments; you can be run over by a reindeer as well :-). Just ask Anna Jones...

JB> VIRUSES ARE UNETHICAL

Quite apart from the damage and nuisance that viruses cause - and the fact that many countries have now criminalised them - viruses are just not ethical. It is thoroughly mean and nasty to write computer programs designed to deliberately damage data. Every virus writer that I have met has admitted to the cowardly and craven nature of their activities. It somehow reflects upon all of us that human nature can sink so low as to transfer some of its own baser instincts for destruction into an environment which is arguably the most thrilling development since the invention of the wheel.

XX> I don't think writing viruses is the act of a coward (We want some chicken tonight!). A U.K. viruswriter does risk a lot of things by writing viruses. It doesn't take much guts to do it, but it takes brains to stay away from the NSY. And besides, most viruses aren't designed to deliberately damage data! They just add their code into host files and replicate without being noticed. A few viruses do cause deliberate damage, but those are quite few, so ah well :-).

JB> There are those who will argue passionately for the freedom to write what they like on their own equipment, and I for one am not seeking to prevent them. However, with the ever wider use of computers, the skill to make them do our bidding brings with it a pressing responsibility. There are arguments attempting to liken computer viruses to primitive life-forms. Some people actually believe this, and do not respond favourably when it is pointed out to them that similar arguments have been advanced and defeated for crystalline replication, or even the growth and spread of fire. In most cases, however, these arguments are used simply as a retrospective excuse.

XX> Sorta agreed.
JB> THE FIELD OF PLAY

When the virus problem first arrived and began to grow, several far-sighted individuals saw that some form of defence was going to become absolutely essential if the well-being of computing was to survive. These people could be divided into two groups depending on their original motivation: some rushed to the defence of computing just because it was the right thing to do; others simply saw an opportunity to make lots of money. Sadly, as the problem has grown over the years, the latter group seems to have grown at the expense of the former. We now seem to have a symbiotic relationship between virus writers and anti-virus companies - each feeds on the efforts of the other. It is interesting to speculate that if suddenly all virus writers were to stop their activities, some very large companies might suffer serious financial setbacks. The virus writers (apparently) make no money from their efforts, while the anti-virus companies trade in a market worth many millions each year. There have even been suggestions that some viruses originate from the anti-virus companies themselves. I should add that to date, as far as I am aware, none of these suggestions has been substantiated.

XX> I don't think AVers write viruses themselves, but it would though be quite cool to see viruses written by them.

JB> So we have a situation where virus writers - who often take an anti-establishment stance - are actually feeding the coffers of the very people who they claim to despise. Add to this a few despicable individuals who - in the name of freedom of information - choose to collect and make publicly available whole collections of virus code containing all the intricate technical details, and you have a thoroughly confused and tangled industry in which the opportunities for intrigue and deception are rife. In the midst of this shambles is the computer user - caught in the crossfire, so to speak. The typical user neither needs nor desires to know how computers work - it is sufficient for him that they DO work. It is understandable therefore that when information built by their own efforts is destroyed by a deliberate, indiscriminate act of malice, they should be saddened and angered. Bearing all this in mind, the authorities are right to legislate against the distribution of virus code, and we should all try to help in the fight to bring them to justice.

XX> Well, I don't expect every computer user worldwide to know how to protect themselves against viruses -- that is quite impossible. Not even I can guarantee my computer to be totally clean of viruses. However, let's assume I got my HD totally fucked up by a virus from a writer who lives no more than 200 meters from me.. Well, I wouldn't like to see him in jail anyways! Why would I bring him to justice? He didn't break the law by writing it, and what if he didn't distribute it himself? Hey presto! He cannot be held responsible for an accident. Shit happens, just too bad it happened to me :).

JB> WHO ARE THE VIRUS WRITERS?

XX> We are, of course :-).

JB> My own observations, confirmed by other people similarly engaged in trying to track them down, are that virus writers are generally 'loners'.

XX> The ARCV were hardly loners; they were a group of technically interested youths.

JB> Most right-thinking people reject the idea of indiscriminately damaging something, especially when it does not belong to them. For this reason a virus writer cannot discuss his hobby with friends or acquaintances.
True, he may be able to establish contact with like-minded people across computer communication links, but even there he must maintain a protective anonymity, and so he tends to be a loner.

XX> Ah well, when I meet my IRL friends we don't discuss viruses, but some of them know about my interest in viruses and some even write viruses. It can happen that we discuss computers, but mainly we spend our time together talking about parties and babes.. Confusing chats I tell you! :).

JB> Working alone on something as technically challenging as computer virus code takes an enormous amount of time and concentration. The dedicated virus writer may therefore become obsessive and shun any other activities which may take him away from his obsession. This lack of social interaction makes them withdrawn and uncommunicative, as well as leading to a general tendency to social inadequacy.

XX> Hahahah! Well, I have met quite a few viruswriters, and I would say that we are no more 'unnormal' than people you meet when you're out getting drunk.. I live a perfectly healthy social life, and most of my fellow writers do as well. The only thing that makes us *really* different to the average blonde-bitch is our IQ, our dicks, our knowledge in computers and low-level assembly language :-). (Well, not totally true, but compared to most people, we ain't much different..). Jim Bates.. Come and visit me and go get some facts before writing up this typical bogus-shit. AV-people spend *way* more time in front of their computers than most viruswriters do, and hey.. I've already written a lot about this very topic, so just read that and trust me on this one: You're wrong! I'm right! (hehe XX can't stop thinking about his dirty fantasies :P) -rb

JB> The reasons they become attracted to virus writing in the first place seem to vary widely, but are usually preceded by a technical curiosity and, in some cases, a fascination with the spurious argument that viruses may mimic a simple life-form. As they become more involved, they tend to lose touch with the real world and live almost entirely within the electronic environment in which they work - a cyberspace that may seem more real to them than the normal humdrum of human existence.

XX> Well, it's easy to get too involved in the VX-underground, because it's a very fun place to kill some of your time in (Boy, you should not kill time, pressure it, Seize the day! Uerm, well, could you give me a dollar so I can get loaded??). However, once you spend more time in front of IRC than seeing IRL friends, getting drunk, or just fucking your girlfriend(s), think twice about what you're doing :-)... Just combine it, and don't take the scene too damn serious :-) (words of wizdom, could you please email the money).

JB> The inclusion of damaging payloads and intricate trigger mechanisms becomes just another technical exercise where they can demonstrate their superiority over the rest of us.

XX> Not agreed :-). Payloads are trivial to write.. There's no technical challenge in writing an HD-trasher.. Every programmer worth his salt can make one :-).

JB> Depending upon the degree of their social degeneration, their activities can be described as ranging from irresponsible and stupid to malicious and hateful. I have not yet come across a virus writer who simply wrote and did not distribute his programs. Distribution in this sense meaning loosing them into the computing community in a way that would ensure their survival and growth.
XX> Well, I've written viruses totally motivated by hate (Xxxxx.XXXX), and well, that was just another way to express myself.. If I hadn't written it, I sure as hell would have smashed a lot of things just to calm down. Is that better? Of course not! But that's exactly what other people are doing.. Smashing things, beating up totally unknown persons, raping girls, etc. Heck, it's a violent society we're living in, and viruswriting just isn't that bad..

JB> The intensive and technical nature of virus writing is such that they will usually work from home. A small proportion begin their activities within open computing environments at school or college, but the large majority either start at, or graduate to, a semi-secret existence at home. When they are asked to give their own reasons for writing viruses, they often have difficulty.

XX> Let's talk about a person we all know :). All he wanted was to release an e-zine, and that it turned out to be a VX-zine wasn't the important thing. It just turned out that way because another computer guy he had got in contact with had a keen interest in viruses. Well, that's a reason. Curiosity is another.

JB> Remember that most of those that I have met have been under investigation precisely because their programs have caused damage at large. In these cases they are aware that their activities have had serious effects, and that there are probably serious consequences for them to face. They may attempt to justify themselves by suggesting that they were "only testing" or "researching viruses". One incredible argument went as follows: the virus writer was asked why he wrote and distributed viruses - he replied that he "wanted to be a virus researcher - but no one would give me any, so I thought I'd write my own." His interrogator observed: "it's a good job you weren't interested in brain surgery."

XX> Well, then whose fault is this? The AV-people's, I tell you! For example, take the reason why TridenT was formed. (For those who don't know shit about TridenT, read the interviews in Insane Reality #4, or in Mark Ludwig's book -The Virus Creation Labs-). Wasn't it because John Tardy was denied source codes from (among many) Frans Veldman - author of TBAV?

JB> The latest reasoning, from the Black Baron, was that he used the viruses as a platform to test and advertise his polymorphic engine. He argued that the SMEG polymorphic code had potentially useful capabilities in copy protection. This highlights another of the most common arguments - that virus code might somehow be beneficial. In nine years of work in this field I have yet to see a demonstrably beneficial use for virus code, but still they think that can be a justification for indiscriminate destruction.

XX> Well, hard-ass encryption utilities can be used in useful ways. If you don't want other people to look at or steal your code, encrypt it, so the people who are too lazy to write their own code cannot decrypt it. So, imo, The Black Baron was correct - his program just wasn't very hard to break (breaking one encryption manually, that is, not making a program that breaks all encryptions automatically).

JB> Whether you classify them as thoughtless or malicious, these people are criminals and must be caught and punished. The damage they are doing is incalculable.

XX> Hey, I'm an unpunished writer of viruses, making me a non-criminal! I never broke any Swedish law, so come kiss my arse!

JB> SO HOW DO WE FIND THEM?

Let us look at a possible sequence of events which might lead to catching a virus writer.

XX> It ain't hard :).
("Q: How can you tell there is an old man in the dark?) Write a few groundbreaking viruses, go on #virus show 'em up and ask for membership in Phalcon/Skism or VLAD. Scene-secrets are traded quite open between groups and soon or later, you've figured out enough to caught as good as every group-affiliated (and some independent) writers. But what's stopping you is to write those damn viruses.. Asshole! If you want something done, put your crappy ethics aside, and do something for once.. Words won't help you know! :). Whoops! Just gave him *the hint* of how to catch us.. So, beware dudes! Hehe. JB> First someone suffers loss or damage of a sufficient magnitude to persuade them to make an official complaint. The complaint will need to contain details of the virus-including a sample-and details and costs of the damage or disruption suffered. This can be a problem since the first objectie of the victim is to destroy the virus and resume normal operations as soon as possible. In the Black Baron case I analysed in excess of 57 virus samples from various complainants. The most from any site was 9 samples but the majority sent only one or two. The presence of a "generation number" within the virus code made it possible to identify which samples had actually introduced the virus in some instances but a greater number of samples would have made the job of tracing the infection so much easier. It is this tracing process which is so difficult. Victims are rarely able to pinpoint exactly where the infection came from- particularly is the virus is designed to infect slowly and quietly. Once an origin has been established, enquiries can be made to try to trace further back along the chain. A major problem for virus writers is how to get their code into the computing community. Some may upload infected files to Bulletin Boards or Internet sites. Others may try physically passing infected programs around. Whichever method is chosen it is this initial distribution which is the most dangerous area for the virus writer. In the Black Baron case the Police followed up complaints received and were able to determine that initially infected software had been downloaded from various Bulletin Boards. With help from the BBS operators and the telephone company, activity logs and telephone billing records revealed the source of the original uploads. There were other considerations which need not be discussed here but the net result was that the Police knew exactly when and where the infected programs had come from. XX> That was quite ignorant from TBB. Ignorance ain't no good when being in this kinda business. Try tracing the origin of Petra-Rm.Zip.. (Silence.. ). JB> A number of complainants asked for complete confidentiality and this was respected because there were plenty of others willing to stand up in court. However, users must accept that if they want justics they must be prepared to make their complaints public. Once the originator of the infections has been detected and identified, the Police enquiries can focus in earnest and will eventually result in a search and seizure operation. A search warrant is issued and the suspect's home (and possible his workplace) will be visited and all computing equipment seized for investigation. The suspect is interviewed and questioned about the alleged offences and his equipment is examined in minute detail for any evidence linking him to viruses. 
In one case, the suspect became aware of Police interest and took measures to thoroughly disinfect his computers and hide his virus source code. He also sent his machine to a friend for safe keeping, thinking that this left him safe. When the Police did arrive, no computing equipment of any kind was found, and he denied any involvement in computing. Since the Police were well aware of his recent activities - and could prove them - this denial simply confirmed that he had something to hide. The Police knew about the friend and had visited him too! Several computers were seized, and the hidden source code was rapidly found and identified. Even without the source code charges would still have been brought, but finding it made the case much stronger.

XX> Hint: Encrypt your HD and put all your floppy discs in the microwave oven if you are aware that the police are watching you! :).

JB> HOW YOU CAN HELP

XX> Hey! Send your name in to the NSY :-)... That's not too hard!

JB> For the best chance of conviction, a series of events must be shown to link the virus writer with the damage. You, as a potential victim, should bear this in mind, so that if you do get hit you can provide a solid start for the Police to investigate the chain of events. What new program or disk brought the infection in? Where did it come from - and when? What damage/disruption did it cause, and how much did it cost you as a result? In the U.K., arrangements can be made to take detailed image copies of infected machines, so that as much evidence as possible is preserved with the minimum of disruption to you. You can then proceed with the disinfection process and a rapid resumption of normal working. If you operate a Bulletin Board or Internet site, do you log incoming software and callers? Could you provide the evidence that the Police need? The quicker you can provide this information, especially in the case of hitherto unknown viruses, the better the chance of catching the perpetrator.

If you are a virus writer - remember that you are not completely anonymous. There are big risks, and when you are caught there are heavy consequences. As interesting as you find virus code - are you prepared to go to prison for it?

XX> I know that I'm not completely anonymous. But since I am doing nothing wrong (in the writing itself), I don't care. If Jim Bates wants to try something about (C)opyright violations, he just gotta figure out who wrote this text :) and decided to include it.. Bah. Good luck asshole, you'll need it..

 - (C) 1995 Someone, Anyone, Everyone & Noone INC.