Insane Reality issue #7 - (c)opyright 1995 Immortal Riot
File 005

% Virus-Bulletin %
------------------

Here follow a few articles from the Virus Bulletin. I have no clue how this material got onto my harddrive, but I find it somehow interesting, and decided to publish it. It is, I was told, heavily copyrighted, so if someone gets offended seeing AV material published in a VX-zine, go sue someone, but don't try anything against me. This zine doesn't have any "responsible publisher", so I cannot be found guilty of anything. Imo, information of any kind worth the effort of being typed up should be free, and if possible released to the general public (such as being included in a zine). If you still think I'm responsible for letting this material out, go figure out a way to prove that XX really is me :-), or don't bother. Ok?

I didn't bother to comment and reply upon all the articles included, but ah well, I don't care if you don't. There is, however, one article I added a few comments to. It was written by Jim Bates and it's about how to catch a virus writer ;), and why we should be caught. Hmm, maybe this is a good reason? Hehhe, well, don't try it son, you're not that good :-).

- XX

==================================================================

VB95_209.TXT
------------

HEURISTIC SCANNERS: ARTIFICIAL INTELLIGENCE?

Righard Zwienenberg
Computer Security Engineers, Postbus 85 502, NL-2508 CE Den Haag, The Netherlands
Tel +31 70 362 2269 Fax +31 70 365 2286 Email rizwi@csehost.knoware.nl

Though not explicitly stated, heuristic anti-virus methods have been in use for almost as long as the virus threat has existed. In the 'old days', FluShot(+) was a very popular monitor, alerting the user when it detected 'strange and dangerous' actions. This can be regarded as simple heuristic analysis, because FluShot did not know whether the action was legitimate or not. It just warned the user.

During the last couple of years, several resident behaviour-blockers have been developed, used, and dismissed again. In most cases, the user finds the warnings irritating, aggravating and incomprehensible. The only resident protection users normally employ - if any - is a resident scanner. This makes life easier for them, because the resident scanner clearly indicates that a file or disk is infected by a certain virus when it pops up its box. The disadvantage, which the user doesn't see, is that it does not detect new viruses.

Also, the less popular (but very important) integrity checkers may be regarded as heuristic tools. They warn the user when the contents of files have changed, when files have grown in size, received new time and date stamps, etc. They often display a warning such as 'file might be infected by an unknown virus' in the case of a changed executable. Especially in a development environment, integrity checkers can be really irritating. The user already knows that his executable has changed, because he just changed and recompiled the source code. But how is the integrity checker to know that? Using a list of executables to skip is not safe, because a virus may indeed have infected an executable on the list. In that case, the change was not caused by a recompilation. However, the integrity checker can't tell the difference!

Based on these early attempts, the first generation of scanners with minor heuristic capabilities was developed.
The heuristics they used were very basic, and usually generated warnings about peculiar file date and file time stamps, changes to file lengths, strange headers, etc. Some examples:

  EXAMPLE1.COM  12345  01-01-1995  12:02:62
  EXAMPLE2.COM  12345  01-01-2095  12:01:36
  EXAMPLE3.EXE  Entry point at 0000:0001

The heuristics of the current, second, generation of scanners are much better. All the capabilities of the first-generation scanners have obviously been retained, but many new heuristic principles have been added: code analysis, code tracing, strange opcodes, etc. For example:

  0F            POP CS              ; Strange opcode - an 8086-only instruction!
  C70600019090  MOV WORD PTR [100],9090
  C606020190    MOV BYTE PTR [102],90
  E9....        JMP 0100

Tracing through the code shows that it jumps back to the entry point:

  B9....  MOV CX,....
  BE....  MOV SI,....
  89F7    MOV DI,SI
  AC      LODSB
  34A5    XOR AL,A5
  AA      STOSB
  E2..    LOOP ....

This is obviously decryption code.

A (third-generation?) scanner type based exclusively on heuristics exists, performing no signature, algorithmic or other checks. Maybe this is the future, but the risk of a false alarm (a false positive) is quite high at the moment. In large corporations, false alarms can cost a lot of time and thus money. We are not going to examine this scanner type, except to note that it may lead us into a new generation, or area, of system examination and protection: rule-based examination systems.

RULE-BASED EXAMINATION SYSTEMS

Rule-based systems are as such not a novelty. They already exist, also in the security field, where they are often characterised by applying very few, but very broad, rules. What we are going to look at here are rule-based examination systems seen as large heuristic analysers. Looking at this sequence of opcodes:

  B8DCFE  MOV AX,FEDC
  CD21    INT 21
  3D98BA  CMP AX,BA98
  75..    JNE getint21
  E9....  JMP wherever
getint21:
  B82135  MOV AX,3521
  CD21    INT 21

everyone in the field of computer security can see that we may have a virus here (or at least suspicious or badly programmed code). The problem is how to convert something we see in a split second into one or more specific and relevant behaviour characteristics, which we can feed into an examination system which in turn is able to tell us whether or not we are looking at a virus.

With most of the rules used by the first generation of heuristic scanners, this was not at all difficult. Most were simple comparisons (<, >, ==, !=) of the type: 'If a file date exceeds the current date, or is after the year 2000, give an alert'; 'If the seconds field of the file time shows 62 seconds, we can conclude that this is pretty strange and give an alert'. This generation of heuristics, of course, did not have the power to analyse the code in the example shown above.

The second generation of heuristic scanners has more possibilities. Bearing those in mind, defining a rule to cover the example above is not difficult, but imagine a complex decryption routine preceding the actual (virus/Trojan/suspicious) code or - most likely - legitimate code. For example:

  re-vector int 3
  re-vector int 1
  disable keyboard
  get int 1 offset into di
  get int 1 offset into si
  add counter-1 to si to point to encrypted data
  add counter-2 to di to point to encrypted data
  get word into ax
  perform some calculations with ax to decrypt word
  store word
  increase counter-1
  increase counter-2
  look if end of encrypted code has been reached
  jmp back if more code to decrypt
  enable keyboard
  ...
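As an aside, the simple decryption loop shown earlier in this section is exactly the kind of pattern a second-generation scanner can spot with a plain opcode match. A minimal sketch in Python (the byte values come from the example above; this is not any particular scanner's logic, and a real product would trace the code rather than match raw bytes):

    # Sketch: flag a probable decryption loop near the entry point.
    # We only look for the LODSB / XOR AL,imm8 / STOSB / LOOP shape
    # from the example above; real analysers trace the code instead.

    def looks_like_decryptor(code: bytes) -> bool:
        i = 0
        while i < len(code) - 4:
            if (code[i] == 0xAC            # LODSB
                    and code[i + 1] == 0x34    # XOR AL,imm8
                    and code[i + 3] == 0xAA    # STOSB (i+2 is the key byte)
                    and code[i + 4] == 0xE2):  # LOOP rel8
                return True
            i += 1
        return False

    # Tiny self-test using the byte sequence from the example above:
    sample = bytes.fromhex("B91000BE0001" "89F7" "AC" "34A5" "AA" "E2FA")
    print(looks_like_decryptor(sample))   # True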
If a decryption routine like the one above is just one of the instances generated by a complex mutation engine, however, it will be hard to derive a heuristic rule which directly detects a virus using this engine. One of the solutions, maybe the best one, is to include a code emulator in the analysing system, as illustrated in the figure above (missing --Ed.), which shows part of a working network security system. The file to be checked is first given to a checksummer. If the file is already known to the system, a hash code is generated across the file and compared to a stored value. If these are identical, no further action is taken and the file is declared clean. If not, the file is fed to the emulator, and the results of the code emulation are given to an analyser, as described below.

Including a code emulator is possible and, as a matter of fact, has already been done. It should have special knowledge of a variety of possible tricks used in malicious code; it should know when to stop emulating (e.g. at the end of a decryption routine); it should be able to realise when anti-debug tricks are used, etc. Both in order to obtain portability and to avoid obvious pitfalls, it must adhere to one basic and important rule: never actually execute an instruction, only emulate it. In short, the task of the emulator is first to make sure that the code is decrypted (in case it was encrypted), and then to derive and combine relevant behaviour characteristics to pass on to the analyser, which analyses and organises these behaviour characteristics and compares the results of the analysis with a set of rules.

ARTIFICIAL INTELLIGENCE

From the point of view of the developer, it would be nice if such a system were able to learn about behaviour characteristics and generate new rules automatically. If the system bypasses an instance of virus/Trojan/suspicious code because the current rules are no longer sufficient, special examination tools should be able to extract the necessary information from the code in question and create new rules, enabling the system to detect this virus/Trojan/suspicious code - and, hopefully, every other form derived from it. In other words: artificial intelligence.

For security reasons, these additional tools with their special functionality should not be given to users. Evil-minded, knowledgeable persons could use them to do an in-depth disassembly and research the possibilities of bypassing the rules generated by the system. Security through obscurity may not be safe, but it does help...

EMULATOR DESIGN ISSUES

When designing a code emulator for forensic purposes, a number of special requirements must be met. One problem to tackle is the multiple opcodes and multiple instructions issue:

  87 C3  XCHG AX,BX
  93     XCHG BX,AX
  87 D8  XCHG BX,AX

The result is the same, but different opcodes are used.

  PUSH AX     PUSH AX
  PUSH BX     MOV AX,BX
  POP AX      POP BX
  POP BX

These give the same result. More than the five different code sequences shown above exist to exchange the contents of registers AX and BX. The technique of expressing the same functionality using many different sets of opcode sequences is used by the encryptors generated by polymorphic engines. Some, being over 200 bytes in size, contain only the functionality of a cleanly coded decryptor of 25 bytes. Most of the remaining code is redundant, but sometimes seemingly redundant code is used to initialise registers for further processing. It is the job of the emulator to make sure that the rule-based analyser gets the correct information, i.e.
that the behaviour characteristics passed to the analyser reflect the actual facts. No matter which series of instructions/opcodes is used to perform function 3D02h of INT 21h, the analyser only has to know that the behaviour of that piece of code is: open a file for (both reading and) writing.

On the one hand, this may not seem that difficult. Most viruses do perform interrupt calls, and when they do, we just have to evaluate the contents of the registers to derive the behaviour characteristic. On the other hand, this is only correct if we talk about simple, straightforward viruses. For viruses using different techniques (hooking different interrupts, using call/jmp far constructions), it may be very difficult for the emulator to keep track of the instruction flow. In any case, the emulator must be capable of reducing instruction sequences to their bare functionality in a well-defined manner. We call the result of this reduction a behaviour characteristic, if it can be found in a pre-compiled list of characteristics to which we attach particular importance.

Another problem is that the emulator must be capable of making important decisions, normally based on incomplete evidence (we obviously want to emulate as little code as possible before reaching a conclusion regarding the potential maliciousness of the software in question). Let us illustrate this with a small example:

  MOV AX,4567
  INT 21
  CMP AX,7654
  JNE jmp-1
  JMP jmp-2

This is an example of an 'Are you there?' call used by a virus. When tracing through the code, the emulator obviously doesn't know whether jmp-1 or jmp-2 leads to the code which installs the virus in case it is not already there. So, should the emulator continue with the jmp-1 flow or the jmp-2 flow? A simple execution of the code will result in just one of these flows being relevant, whereas a forensic emulator must be able to follow all possible program flows simultaneously, until either a flow leads to a number of relevant behaviour characteristics being detected, at which time the information is passed to the analyser, or a flow has been followed to a point where one of the stop criteria built into the emulator is met. The strategy used in this part of the emulator is a determining factor when it comes to obtaining an acceptable scanning speed.

Hopefully, this has illustrated some of the problems associated with designing a forensic emulator. It is a very difficult and complex part of this set-up. Once the emulator has finished its job, it passes its information - a list of the behaviour characteristics which it has found in the code - on to the analyser.

BEHAVIOUR RULES

Before the analyser is able to compare the behaviour characteristics found by the emulator to the information in its behaviour database, this database needs to be defined. Assume that we have a COM and EXE file infecting virus with the following behaviour:

  !  MODIFY FILE ATTRIBUTE REMOVING READ-ONLY FLAG
  !  OPEN A FILE FOR (BOTH READING AND) WRITING
  !* WRITE DATA TO END OF FILE
  !* MODIFY ENTRY POINT IN HEADER or WRITE TO BEGINNING OF FILE
  -  MODIFY FILE DATE AND FILE TIME
  -  CLOSE FILE
  -  MODIFY FILE ATTRIBUTE

If we want to develop a behaviour rule for this virus, it will look like this:

  1. MODIFY_FILE_ATTRIBUTE + OPEN_FILE + WRITE_DATA_TO_EOF + MODIFY_EP_IN_HEADER
  2. MODIFY_FILE_ATTRIBUTE + OPEN_FILE + WRITE_DATA_TO_EOF + WRITE_DATA_TO_BOF

where rule 1 is a rule for the EXE file, and rule 2 for the COM file.
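To make the analyser side concrete, here is a minimal sketch of how such rules might be stored and matched against the characteristic list delivered by the emulator. The rule table, names and matching policy are invented for illustration; the article does not prescribe an implementation:

    # Sketch: match emulator-reported behaviour characteristics against rules.
    # A rule fires if all of its characteristics occur, in order, in the trace.

    RULES = {
        "EXE infector": ["MODIFY_FILE_ATTRIBUTE", "OPEN_FILE",
                         "WRITE_DATA_TO_EOF", "MODIFY_EP_IN_HEADER"],
        "COM infector": ["MODIFY_FILE_ATTRIBUTE", "OPEN_FILE",
                         "WRITE_DATA_TO_EOF", "WRITE_DATA_TO_BOF"],
    }

    def matches(rule, trace):
        pos = 0
        for wanted in rule:
            try:
                pos = trace.index(wanted, pos) + 1  # must appear after the previous hit
            except ValueError:
                return False
        return True

    # Example trace, as an emulator might report it for the virus above:
    trace = ["MODIFY_FILE_ATTRIBUTE", "OPEN_FILE", "WRITE_DATA_TO_EOF",
             "MODIFY_EP_IN_HEADER", "CLOSE_FILE"]
    for name, rule in RULES.items():
        if matches(rule, trace):
            print("Rule fired:", name)   # prints: Rule fired: EXE infector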
Since a lot of viruses and virus source codes are widely available, a number of different instruction sequences resulting in this functionality will probably show up. Normally, derived viruses contain minor changes made to bypass a single scanner, by just changing the order of two or more instructions, but sometimes larger code sequences can be changed without changing the functionality of the virus. It is trivial to change the code so that it will first modify the entry point in the header or change the start-up code, and afterwards write the virus code. In order to detect these changes (variants), the next rules may be added:

  3. MODIFY_FILE_ATTRIBUTE + OPEN_FILE + MODIFY_EP_IN_HEADER + WRITE_DATA_TO_EOF
  4. MODIFY_FILE_ATTRIBUTE + OPEN_FILE + WRITE_DATA_TO_BOF + WRITE_DATA_TO_EOF

Another example (an MBR infector):

  -  PERFORM SELF CHECK
  !  HOOK INT13
  !  BECOME RESIDENT
  !  INTERCEPT READ/WRITE TO MBR
  !  READ MBR
  -* WRITE MBR TO OTHER LOCATION
  !* WRITE NEW MBR

Rule: HOOK_INT13 + INTERCEPT_READ/WRITE_TO_MBR + WRITE_NEW_MBR

The signs in front of the descriptors in the examples above hint at the weighting procedure used by the analyser to attach significance to the behaviour characteristics supplied by the emulator. A '-' means that the characteristic does not have to be present; an '!' that it must be present (but does not in itself indicate malicious code). A '*' indicates a high weighting value. Thus '-*' means that the characteristic does not have to be present in the sequence of actions, but if it is, this is a highly important fact.

If rules 1-4 above are examined more closely, it can be concluded that they describe behaviour found in a number of viruses from different families. A single behaviour rule may detect an unlimited number of viruses. That is the power behind using behaviour characteristics. While at present we in most cases need a new signature or a new (changed) algorithm to detect a new variant of a virus or a new virus family, the behaviour characteristics will continue to do their work. This is extremely important, because it removes the necessity for the virus researcher and the anti-virus developer to react to a new virus unless it is technologically innovative - and those are few and far between. Of course, some viruses will be developed which will not be caught by any of the rules in the behaviour database. These must be taken care of just as we do right now with any new virus; but instead of creating a signature, we create a new rule. With a little luck, a new virus behaves like a virus already covered by a rule.

If we attach a level of importance to each part of a behaviour characteristic, we can use this in the analyser to arrive at a conclusion. Depending on the level of importance of each individual component of a behaviour characteristic detected, the system may decide to give a message to the user, such as 'may be infected by an unknown virus' or 'suspicious code'. The reason for attaching a level of importance to each individual part of a behaviour characteristic is that it makes it easier to sort out cases where combinations of individually innocent behaviour characteristics put together constitute malicious code - or vice versa. Filedate, from the Norton Utilities, is able to change file date and time; as a matter of fact, this is the purpose of the utility. The ATTRIB command was developed to change file attributes. Evidently, changing file attributes is in itself insufficient evidence of malicious behaviour. A virus needs to write to a file as well.
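How such importance levels could be combined into the messages mentioned above can be sketched as follows; the weights and thresholds are invented for illustration only:

    # Sketch: weight individual characteristics and grade the combined score.
    # A file write carries a high weight; an attribute or date change, being
    # innocent on its own (think ATTRIB or Filedate), carries a low one.

    WEIGHTS = {
        "MODIFY_FILE_ATTRIBUTE": 1,
        "MODIFY_FILE_DATE_TIME": 1,
        "OPEN_FILE": 2,
        "WRITE_DATA_TO_EOF": 5,
        "MODIFY_EP_IN_HEADER": 5,
    }

    def verdict(trace):
        score = sum(WEIGHTS.get(c, 0) for c in trace)
        if score >= 12:
            return "may be infected by an unknown virus"
        if score >= 8:
            return "suspicious code"
        return "no warning"

    print(verdict(["MODIFY_FILE_ATTRIBUTE", "MODIFY_FILE_DATE_TIME"]))
    # -> no warning
    print(verdict(["MODIFY_FILE_ATTRIBUTE", "OPEN_FILE",
                   "WRITE_DATA_TO_EOF", "MODIFY_EP_IN_HEADER"]))
    # -> may be infected by an unknown virus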
So a file write is mandatory for code to be considered suspicious, and is heavily weighted. A change of attributes is not that important, and is thus given a lower weighting. If the user so wishes, the file, or the part of the (decryptor) code on which the analysing system triggered, can be checked by a signature scanner to see if a known virus can be identified.

CREATING RULES AUTOMATICALLY

An important part of the system is a Rule Building Utility. Whenever a new virus or Trojan emerges, it may be processed by this utility, which is similar to the emulator, albeit with some important differences. The emulator only collects behaviour information, without knowing anything about the importance of a particular type of behaviour, or whether the behaviour is suspicious. The Rule Building Utility has to learn the level of importance of behaviour characteristics; it has to know which behaviour is mandatory for a virus or Trojan, which behaviour is used by a virus but may be omitted, etc. Because research and development time is very expensive, the utility must be able to remember this for similar behaviour characteristics, and only ask for additional unknown information when needed, saving the researcher valuable time.

  Behaviour A:          Behaviour B:
  SEARCH FIRST FILE     SEARCH FIRST FILE
  DELETE FILE           DELETE FILE
  SEARCH NEXT FILE      CREATE FILE
                        WRITE CODE INTO FILE
                        SEARCH NEXT FILE

When rules have been defined for behaviour B, and a file showing behaviour A (which was reported as being suspicious) is processed, the utility must be able to realise that this behaviour is not as indicative of potential maliciousness as behaviour B. As a matter of fact, if behaviour A is taken on its own, it might well be a DEL *.* command. At first, the utility will ask for input frequently, because it needs to build up its database. However, over a period of time, this type of utility should make life easier for the researcher.

CONCLUSION

The number of viruses is increasing rapidly: this is a known fact. The time will soon arrive when scanning using signatures and dedicated algorithms will either use too much memory or just become too slow. With storage media prices dropping fast, lots of systems now come equipped with very large hard disks, which will take more and more time, and thus money, to scan using traditional techniques. A properly designed rule-based analysing system, feeding suspicious code into a scanner which can identify it as a known virus or Trojan, or perhaps as dangerous code needing further investigation, is bound to save a lot of time. Although it is impossible to prove that code is not malicious without analysing it from one end to the other, we at Computer Security Engineers Ltd believe it possible to reduce significantly the time used to check files, by using all the available system knowledge instead of only small bits of it, as is done today. Using virus scanning as the primary, or in many cases the only, anti-virus defence is an absurd waste of time and money, and furthermore blatantly insecure!

ABOUT CSE

Computer Security Engineers Ltd is one of the pioneers of anti-virus system development. The anti-virus system PC Vaccine Professional was first published in 1987, and since the start of 1988 a new version has been published each and every month. From 1988, cryptographic checksumming was introduced as the primary line of defence, scanning as the second.
In 1992, the emphasis shifted, and behaviour blocking was introduced as the first line of defence, followed by checksumming and - in the case of an alarm from one of these countermeasures, or to examine incoming diskettes - scanning for known viruses. Most recently, the basic philosophies underlying PC Vaccine Professional, or PCVP as the system is also known, were expanded into a powerful and easily-maintained network perimeter and an in-depth defence based on the well-known military tenets of: (1) keep them out, and (2) if you can't keep them out, find and destroy them as fast as possible.

==================================================================

VB_210.TXT
----------

VIRUS DETECTION - 'THE BRAINY WAY'

Glenn Coates & David Leigh
Staffordshire University, School of Computing, PO Box 334, Beaconside, Stafford, ST18 0DG, UK
Tel +44 1782 294000 Fax +44 1782 353497

ABSTRACT

This paper explores the potential opportunities for the use of neural networks in the detection of computer viruses. Neural computing aims to model the guiding principles used by the brain for problem solving, and to apply them to a computer domain. It is not known how the brain solves problems at a high level; however, it is widely known that the brain uses many small, highly interconnected units called 'neurons'. Like the brain, a neural network can be trained to solve a particular problem or recognise a pattern by example. The outcome is an algorithm-driven recogniser which does not exhibit the same behaviour as a deterministic algorithm. Depending on the way in which it has been trained, it may make 'mistakes': that is, it may declare a positive result for a sample which is actually negative, and vice versa. The ratio of correct results to incorrect results can usually be improved by more and better training. Can such pattern recognition be harnessed for virus detection? It could be argued that the characteristics of virus patterns, no matter how they are expressed, are suitable subjects for detection by neural networks.

INTRODUCTION

The received wisdom is that neural computing is an interesting 'academic toy' of little use, apart from modelling the animal brain. If this is true, then it is surprising that 7 out of 10 of the UK's leading blue chip companies are either investigating the potential of neural computing technology or actually developing neural applications [Con94]. If leading edge companies are prepared to spend money on this 'academic toy', then maybe there are advantages to be gained from its use.

Without investigating new techniques (for example, heuristic scanning), one must accept that the rapid rise in new viruses will exact a heavy speed penalty on existing virus scanners. As a result of this rise in virus numbers and sophistication, there will be an increasing conflict between acceptable speed and acceptable accuracy. It is easy to become complacent and rely on increasing processor power to bail us out of this problem, but processor design is increasingly becoming a mature technology. What follows are the results of a feasibility study into the utilisation of neural networks within the field of virus detection.

WHAT IS A NEURAL NETWORK?

The workings of the brain are only known at a very basic level. It contains approximately ten thousand million processing units called neurons, and each of these neurons is connected to approximately ten thousand others. This network of neurons forms a highly complex pattern recognition tool, capable of conditional learning.
Figure 1 illustrates a model of the biological neuron alongside its corresponding mathematical model. The individual neuron is stimulated by one or more inputs. In the biological neuron, some inputs will tend to excite the neuron, whilst others may be inhibitory. That is to say, some carry more 'weight' than others. This is mirrored in the mathematical model via the use of a 'weighting mechanism'. The neuron accumulates the total value of its inputs before passing it through a threshold function to determine its final output. This output is then fired as an input to another neuron (or a number of neurons), and so on. In the biological neuron, the axon performs the threshold function. The mathematical model would typically use a sigmoid function or a simple binary 'yes/no' threshold function. The reader is referred to [Mar93] for further discussion.

NEURAL NETWORK DEVELOPMENT

When approaching a problem using a neural network, it is not always necessary to know in detail what is to be done before planning its use. In this sense, neural networks are quite unlike procedurally-based computer programs, which must have been written with a distinct goal in mind if they are to work properly. It is not even like a declarative program, for which the same rule applies. It is, perhaps, more like an expert system, where the outcome depends on the way in which an expert has answered a pre-defined series of questions.

In this approach, a 'standard' three-layer neural network is constructed using the 'back propagation' learning algorithm. The architecture consists of an input layer, a hidden layer, and an output layer. Training is carried out by submitting a 'training set' of data to the network's input, observing what output is given, and adjusting the variable weights accordingly. Each neuron in the network processes its inputs, with the resultant values steadily percolating through the network until a result is given by the output layer. This output result is then compared to the actual result required for the given input, giving an error value. On the basis of this error value, the weights in the network are gradually adjusted, working backwards from the output layer. This process is repeated until the network has learnt the correct response for the given input [DTI95]. Figure 2 illustrates this.

In this instance, the inputs represent the virus information, or other data concerning a virus-infected file. There are only two possible outputs, corresponding to 'possible virus found' and 'file appears to be OK'. The training data is divided into two classes, one containing the data for infected files, and the other for uninfected files. When a suitable output is generated for the training data, the network is checked with a separate 'validation set'. If the output for the validation set is not acceptable, it is merged with the original training set and the entire process is repeated. This process is described schematically in Figure 3. The result should be a very robust fuzzy recogniser, capable of coping with unseen data. Because neural networks can process deeply hidden patterns, some have provided decisions superior to those made by trained humans.

EXISTING SYSTEMS

In 1990, a neural network was developed which acted as a 'communications link' between the mass of virus information available and end-user observations. By answering a set of standard questions regarding virus symptoms, the virus could be classified, and a set of remedies given.
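As an aside, the weighted-sum-and-threshold neuron and the three-layer arrangement described above are compact enough to sketch in code. A toy forward pass, with arbitrary layer sizes and random placeholder weights (a real network would learn its weights by back propagation):

    import math, random

    # Sketch: one forward pass through a tiny 3-2-1 network with sigmoid
    # thresholding, as in the architecture described above.

    random.seed(1)

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def layer(inputs, weights):
        # one output per weight row: weighted sum of inputs, then threshold
        return [sigmoid(sum(w * i for w, i in zip(row, inputs)))
                for row in weights]

    hidden_w = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    output_w = [[random.uniform(-1, 1) for _ in range(2)]]

    inputs = [0.2, 0.7, 0.1]       # e.g. features extracted from a file
    output = layer(layer(inputs, hidden_w), output_w)
    print(output)                  # a value near 1 would mean 'possible virus'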
Due to the nature of neural networks, the 1990 system described above could cope with incomplete and erroneous data provided by the end user. Even when faced with a new mutation, the system still gave suitable counter-measures and information. See [Gui91] for a full discussion.

IDENTIFICATION OF VIRUS CODE PATTERNS VIA NEURAL NETWORKS

A neural network could be constructed to learn the actual machine code patterns of a specific virus. However, as most viruses are mutations of existing viruses, a network could instead be made to identify a virus family. This carries the advantage of being capable of identifying future variants. The result would be a set of sub-networks linked together to provide the end solution.

At the lowest level, this could be done at the bit level. Figure 4 illustrates this. Although recognition at this level would be very difficult (if not impossible) for a human, a neural network would be capable of it. The only limiting factors would be the volume and quality of the training data. The number of input neurons for a 1/2K virus code segment with a one-neuron output would be 4096. Given this, according to the 'geometric pyramid rule', the number of neurons in the hidden layer would be 64. The number of virus samples needed for effective recognition would be in the region of at least 525,000. This figure should then be trebled for the number of non-infected files; others would argue for far more, due to the problems associated with false positives.

At a higher level, the input data could be represented at the byte level, where each byte would correspond to a single input neuron. In this context, the number of hidden neurons would be reduced to 22, and the number of virus samples needed would be at least 23,000. Again, the same applies to the number of non-infected files. This figure could be reduced further by pre-processing the code segment to extract operand information, which could also increase accuracy and reduce training time. The British Technology Group, with the involvement of Oxford University, conducted research into such a solution. Although no formal documentation was produced, the results are believed to be negative.

From this, it can be seen that the use of neural networks in virus detection only seems practical at a high level. After all, a virus expert armed with a 'Virus Detection Language' and a 'Generic Decryption Engine' can provide a 100% accurate scanning result for advanced polymorphic viruses such as Pathogen in a relatively short period of time.

A NEURAL NETWORK POST-PROCESSOR

Rather than utilising a neural network to solve the virus problem alone, one could be used to process high-level information: for example, that generated by a heuristic scanner. Currently, most heuristic scanners use a form of emulation in order to determine the behaviour of a program file. Should that program appear to execute a suspicious activity, a 'flag' is set indicating this. However, some of these flags indicate more virus-like activity than others. In order to solve this problem, the flags are weighted via a score. Therefore, a flag indicating a 'suspicious memory reference' may be given more weight than a flag indicating an 'inconsistent EXE header'. The total weight of the set flags is computed, and if a set threshold value is met, the heuristic scanner issues a suitable warning. In the example of one well-known heuristic scanner, 35 such flags are used. The weights are arrived at experimentally. Initially, they are applied using a 'best-guess' approach, based on the virus expert's knowledge.
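In miniature, the flag-weighting scheme being described might look like the following sketch; the flag names, weights and threshold are invented stand-ins for the 35 real flags:

    # Sketch: a heuristic scanner's weighted flags, with 'best-guess'
    # weights that the expert (or, as proposed below, a neural network)
    # would tune against virus and clean-file collections.

    FLAG_WEIGHTS = {
        "suspicious_memory_reference": 8,   # strongly virus-like
        "inconsistent_exe_header":     2,   # weakly virus-like
        "odd_entry_point":             4,
        "self_modifying_code":         6,
    }

    THRESHOLD = 10

    def warn(set_flags):
        score = sum(FLAG_WEIGHTS[f] for f in set_flags)
        return score >= THRESHOLD

    print(warn({"inconsistent_exe_header"}))                             # False
    print(warn({"suspicious_memory_reference", "self_modifying_code"}))  # True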
The results of this are then tested on a virus collection and on a clean set of files. The results are analysed, and the weights adjusted accordingly. This cycle continues until satisfactory results are obtained. Figure 5 illustrates this cycle.

This process will probably increase in complexity over the next few years. In the above example, the number of flags could literally double, due to the increase in knowledge, new techniques employed by the virus writers, and the further development of heuristic scanners. It is inevitable that the adjust/re-adjust cycle will become far more complex and time-consuming. For example, why should flag-x be given a weight of 8, and not 7 or 9, and flag-y a weight of 1, and not 2?

Already, one can see that the illustrated cycle is very similar in nature to that used in neural network training. Indeed, a neural network could be used in place of the weighting mechanism and the bias imposed by the virus expert. Based on the results of other neural network applications, the results should be very accurate, because the neural network will 'learn' the 'optimum' weights. The human element is removed, and the entire learning process is automated. In terms of network size, the number of input neurons would be 35, with 6 hidden neurons and 1 output neuron. In theory, the minimum number of infected samples required for training would be at least 432. However, there would be no detrimental effect in training the network with more samples, in order to reflect current virus numbers.

CONCLUSIONS

Neural computing is no longer seen as a purely academic subject. Indeed, many companies are now looking towards the use of neural networks as serious tools. Many systems are currently in use, with very high success rates. It has been found that it may be feasible to use neural computing technology in the virus detection field. However, at a low level the results are unclear; there seems to be greater accuracy using deterministic techniques. Using a neural network as a pre-/post-processing tool could offer a powerful addition to the virus expert's toolbag. Just one example is the heuristic scanner; the authors believe other uses will also exist.

ACKNOWLEDGEMENTS

As Bernard of Chartres said, echoed by Sir Isaac Newton: 'If [we] have seen further, it is by standing on the shoulders of giants'. The assistance given by the following people is gratefully acknowledged: Jan Hruska, Frans Veldman, Alan Solomon, Martin Slade, Robert Mortimer and Michael Twist. The continuing support of the staff at Staffordshire University and at Visionsoft is also gratefully appreciated.

REFERENCES

[Con94] 'Adopting The Neural Approach', Control Magazine, Issue 5, March/April 1994.
[DTI95] UK Department of Trade and Industry, Neural Computing Technology Programme, 1995.
[Gui91] Dr Daniel Guinier, 'Computer "virus" identification by neural networks', SIGSAC, 1991.
[Mar93] Timothy Masters, 'Practical Neural Networks in C++', Academic Press, 1993. ISBN 0-12-479040-2.

Further reading:
R Beale and T Jackson, 'Neural Computing - an Introduction', IOP Publishing, 1990.
Vesselin Bontchev, 'Future Trends in Virus Writing', Proceedings of the Fourth International Virus Bulletin Conference, 1994.
Glenn Coates and David J. Leigh, 'Virus Detection Using a Generalised Virus Description Language', Proceedings of the Fourth International Virus Bulletin Conference, 1994.

==================================================================

VB95_212.TXT
------------

SCANNERS OF THE YEAR 2000: HEURISTICS

Dmitry O. Gryaznov
S&S International Plc, Alton House, Gatehouse Way, Aylesbury, Bucks, HP13 3XU, UK
Tel +44 1296 318700 Fax +44 1296 318777 Email grdo@sands.co.uk

INTRODUCTION

At the beginning of 1994, the number of known MS-DOS viruses was estimated at around 3,000. One year later, in January 1995, the number of viruses was estimated at about 6,000. By the time this paper was written (July 1995), the number of known viruses exceeded 7,000. Several anti-virus experts expect this number to reach 10,000 by the end of 1995. This large and fast-growing number of viruses is known as the glut, and it does cause problems for anti-virus software - especially for scanners.

Today, scanners are the most frequently used type of anti-virus software. The fast-growing number of viruses means that scanners must be updated frequently enough to cover new viruses. Also, as the number of viruses grows, so does the size of the scanner or its database, and in some implementations the scanning speed suffers.

It has always been very tempting to find a final solution to the problem: to create a generic scanner which can detect new viruses automatically, without the need to update its code and/or database. Unfortunately, as proven by Fred Cohen, the problem of distinguishing a virus from a non-virus program is algorithmically unsolvable in the general case. Nevertheless, some generic detection is still possible, based on analysing a program for features typical or not typical of viruses. Such a set of features, possibly together with a set of rules, is known as heuristics. Today, more and more anti-virus software developers are looking towards heuristic analysis as at least a partial solution to the problem. Working at the Virus Lab of S&S International Plc, the author is also carrying out a research project on heuristic analysis.

This article explains what heuristics are. Positive and negative heuristics are introduced, and some practical heuristics are presented. Different approaches to heuristic program analysis are discussed, and the problem of false alarms is explained. Several well-known scanners employing heuristics are compared (without naming the scanners) on both virus detection and false alarm rates.

1 WHY SCANNERS?

If you follow computer virus-related publications, such as the proceedings of anti-virus conferences, magazine reviews, and anti-virus software manufacturers' press releases, you read and hear mainly 'scanners, scanners, scanners'. The average user might even get the impression that there is no anti-virus software other than scanners. This is not true. There are other methods of fighting computer viruses - but they are not as popular or as well known as scanners, and anti-virus packages based on non-scanner technology do not sell well. Sometimes, people trying to promote non-scanner-based anti-virus software even come to the conclusion that there must be some kind of international plot of popular anti-virus scanner producers. Why is this? Let us briefly discuss the existing types of anti-virus software. Those interested in a more detailed discussion and comparison of the different types of anti-virus software can find it in [Bontchev1], for example.

1.1 SCANNERS

So, what is a scanner? Simply put, a scanner is a program which searches files and disk sectors for byte sequences specific to this or that known virus. Those byte sequences are often called virus signatures.
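A toy illustration of the idea, of the 'dumb' whole-file kind discussed below (the 'signatures' here are made-up byte strings, not real virus signatures):

    import os

    # Sketch: 'grunt' signature scanning - search whole files for known
    # byte sequences. The signatures below are invented placeholders.

    SIGNATURES = {
        "Demo.1024":  bytes.fromhex("B44EBA0001CD21"),
        "Sample.COM": bytes.fromhex("E800005E81EE0301"),
    }

    def scan_file(path):
        with open(path, "rb") as f:
            data = f.read()
        return [name for name, sig in SIGNATURES.items() if sig in data]

    for entry in os.scandir("."):
        if entry.is_file() and entry.name.upper().endswith((".COM", ".EXE")):
            for name in scan_file(entry.path):
                print(f"{entry.name}: infected with {name}")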
There are many different ways to implement scanning, from so-called 'dumb' or 'grunt' scanning of the whole file, to sophisticated virus-specific methods of deciding which particular part of the file should be compared to a virus signature. Nevertheless, one thing is common to all scanners: they detect only known viruses; that is, viruses which have been disassembled or analysed, and from which virus signatures unique to a specific virus have been selected. In most cases, a scanner cannot detect a brand-new virus until the virus is passed to the scanner's developer, who then extracts an appropriate virus signature and updates the scanner. This all takes time - and new viruses appear virtually every day. This means that scanners have to be updated frequently to provide adequate anti-virus protection. A version of a scanner which was very good six months ago might be no good today, if you have been hit by just one of the several thousand new viruses which have appeared since that version was released.

So, are there other ways to detect viruses? Are there anti-virus programs which do not depend so heavily on virus signatures, and thus might be able to detect even new viruses? The answer is yes, there are: integrity checkers and behaviour blockers (monitors). These types of anti-virus software are almost as old as scanners, and have been known to specialists for ages. Why then are they not used as widely as scanners?

1.2 BEHAVIOUR BLOCKERS

A behaviour blocker (or monitor) is a memory-resident (TSR) program which monitors system activity and looks for virus-like behaviour. In order to replicate, a virus needs to create a copy of itself, and most often viruses modify existing executable files to achieve this. So, in most cases, behaviour blockers try to intercept system requests which lead to modifying executable files. When such a suspicious request is intercepted, a behaviour blocker typically alerts the user and, based on the user's decision, can prohibit the request from being executed. This way, a behaviour blocker does not depend on detailed analysis of a particular virus: unlike a scanner, it does not need to know what a new virus looks like in order to catch it.

Unfortunately, it is not that easy to block all virus activity. Some viruses use very effective and sophisticated techniques, such as tunnelling, to bypass behaviour blockers. Even worse, some legitimate programs use virus-like methods which trigger a behaviour blocker. For example, an install or setup utility often modifies executable files. So, when a behaviour blocker is triggered by such a utility, it is up to the user to decide whether it is a virus or not - and this is often a tough choice: you would not assume that all users are anti-virus experts, would you?

But even an ideal behaviour blocker (there is no such thing in the real world, mind you!), which never triggers on a legitimate program and never misses a real virus, still has a major flaw. For a behaviour blocker to detect a virus, the virus must be run on the computer. Aside from the fact that virtually any user would reject the very idea of running a virus on his/her computer, by the time the behaviour blocker catches the virus attempting to modify executable files, the virus could already have triggered and destroyed some of your valuable data files.

1.3 INTEGRITY CHECKERS

An integrity checker is a program which should be run periodically (say, once a day) to detect all the changes made to your files and disks.
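The core of such a tool fits in a few lines. A bare-bones sketch, with arbitrary choices of hash, database format and file selection:

    import hashlib, json, os

    # Sketch: a minimal integrity checker. The first run records a checksum
    # database; later runs report files whose contents have changed.

    DB = "integrity.json"

    def checksum(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def snapshot(root="."):
        db = {}
        for d, _, files in os.walk(root):
            for name in files:
                if name == DB:
                    continue          # don't checksum our own database
                path = os.path.join(d, name)
                db[path] = checksum(path)
        return db

    if not os.path.exists(DB):
        with open(DB, "w") as f:
            json.dump(snapshot(), f)
        print("Checksum database created.")
    else:
        with open(DB) as f:
            old = json.load(f)
        for path, digest in snapshot().items():
            if path not in old:
                print("New file:", path)
            elif old[path] != digest:
                print("CHANGED (might be infected by an unknown virus):", path)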
This means that, when an integrity checker is first installed on your system, you need to run it to create a database of all the files on the system. During subsequent runs, the integrity checker compares the files on your system to the data stored in the database, and detects any changes made to them. Since all viruses modify either files or system areas of disks in order to replicate, a good integrity checker should be able to spot such changes and alert the user. Unlike with a behaviour blocker, it is much more difficult for a virus to bypass an integrity checker, provided you run your integrity checker in a virus-clean environment - e.g. having booted your PC from a known virus-free system diskette.

But again, as in the case of behaviour blockers, there are many situations in which the user's expertise is necessary to decide whether the changes detected are the result of virus activity. Again, if you run an install or setup utility, this normally results in modifications to your files which can trigger an integrity checker. That is, every time you install new software on your system, you have to tell your integrity checker to register the new files in its database.

Also, there is a special type of virus aimed specifically at integrity checkers: so-called slow infectors. A slow infector infects only objects which are about to be modified anyway, e.g. a new file being created by a compiler. An integrity checker will add this new file to its database, to watch for its further changes. But in the case of a slow infector, the file added to the database is infected already!

Even if integrity checkers were free of the above drawbacks, there would still be a major flaw: an integrity checker can alert you only after a virus has run and modified your files. As in the example given while discussing behaviour blockers, this might well be too late.

1.4 THAT'S WHY SCANNERS!

So, the main drawbacks of both behaviour blockers and integrity checkers, which prevent them from being widely used by the average user, are these:

1. Both behaviour blockers and integrity checkers, by their very nature, can detect a virus only after you have run an infected program on your computer and the virus has started its replication routine. By this time it might be too late - many viruses can trigger and switch to destructive mode before they make any attempt to replicate. It's somewhat like deciding to find out whether these beautiful yet unknown berries are poisonous by eating them and watching the results. Gosh! You would be lucky to get away with just dyspepsia!

2. Often enough, the burden of deciding whether it is a virus or not is transferred to the user. It's as if your doctor left you to decide whether your dyspepsia is simply because the berries were not ripe enough, or is the first sign of deadly poisoning, and you'll be dead in a few hours if you don't take an antidote immediately. Tough choice!

On the contrary, a scanner can and should be used to detect viruses before an infected program has a chance to be executed. That is, by scanning incoming software prior to installing it on your system, a scanner tells you whether it is safe to proceed with the installation. Continuing our berries analogy, it's like having a portable automated poisonous-plants detector, which quickly checks the berries against its database of known plants and tells you whether or not it's safe to eat them. But what if the berries are not in the database of your portable detector? What if it is a brand-new species?
What if a software package you are about to install is infected with a new, very dangerous virus unknown to your scanner? Relying on your scanner alone, you might find yourself in big trouble. This is where behaviour blockers and integrity checkers might be helpful: it's still better to detect the virus while it's trying to infect your system, or even after it has infected your system but before it destroys your valuable data. So, the best anti-virus strategy would include all three types of anti-virus software:

- a scanner, to ensure the new software is free of at least the known viruses before you run it
- a behaviour blocker, to catch the virus while it is trying to infect your system
- an integrity checker, to detect infected files after the virus has propagated to your system but not yet triggered.

As you can see, scanners are the first and most simply implemented line of anti-virus defence. Moreover, most people have a scanner as their only line of defence.

2 WHY HEURISTICS?

2.1 THE GLUT PROBLEM

As mentioned above, the main drawback of scanners is that they can detect only known computer viruses. Six or seven years ago, this was not a big deal. New viruses appeared rarely. Anti-virus researchers were literally hunting for new viruses, spending weeks and months tracking down rumours and random reports about a new virus in order to include its detection in their scanners. It was probably during these times that one of the nastiest computer virus-related myths was born: that anti-virus people develop the viruses themselves, to force users to buy their products and profit this way. Some people believe this myth even today. Whenever I hear it, I can't help laughing hysterically. Nowadays, with two to three hundred new viruses arriving monthly, it would be a total waste of time and money for anti-virus manufacturers to develop viruses. Why should they bother, if new viruses arrive in dozens virtually daily, completely free of charge?

There were about 3,000 known DOS viruses at the beginning of 1994. A year later, in January 1995, the number of viruses was estimated at at least 5,000. Another six months later, in July 1995, the number exceeded 7,000. Many anti-virus experts expect the number of known DOS viruses to reach the 10,000 mark by the end of 1995. With this tremendous and still fast-growing number of viruses to fight, traditional virus-signature scanning software is being pushed to its limits [Skulason, Bontchev2]. While several years ago a scanner was often developed, updated and supported by a single person, today a team of a dozen skilled employees is only barely sufficient. With the increasing number of viruses, R&D and quality control time and resource requirements grow. Even monthly scanner updates are often late - by one month at least! Many formerly successful anti-virus vendors are giving up and leaving the anti-virus battleground and market.

The fast-growing number of viruses heavily affects the scanners themselves. They become bigger, and sometimes slower. Just a few years ago, a 360Kb floppy was enough to hold half a dozen popular scanners, leaving plenty of room for the system files needed to make the diskette bootable. Today, an average good signature-based scanner alone would occupy at least a 720Kb floppy, leaving virtually no room for anything else.

So, are we losing the war? I would say: not yet - but if we stick with just virus-signature scanning, we will lose it sooner or later. Having realised this some time ago, anti-virus researchers started to look for more generic scanning techniques, known as heuristics.
2.2 WHAT ARE HEURISTICS?

In the anti-virus area, heuristics are a set of rules which are applied to a program to decide whether it is likely to contain a virus or not. From the very beginning of the history of computer viruses, different people have looked for an ultimate generic solution to the problem. Really, how does an anti-virus expert know that a program is a virus? It usually involves some kind of reverse engineering (most often disassembly), and reconstructing and understanding the virus's algorithm: what it does and how it does it. Having analysed hundreds and hundreds of computer viruses, it takes an experienced anti-virus researcher just a few seconds to recognise a virus, even if it is a new one, never seen before. It is an almost subconscious, automated process. Automated? Wait a minute! If it is an automated process, let's write a program to do it!

Unfortunately (or rather, fortunately), the analytic capabilities of the human brain are far beyond those of a computer. As was proven by Fred Cohen [Cohen], it is impossible to construct an algorithm (e.g. a program) to distinguish a virus from a non-virus with 100 per cent reliability. Fortunately, this does not rule out the possibility of 90 or even 99 per cent reliability. The remaining one per cent we hope to be able to solve using our traditional virus-signature scanning technique. Anyway, it's worth trying.

2.3 SIMPLE HEURISTICS

So, how do they do it? How does an anti-virus expert recognise a virus? Let us consider the simplest case: a parasitic, non-resident, appending COM file infector. Something like Vienna, but even more primitive. Such a virus appends its code to the end of an infected program, stores the first few (usually just three) bytes of the victim file in the virus body, and replaces those bytes with code to pass control to the virus. When the infected program is executed, the virus takes control. First, it restores the original victim's bytes in its memory image. It then starts looking for other COM files. When one is found, the file is opened in Read_and_Write mode; then the virus reads the first few bytes of the file and writes itself to the end of the file.

So, a primitive set of heuristic rules for a virus of this kind would be:

1. The program immediately passes control close to the end of itself.
2. It modifies some bytes at the beginning of its copy in memory.
3. Then it starts looking for executable files on a disk.
4. When found, a file is opened.
5. Some data is read from the file.
6. Some data is written to the end of the file.

Each of the above rules has a corresponding sequence in binary machine code or assembler language.
In general, if you look at such a virus under DEBUG, the favourite tool of anti-virus researchers, it is usually represented by code similar to this:

START:                        ; Start of the infected program
        JMP VIRUSCODE         ; Rule 1: control is passed
                              ; to the virus body
VIRUS:                        ; Virus body starts here
SAVED:                        ; Saved original bytes of the victim's code
MASK:   DB '*.COM',0          ; Search mask

VIRUSCODE:                    ; Start of the virus code
        MOV DI,OFFSET START   ; Rule 2: the virus restores
        MOV SI,OFFSET SAVED   ; the victim's code
        MOVSW                 ; in memory
        MOVSB
        MOV DX,OFFSET MASK    ; Rule 3: the virus
        MOV AH,4EH            ; looks for other
        INT 21H               ; programs to infect
        MOV AX,3D02H          ; Rule 4: the virus opens a file
        INT 21H
        MOV DX,OFFSET SAVED   ; Rule 5: the first bytes of the file
        MOV AH,3FH            ; are read into the virus
        INT 21H               ; body
        MOV DX,OFFSET VIRUS   ; Rule 6: the virus writes itself
        MOV AH,40H            ; to the file
        INT 21H

When an anti-virus expert sees such code, it is immediately obvious that this is a virus. So, our heuristic program should be able to disassemble binary machine code in a manner similar to DEBUG, and to analyse it and look for particular code patterns in a manner similar to an anti-virus expert. In the simplest cases, such as the one above, a set of simple wildcard signature-string matches would do for the analysis. In this case, the analysis is simply checking whether the program in question satisfies rules 1 through 6; in other words, whether the program contains pieces of code corresponding to each of the rules.

In the more general case, there are many different ways to represent one and the same algorithm in machine code. Polymorphic viruses, for example, do this all the time. So, a heuristic scanner must use many clever methods, rather than simple pattern-matching techniques. Those methods may involve statistical code analysis, partial code interpretation, and even CPU emulation, especially to decrypt self-encrypted viruses; but you would be surprised to know how many real-life viruses would be detected by the above six simple heuristics alone! Unfortunately, some non-virus programs would be 'detected' too.

2.4 THE FALSE ALARMS PROBLEM

Strictly speaking, heuristics do not detect viruses. Like behaviour blockers, heuristics look for virus-like behaviour. Moreover, unlike behaviour blockers, heuristics detect not the behaviour itself, but just the potential ability to perform this or that action. Indeed, the fact that a program contains a certain piece of code does not necessarily mean that this piece of code is ever executed. The problem of discovering whether this or that code in a program ever gets control is known in the theory of algorithms as the Halting Problem, and is in general unsolvable. This issue was the basis of Fred Cohen's proof of the impossibility of writing a perfect virus detector.

For example, some scanners contain pieces of virus code as the signatures for which they scan. Those pieces might correspond to each and every one of the above rules, but they are never executed - the scanner uses them just as its static data. Since, in general, there is no way for heuristics to decide whether these code pieces are ever executed or not, this can (and sometimes does) cause false alarms. A false alarm is when an anti-virus product reports a virus in a program which in fact does not contain any virus at all. The different types of false alarms, as well as the most widespread causes of false alarms, are described in [Solomon], for example. A false alarm might be even more costly than an actual virus infection.
We all keep saying to users: 'The main thing to remember when you think you've got a virus: do not panic!' Unfortunately, this does not work well. The average user will panic. And the user panics even more if the anti-virus software itself is unsure whether something is a virus or not. In the case where a scanner definitely detects a virus, it is usually able to detect all the infected programs and to remove the virus. At this point, the panic is usually over; but if it is a false alarm, the scanner will not be able to remove the virus, and will most likely report something like 'This file seems to have a virus', naming just a single file as infected. This is when the user really starts to panic. 'It must be a new virus!', the user thinks. 'What do I do?!' As a result, the user might well format his/her hard disk, causing himself a far worse disaster than a virus could. Formatting the hard disk is an unnecessary and unjustified act, by the way; even more so as there are many viruses which would survive it, unlike the legitimate software and data stored on the disk.

Another problem a false alarm can cause (and has caused) is a negative impact on a software manufacturing company. If anti-virus software falsely detects a virus in a new software package, users will stop buying the package, and the software developer will suffer not only profit losses but also a loss of reputation. Even if it is later made known that it was a false alarm, too many people will think 'there is no smoke without fire' and treat the software with suspicion. This affects the anti-virus vendor as well: there has already been a case where an anti-virus vendor was sued by a software company whose product the anti-virus software mistakenly reported as infected.

In a corporate environment, when a virus is reported by anti-virus software, whether it is a false alarm or not, the normal flow of operation is interrupted. It takes at best several hours to contact the anti-virus vendor's technical support and to make sure it was a false alarm before normal operation is resumed - and, as we all know, time is money. In the case of a big company, time is big money. So, it is not at all surprising that, when asked what level of false alarms is acceptable (10 per cent? 1 per cent? 0.1 per cent?), corporate customers answer: 'Zero per cent! We do not want any false alarms!'

As previously explained, by its very nature heuristic analysis is more prone to false alarms than traditional scanning methods. Indeed, not only viruses but many scanners too would satisfy the six rules we used as an example: a scanner does look for executable files, opens them, reads some data, and even writes something back when removing a virus from a file. Can anything be done to avoid triggering a false positive on a scanner? Let's again turn to the experience of a human anti-virus expert. How does one know that this is a scanner, and not a virus? Well, this is more complicated than the above example of a primitive virus, but there are some general rules here too. For example, if a program relies heavily on its parameters, or involves an extensive dialogue with the user, it is highly unlikely that the program is a virus. This leads us to the idea of negative heuristics; that is, a set of rules which are true for a non-virus program. Then, while analysing a program, our heuristics should estimate the probability of the program being a virus using both positive heuristics, such as the above six rules, and negative heuristics, typical of non-virus programs and rarely used by real viruses.
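One way to picture the interplay of positive and negative heuristics - including the 'normal' and 'paranoid' sensitivity modes discussed below - is the following sketch; the rule names, weights and thresholds are all invented for illustration:

    # Sketch: combine positive and negative heuristics into a verdict.
    # In 'paranoid' mode, negative heuristics are discarded and the
    # trigger threshold is lowered - more detections, more false alarms.

    POSITIVE = {                       # virus-like traits
        "jumps_near_end_of_itself": 2,
        "modifies_own_start_in_memory": 2,
        "searches_for_executables": 2,
        "opens_file_read_write": 1,
        "reads_file_start": 1,
        "writes_to_end_of_file": 3,
    }
    NEGATIVE = {                       # traits typical of legitimate programs
        "uses_command_line_parameters": 3,
        "extensive_user_dialogue": 3,
    }

    def analyse(traits, paranoid=False):
        score = sum(w for t, w in POSITIVE.items() if t in traits)
        if not paranoid:
            score -= sum(w for t, w in NEGATIVE.items() if t in traits)
        return score >= (2 if paranoid else 8)

    scanner_like = set(POSITIVE) | set(NEGATIVE)   # a scanner triggers them all
    print(analyse(scanner_like))                   # False: negatives outweigh
    print(analyse(scanner_like, paranoid=True))    # True: a false alarm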
If a program satisfies all our six positive rules, but also expects some command-line parameters and uses an extensive user dialogue as well, we would not call it a virus. So far so good. Looks like we have found a solution to the virus glut problem, right? Not really! Unfortunately, not all virus writers are stupid. Some are also well aware of heuristic analysis, and some of their viruses are written in a way which avoids the most obvious positive heuristics. On the other hand, these viruses include otherwise useless pieces of code, the only aim of which is to trigger the most obvious negative heuristics, so that such a virus does not draw the attention of a heuristic analyser.

2.4 VIRUS DETECTION VS. FALSE ALARMS TRADE-OFF

Sooner or later, each developer of a heuristic scanner reaches the point where it is necessary to make a decision: 'Do I detect more viruses, or do I cause fewer false alarms?' The best way to decide would be to ask users what they prefer. Unfortunately, the users' answer is: 'I want it all! 100 per cent detection rate and no false alarms!' As mentioned above, this cannot be achieved, so the virus detection versus false alarms trade-off must be decided by the developer.

It is very tempting to build the heuristic analyser to detect almost all viruses, despite the false alarms. After all, the reviewers and evaluators who publish their test results in magazines read by thousands of users world-wide are testing just the detection rate. It is much more difficult to run a good false alarms test: there are gigabytes and gigabytes of non-virus software in the world, far more than there are viruses, and it is much harder to get hold of all this software and to keep it for your tests. 'Not enough disk space' is only one of the problems. So, let's forget false alarms and negative heuristics, and call a virus each and every program which happens to satisfy just some of our positive heuristics. This way we shall score top marks in the reviews. But what about the users? They normally run scanners not on a virus collection, but on clean disks. Thus, they won't notice our almost perfect detection rate, but they are very likely to notice our not-that-perfect false alarms rate. Tough choice.

That's why some developers provide at least two modes of operation for their heuristic scanners. The default is the so-called 'normal' or 'low sensitivity' mode, in which both positive and negative heuristics are used, and a program needs to trigger enough positive heuristics to be reported as a virus. In this mode, a scanner is less prone to false alarms, but its detection rate might be far below what is claimed in its documentation or advertisements. The often-used (in advertising) figure of 'more than 90 per cent' virus detection by the heuristic analyser refers to the second mode of operation, which is often called 'high sensitivity' or 'paranoid' mode. It really is a paranoid mode: negative heuristics are usually discarded, and the scanner reports as a possible virus any program which happens to trigger just one or two positive heuristics. In this mode, a scanner can indeed detect 90 per cent of viruses, but it also produces hundreds and hundreds of false alarms, making the 'paranoid' mode useless and even harmful for real-life everyday use - but still very helpful when it comes to a comparative virus detection test.
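Expressed as code, the difference between the two modes might look like the following sketch; the thresholds, and the policy of discarding negative heuristics in paranoid mode, are assumptions made purely for illustration.

    /* Minimal sketch of 'normal' vs 'paranoid' sensitivity modes.
       The thresholds are invented for illustration. */
    #include <stdio.h>

    static int report_as_virus(int positive, int negative, int paranoid)
    {
        if (paranoid)
            return positive >= 1;           /* negatives discarded */
        return positive - negative >= 4;    /* negatives count, higher bar */
    }

    int main(void)
    {
        /* A program triggering 2 positive and 3 negative points... */
        printf("normal:   %s\n", report_as_virus(2, 3, 0) ? "virus?" : "clean");
        printf("paranoid: %s\n", report_as_virus(2, 3, 1) ? "virus?" : "clean");
        return 0;
    }

The same file is 'clean' in one mode and 'suspicious' in the other - which is exactly the effect described above.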
Some scanners have a special command-line option to switch the paranoid mode on; some others switch to it automatically whenever they detect a virus in the normal, low sensitivity mode. Although the latter approach seems to be a smart one, it takes just a single false alarm out of the many thousands of programs on a network file server to produce an avalanche of false virus reports.

2.5 HOW IT ALL WORKS IN PRACTICE: DIFFERENT SCANNERS COMPARED

Being myself an anti-virus researcher and working for a leading anti-virus manufacturer, I have developed a heuristic analyser of my own. And of course, I could not resist comparing it to other existing heuristic scanners. We believe the results will be interesting to other people, as they underscore what was said above about both virus detection and false alarm rates.

As the products tested are our competitors, we decided not to publish their names in the test results. So, only FindVirus of Dr Solomon's AntiVirus Toolkit is called by its real name. All the other scanners are referred to by letters: Scanner_A, Scanner_B, Scanner_C and Scanner_D. The latest versions of the scanners available at the time of the test were used. For FindVirus, it was version 7.50 - the first version to employ a heuristic analyser. Each scanner tested was run in heuristics-only mode, with normal virus signature scanning disabled. This was achieved either by using a special command-line option, where available, or by using a special empty virus signature database in the other cases.

The test consisted of two parts: virus detection rate and false alarm rate. For the virus detection rate, the S&S International Plc ONE OF EACH virus collection was used, containing more than 7,000 samples of about 6,500 different known DOS viruses. For the false alarm test, the shareware and freeware software collection of the SIMTEL20 CD-ROM (fully unpacked), all utilities from different versions of MS-DOS, IBM DOS, PC-DOS and other known files were used (the current basic S&S false alarm test set). When measuring false alarm and virus detection rates, all files reported were counted, whether reported as 'Infected' or as 'Suspicious'. Separate figures for the two categories are given where applicable. In both parts of the test, the products were run in two heuristic sensitivity modes, where applicable: normal (low sensitivity) mode and paranoid (high sensitivity) mode. Automatic heuristic sensitivity adjustment was prohibited, where applicable. The results of the tests are as follows:

Virus Detection Test

            Files     Files triggered (infected+suspicious)
            scanned   Normal                  Paranoid
FindVirus   7375      5902 (N/A)      80.02%  N/A
Scanner_D   7375      5743 (0+5743)   77.87%  6182 (0+6182)    83.54%
Scanner_C   7375      5692 (0+5692)   77.18%  N/A
Scanner_A   7375      4250 (N/A)      57.63%  6491 (N/A)       87.74%
Scanner_B   7392(*)   3863 (2995+868) 52.38%  6124 (2992+3112) 82.68%

(*) Scanner_B was tested a couple of days later, when 17 more infected files had been added to the collection.

False Alarms Test

            Files     Files triggered (infected+suspicious)
            scanned   Normal                  Paranoid
FindVirus   13603     0 (N/A)         0.000%  N/A
Scanner_A   13428     11 (N/A)        0.082%  371 (N/A)        2.746%
Scanner_B   13471     17 (0+17)       0.126%  382 (0+382)      2.836%
Scanner_D   13840     24 (0+24)       0.173%  254 (0+254)      1.824%
Scanner_C   13603     28 (0+28)       0.206%  N/A

3 WHY 'OF THE YEAR 2000'?

Well, first of all simply because I could not resist the temptation of splitting the name of the paper into three questions and using them as the titles of the main sections of this presentation. I thought it was funny. Maybe I have a weird sense of humour. Who knows...
On the other hand, the year 2000 is very attractive by itself. Most people consider it a distinctive milestone in all aspects of human civilisation. This usually happens with years ending in double zero; still more with the end of a millennium, with its triple zero at the end. The anti-virus arena is no exception. For example, during the EICAR'94 conference there were two panel sessions discussing 'Viruses of the year 2000' and 'Scanners of the year 2000' respectively. The general conclusion reached by a panel of well-known anti-virus researchers was that, at the current pace of new virus creation, by the year 2000 we might well face dozens (if not hundreds) of thousands of known DOS viruses.

As I tried to explain in the second section of this paper (and as other authors have explained elsewhere [Skulason, Bontchev2]), this might be far too much for the current standard scanning technique, based on known-virus signatures. More generic anti-virus tools, such as behaviour blockers and integrity checkers, while being less vulnerable to the growing number of viruses and the rate at which new viruses appear, can detect a virus only when it is already running on a computer, or even only after the virus has run and infected other programs. In many cases, the risk of allowing a virus to run on your computer is just not affordable. Using a heuristic scanner, on the other hand, allows detection of most new viruses in the regular, scanner-safe manner: before an infected program is copied to your system and executed. And, very much like behaviour blockers and integrity checkers, a heuristic scanner is much more generic than a signature scanner, requires much rarer updates, and provides an instant response to a new virus. The 15-20 per cent of viruses which a heuristic scanner cannot detect could be dealt with using the current, well-developed signature scanning techniques. This would effectively decrease the virus glut problem at least fivefold.

Yet another reason for choosing the year 2000 and not, say, 2005 is that I have strong doubts whether the current computer virus situation will survive the year 2000 by more than a couple of years. With new operating systems and environments appearing (Windows NT, Windows'95, etc.), I believe DOS is doomed. So are DOS viruses. So is the modern anti-virus industry. This does not mean viruses are not possible for the new operating systems and platforms - they are possible in virtually any operating environment. We are aware of viruses for Windows, OS/2, Apple DOS and even UNIX. But creating viruses for these operating systems, as well as for Windows NT and Windows'95, requires much more skill, knowledge, effort and time than for the virus-friendly DOS. Moreover, it will be much more difficult for a virus to replicate under these operating systems. They are far more secure than DOS, if it is possible to talk about DOS security at all. Thus, there will be far fewer virus writers, and they will be capable of writing far fewer viruses. The viruses will not propagate fast and far enough to represent a major problem. Consequently, there will be no virus glut problem. Regrettably, there will also be a much smaller anti-virus market, and most of today's anti-virus experts will have to find another occupation...

But until then, DOS lives, and anti-virus developers still have a lot of work to do!

REFERENCES

[Bontchev1] Vesselin Bontchev, 'Possible Virus Attacks Against Integrity Programs And How To Prevent Them', Proc. 2nd Int. Virus Bulletin Conf., September 1992, pp. 131-141.
[Skulason] Fridrik Skulason, 'The Virus Glut. The Impact Of The Virus Flood', Proc. 4th EICAR Conf., November 1994, pp. 143-147.

[Bontchev2] Vesselin Bontchev, 'Future Trends In Virus Writing', Proc. 4th Int. Virus Bulletin Conf., September 1994, pp. 65-81.

[Cohen] Fred Cohen, 'Computer Viruses - Theory and Experiments', Computer Security: A Global Challenge, Elsevier Science Publishers B.V. (North Holland), 1984, pp. 143-158.

[Solomon] Alan Solomon, 'False Alarms', Virus News International, February 1993, pp. 50-52.

==================================================================

VB95_LT1.TXT
ÄÄÄÄÄÄÄÄÄÄÄÄ

"Hey, Frisk. Be easy on me, please!"
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
 - an IR-member

THE EVOLUTION OF POLYMORPHIC VIRUSES

Fridrik Skulason
Frisk Software International, PO BOX 7180, 127 Reykjavik, Iceland
Tel +354 5 617273 Fax +354 5 617274 Email frisk@complex.is

The most interesting recent development in the area of polymorphic viruses is how limited their development actually is. This does not mean that there are no new polymorphic viruses - far from it, new ones are appearing constantly - but there is nothing 'new' about them: they are just variations on old and well-known themes. However, looking at the evolution of polymorphic viruses alone shows only half of the picture; it is necessary to consider the development of polymorphic virus detection as well. More complex polymorphic viruses have driven the development of more advanced detection methods, which in turn have resulted in the development of new polymorphic techniques.

Before looking at those developments that can be seen, it is perhaps proper to consider some basic issues regarding polymorphic viruses, starting with the question of why they are written. That question is easy to answer: they are written primarily for the purpose of defeating one particular class of anti-virus product - the scanners. Considering that virus scanners are the most popular type of anti-virus program, it is not surprising that they are the subject of attacks.

At this point it is worth noting that polymorphic viruses pose no special problems to a different class of anti-virus product, namely integrity checkers. This does not mean that integrity checkers should be considered superior to scanners - after all, there is another class of viruses, the 'slow' viruses, which are easily detected by scanners, but which are a real problem for integrity checkers. Fortunately, polymorphic slow viruses are not common at the moment. As a side note, 'slow polymorphic' viruses also exist, and should not be confused with 'polymorphic slow' viruses. This category will be described at the end of this paper, together with some other 'nasty' tricks.

Considering how virus scanners work, a virus author can in principle attack them in two different ways: either by infecting an object the scanner does not scan, or by making the detection of the virus so difficult that the scanner - or rather the producers of the scanner - may not be able to cope with it. Polymorphic viruses attempt to make detection difficult: either too time-consuming to be feasible, or beyond the technical capabilities of the anti-virus authors. The success of virus authors depends not only on their programming skills, but also on the detection techniques used. Before describing the current techniques, however, a brief classification of polymorphic viruses is in order. Polymorphic viruses are currently divided into three groups:

1) Encrypted, with variable decryptors. This is the largest and currently the most important group.
Several methods of implementing the variability are discussed below, but most of them should be familiar to readers of this paper.

2) 'Block-swapping' viruses. Only a handful of viruses currently belong to this group, but they demonstrate that a polymorphic virus does not have to be encrypted. These viruses are composed of multiple blocks of code, theoretically as small as two instructions each, that can be swapped around in any order, making the use of normal search strings nearly impossible.

3) Self-modifying viruses using instruction replacement techniques. This is where the virus may modify itself by replacing one or more of its instructions with one or more functionally equivalent instructions when it replicates. So far this category is only a theoretical possibility, as no viruses have yet been written that use this technique. It is possible that some such viruses will appear in the future, perhaps only written to demonstrate that it can indeed be done.

Considering that the viruses which currently fall into the second group are easy to detect using ordinary search strings, and that the third group is non-existent, the only polymorphic viruses currently of interest are the encrypted ones. For that reason the term 'polymorphic viruses' should, in the rest of this paper, really be understood to mean only viruses of the first group; that is, encrypted viruses with variable decryptors.

So, how are those viruses detected? Basically, the detection methods fall into two classes: those that detect and identify only the decryptor, and those that look 'below' the decryptor, detecting the actual virus. This is not a strict 'either-or' classification - a scanner may analyse the decryption loop to determine that it might have been generated by a particular virus, before spending time decrypting the code.

DECRYPTION-LOOP DETECTORS

There are several different methods that have been used to detect and identify decryption loops - this used to be the standard way of detecting polymorphic viruses - but there are several significant problems with these methods. The most common methods are described later, but if they are only used as the first step, and the virus is then properly decrypted, some of the following problems disappear:

- Virus-specific. Basically, the detection of one polymorphic virus does not make it any easier to detect another.

- More likely to cause false positives. As we get more and more polymorphic viruses, capable of producing an ever-increasing variety of decryptors, the chances of generating a false positive increase, as some innocent code may happen to look just like a possible decryptor.

- Identification is difficult. Many polymorphic viruses will generate similar decryptors, and it is entirely possible that a scanner will mis-identify a decryptor generated by one polymorphic virus as having been produced by another, unrelated virus. Also, in the case of variants of the same polymorphic virus, it may be possible to determine the family, but not the variant.

- No disinfection. Virus disinfection requires the retrieval of a few critical bytes from the original host file, which are usually stored within the encrypted part of a polymorphic virus. This means that virus-specific disinfection is generally not possible, as it would require decrypting the virus.

On the positive side, detection of a particular decryptor may be quite easy to add, although that depends on the design of the scanner and the complexity of the virus.
These decryptor-detection techniques are old, and several anti-virus producers have abandoned them in favour of more advanced methods. The most common detection methods in this group are:

- Search strings containing simple wildcards
- Search strings containing variable-length wildcards
- Multiple search strings
- Instruction usage recognition
- Statistical analysis
- Various algorithmic detection methods

SEARCH STRINGS CONTAINING SIMPLE WILDCARDS

The limitations of this method are obvious, as it can only handle a few 'not very polymorphic' viruses, which are sometimes called 'oligomorphic'. They may, for example, make use of a simple decryption loop with a single variable instruction. The least variable polymorphic viruses use only two different instructions there, such as NEG and NOT, which differ by only one bit. Defeating this detection method is easy: just insert a random number of 'junk' instructions at variable places in the code. 'Junk' does not have to mean 'invalid', but rather any instruction that can be inserted into the decryption loop without having an effect. Typical examples include NOP, JMP $+2, MOV AX,AX and other similar 'do nothing' instructions.

SEARCH STRINGS CONTAINING VARIABLE-LENGTH WILDCARDS

This method takes care of decryptors that contain those junk instructions. However, there are two problems with this approach. Some scanners cannot use this method, as their design does not allow variable-length wildcards; but that really does not matter, as the technique is very easy to defeat: just make the decryptor slightly more variable, so that no single search string, even one using a variable-length wildcard, will match all instances of the decryptor. This can be done in several ways:

- Changing register usage: for example, the DI register might be used for indexing instead of SI, or the decryption key might be stored in BX instead of AX.

- Changing the order of instructions: if the order of instructions does not matter, they can be freely swapped around.

- Changing the encryption method: instead of using XOR, the virus author could just as well use ADD or SUB.

MULTIPLE SEARCH STRINGS

This is generally considered an obsolete technique, but many anti-virus producers used it back in 1990 when the Whale virus appeared. This virus could be reliably detected with a fairly large set of simple search strings. Today, however, most of them would probably use a different method. This detection method can easily be defeated by increasing the variability of the decryptor past the point where the number of search strings required becomes unreasonably large.

There are other cases where the multiple search string technique has been used. One anti-virus company had access to the actual samples of a particular polymorphic virus that were to be used in a comparative product review. Rather than admitting that they were not able to detect the virus, they seem to have added a few search strings to detect those particular samples - and they did indeed score 100% in that test, although later examination revealed that they detected only 5% of samples of the virus in question.

INSTRUCTION USAGE RECOGNITION

This method was developed to deal with Dark Avenger's Mutation Engine. It basically involved assuming initially that all files are infected, then tracing through the decryptor, one instruction at a time. If an instruction is found that could not have been generated by a particular virus as part of its decryptor, then the file is not infected by that virus.
If one reaches the end of the decryptor, still assuming that the file is infected, it is reported as such. There are two major ways to attack this technique. The more obvious is to increase the number of possible instructions used in the decryptor: if a virus used every possible instruction in its decryptor, it simply could not be detected with this method without modifying the method itself. The second way is more subtle: it involves making it more difficult to determine when the end of the decryption loop has been reached.

STATISTICAL ANALYSIS

This method is generally not used, due to the unacceptably large risk of false positives. It basically involves statistical analysis of the frequency of certain instructions in the decryptor. It works best with viruses that generate large decryptors which use few and uncommon 'do-nothing' instructions.

Other algorithmic detection methods are possible, and are frequently used. Sometimes they are only used to quickly eliminate the possibility of a particular file being infected with a particular virus, for example:

IF   the file is an EXE-structure file
AND  the initial CS:IP value equals 0000:0000
THEN the file is not infected by Virus-X

In other cases the algorithm provides detection, instead of negative detection:

IF   the file is a COM-structure file
AND  it is at least 5623 bytes long
AND  it starts with a JMP FAR to a location at least 1623 bytes from the end of the file
AND  the first 10 instructions contain at least 5 instructions from the following set {AAD,NOP,CLI,CLD,STC}
AND  within the first 100 bytes from the entry point there is an XOR [SI/DI/BX],AX instruction
AND  within the first 200 bytes from the entry point there is a branch instruction that transfers control back to the XOR instruction described above
THEN the file is infected with Virus-Y

It should be obvious from this example that the rules can get complex - perhaps unreasonably complex - and obviously require significant work to implement. Also, in some instances it is just not possible to put together a sufficient number of rules like this to ensure accurate detection (not even by considering the rules the virus itself may use to determine whether a file has already been infected), as the number of false positives would be too high. At this point it is very important to bear in mind that, while false positives are a very serious problem for the anti-virus author, they do not matter at all to the virus author. A false positive just means that the virus will not infect one particular file it might otherwise have infected... so what - after all, it has plenty of other files to infect.

Having looked at the detectors that only detect the decryption loop, we must look at the more advanced detectors, which detect the actual virus instead of just the decryption loop. Compared to the decryptor-detecting methods, the following differences are obvious:

- More generic. These methods require significantly more initial work, but the extra effort required to add detection of a new polymorphic virus is far less than with some of the other methods described above.

- Less chance of false positives. Having decrypted the virus, it should be possible to reduce the chance of false positives almost to zero, as the entire virus body should be available.

- Identification is easy. When the virus has been decrypted, identification is no more difficult than in the case of non-encrypted viruses.

- Easy disinfection. The same applies to disinfection: it should not be any more difficult than if the virus had not been encrypted to begin with.
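As a concrete point of reference before moving on to those more advanced detectors, here is a minimal sketch in C of the simplest loop-detection method described above - a search string with single-byte wildcards. The pattern and the scanned bytes are invented for illustration; real scanners use far more elaborate string formats.

    /* Minimal sketch: matching a search string that contains
       single-byte wildcards (-1 stands for a '??' position). */
    #include <stdio.h>
    #include <stddef.h>

    static int match_at(const unsigned char *buf, const int *pat, size_t patlen)
    {
        size_t j;
        for (j = 0; j < patlen; j++)
            if (pat[j] != -1 && buf[j] != (unsigned char)pat[j])
                return 0;
        return 1;
    }

    static int scan(const unsigned char *buf, size_t len,
                    const int *pat, size_t patlen)
    {
        size_t i;
        for (i = 0; i + patlen <= len; i++)
            if (match_at(buf + i, pat, patlen))
                return 1;
        return 0;
    }

    int main(void)
    {
        /* Invented 'decryptor' bytes: ... LODSB / XOR AL,A5 / STOSB */
        unsigned char code[] = { 0xBE, 0x34, 0x12, 0x90,
                                 0xAC, 0x34, 0xA5, 0xAA };
        /* Pattern: LODSB / XOR AL,?? / STOSB - the key byte wildcarded */
        int pat[] = { 0xAC, 0x34, -1, 0xAA };

        printf(scan(code, sizeof(code), pat, 4)
               ? "decryptor fragment found\n" : "no match\n");
        return 0;
    }

A single junk instruction inserted between LODSB and XOR would already defeat this pattern - which is exactly why the variable-length wildcards and the more elaborate methods described above were needed.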
Two such techniques have been used to detect polymorphic viruses:

- 'X-ray'
- Generic decryption

The X-raying technique was probably only used in two products, both of which have mostly abandoned it by now. It basically involved assuming that a particular block of code had been encrypted with an unknown algorithm, and then deriving the encryption method and the encryption keys from a comparison between the original and the encrypted code. As this sounds somewhat complicated, an example is in order. Assume that manual decryption of one virus sample reveals that a particular block of code should contain the following byte sequence:

B8 63 25 B9 88 01 CD 21

The corresponding encrypted block of code in a different sample looks like this:

18 C4 8B 0C 34 C2 07 F0

Is there any way this sequence could have been obtained from the first one by applying one or two primitive, reversible operations, for example:

- XOR with a constant
- ADD/SUB with a constant
- ROL/ROR by a fixed number of bits

Yes, because XORing the two sequences together generates the sequence:

A0 A7 AE B5 BC C3 CA D1

Calculating the differences between the bytes in that sequence gives the following result:

07 07 07 07 07 07 07 07

which shows that the original sequence, and (presumably) the entire virus body, can be obtained by XORing each byte with a key, and then adding the constant value of 7 to that key before applying it to the next byte. Using this method, it may be possible to deduce the operation of the decryptor without looking at it at all. There is a variant of the X-ray method, developed by Eugene Kaspersky, which works in a different way but produces the same result. The reason 'X-raying' has mostly been abandoned is that it can easily be defeated, for example by using an operation the X-ray procedure may not be able to handle, by using three or more operations on each decrypted byte, or by using multiple layers of encryption.

The last method to be developed - generic decryption - does not suffer from that limitation, and can handle decryptors of almost any complexity. It basically involves using the decryptor of the virus to decrypt the virus body, either by emulating it, or by single-stepping through it in a controlled way, so that the virus does not gain control of the execution. Unfortunately, there are several problems with this method:

- Which processor should be emulated? It is perfectly possible to write a virus that only works properly on one particular processor, such as a Cyrix 486SLC; the decryptor will just generate garbage if executed on any other processor. An intelligent emulator may be able to deal with this, but the 'single-stepping' method cannot.

- Single-stepping is dangerous. What if the virus author is able to exploit some obscure loophole which allows the virus to gain control? In that case, just scanning an infected file would result in the virus activating, spreading and possibly causing damage, which is totally unacceptable. It should be noted that a very similar situation has actually happened once - however, the details will not be discussed here.

- Emulation is slow. If the user has to wait a long time while the scanner emulates harmless programs, the scanner will probably be discarded, and obviously a scanner that is not used will not find any viruses.

- If the virus decryptor goes into an infinite loop and hangs when run, the generic decryptor might do so too. This should not happen, but one product has (or used to have) this problem.
- How does the generic decryptor determine when to stop decrypting code, so as not to waste an unacceptable amount of time attempting to 'decrypt' normal, innocent programs?

- What if the decryptor includes code intended to determine whether it is being emulated or run normally, such as a polymorphic timing loop, and only decrypts itself if it is able to determine that it is running normally?

- What if the decryptor is damaged, so that the virus does not execute normally? A scanner that only attempted to detect the decryptor might be able to find it, but a more advanced scanner that attempts to exploit the decryptor will not find anything. This is, for example, the case with one of the SMEG viruses: it will occasionally generate corrupted samples. They will not spread further, but should a scanner be expected to find them or not?

Finally, it should be noted that there are other ways to make polymorphic viruses difficult to handle than just attacking the various detection techniques described above. 'Slow polymorphic' viruses are one such method. They are polymorphic, but all samples generated on the same machine will seem to have the same decryptor. This may mislead an anti-virus producer into attempting to detect the virus with a single search string, as if it were just a simple encrypted, but not polymorphic, virus. However, virus samples generated on a different machine, or on a different day of the week, or even under a different phase of the moon, will have different decryptors, revealing that the virus is indeed polymorphic.

Another recent phenomenon has been the development of more 'normal-looking' polymorphic code. Placing a large number of 'do-nothing' instructions in the decryptor may be the easiest way to make the code look random, but it also makes it look really suspicious to an 'intelligent' scanner, and worthy of detailed study. If the code looks 'normal', for example by using harmless-looking 'get DOS version number' function calls, it becomes more difficult to find.

So, where does this leave us? Currently, anti-virus producers are able to keep up with the virus developers, but unfortunately the best methods available have certain problems - the one most obvious to users is that scanners are becoming slower. There is no indication that this will get any better, but on the other hand there are no signs that virus authors will be able to come up with new polymorphic techniques which would require the development of a new generation of detectors.

==================================================================

VB95_LT3.TXT
ÄÄÄÄÄÄÄÄÄÄÄÄ

CATCHING THE VIRUS WRITERS

Jim Bates - Computer Forensics Limited - U.K.

JB> = Jim Bates
XX> = Someone ;).

INTRODUCTION

JB> During my work in analysing virus code I have been privileged to be asked to help the Police on a number of occasions, and some of my experiences and observations will be described here.

XX> I shouldn't really see it as a privilege to work for the police to bust virus writers, or to supply them with information they're not worthy of having. If some fucking computer-crime-investigation-agency by any chance gave me such a request because I analyzed virus code and knew a lot about the virus community, I would've turned them down. Of course, that would probably be impossible looking at my situation today (they don't know who I am), but even if I'd been busted, and even if they'd been blackmailing me with all kinds of things, I still wouldn't cooperate with them, and they would get no information whatsoever from me. It's immoral to turn friends in, no matter what.
Yeah, of course Jim wouldn't see viruswriters as friends, but even if he sees us as enemies, it's still very low.

JB> In this presentation I shall try to highlight the problems that computer viruses have caused and how the authorities in the U.K. are dealing with them. I will describe some virus writers and the environment that they work in to produce their programs. I will introduce some of the reasons they have given for writing viruses and, in some cases, why they feel aggrieved at being "persecuted" by the authorities. Without going into too much detail that might help the really malicious virus writers, I will present details of some of the cases I have been involved in and how the Police tackled the problem of locating and identifying those responsible for particular virus incidents.

XX> I don't think Jim's little hints on being *really* malicious would help us very much anyways. I mean, creativity is everything :) when it comes to being a pain in the ass. If you use your creativity in a (generally speaking) negative way, that's a good start :).

JB> WHY DO WE NEED TO CATCH VIRUS WRITERS?

XX> It's simple, we don't! Maybe catching one holds some perverted appeal for Jim, else there's no reason. Why would they waste police resources to catch a harmless individual who writes computer programs? I mean, even a damn dangerous writer won't cause society that much harm. Go bust some rapists, murderers or some other kind of *real* criminals!

JB> There are several reasons why we should want to catch these people - the main reason is quite simply to bring them to book for the loss and disruption that they cause. Another reason is that viruses are a non-productive threat which diverts genuine creative effort from helping to fuel the progress of computing in general. If we accept this, then it becomes obvious that we should use any means at our disposal to stop people from writing them and distributing them. It should also be remembered that at least some of the virus writers display an obvious talent for programming, and it is a sad loss to the industry that such skill is wasted.

XX> Virus writing is for most writers more a question of attitude than programming. Those people [us] want to write viruses rather than useful software. If we look at it that way, there is no skill wasted.. And besides, technically minded people are making, for example, bombs - why wouldn't you rather stop them? It's a waste that highly skilled persons (Stop all Nuclear testing!) -rb are making, for example, nuclear bombs.

JB> There is still a public perception that viruses are just a nuisance and only cause minor annoyance to large companies - "... who can easily afford it." This is just not true - there are documented instances where ordinary people have suffered serious loss and even life-threatening situations as a direct result of virus activity.

XX> Well, viruses aren't discriminating, they hit everyone they can. If this means a large company with a lot of cash or a poor XT owner, well, that doesn't really matter to the virus (and not to virus programmers either..). Oh well, some viruswriters do discriminate, by the way! Manzon only hits 80386 computers and above!

JB> So, another vital reason for catching them is to deter others.

XX> If you catch one Swedish viruswriter, I sure as hell will write a lot more destructive code and blame the one catching him for giving me motivation. Rebellion? Perhaps. Right? Of course. Today you can see a few viruses dedicated to busted viruswriters.
Predator#2 for example was dedicated to the ARCV, hehe, just too bad some IR-members fell victim to it :-). (* Priest rox0rs! *)

JB> VIRUSES BREAK THE LAW

In the U.K. it is illegal to access or modify the contents of a computer without proper authority. It can therefore be argued that a computer virus, since it does not ask for permission to replicate, breaks the law simply by spreading.

XX> Well, press lawsuits against the viruses then :-). However, it's not forbidden to create viruses in most countries, and therefore he can't say that viruses break the law.. And besides, Stormbringer's GoodVirus did ask for permission to replicate! How to forbid that virus? Asshole!

JB> If a virus is executed without the knowledge or permission of the computer owner, and the author of that virus can be identified, he or she can be charged under Section 3 of the Computer Misuse Act (1990). This offence carries a maximum sentence of five years in prison, and it is one of the few laws where the offence can be committed outside the U.K.

XX> Five years? That's fucking overdoing it! To get five years in Sweden, well, even if you kill five persons it's not certain you'll get more than five years in prison! Five years in prison!?! Think about it.. Five years for doing *nothing* wrong! The law is *really* screwed up! BIGTIME!

JB> Thus it is quite possible for a virus writer to break the law in the U.K. without ever having set foot in the country. Extradition treaties will no doubt be updated by the time computers become obsolete.

XX> Let's assume that Manzon infected some people in the U.K. (which it did) - then who is to be held responsible for that? Red-A for writing it? An IR-member for deliberately infecting that Petra-remover? Btk for distributing it to a few Swedish boards? Or none of us? Well, I don't really care, because if they tried to press lawsuits against me (or anyone in IR for that matter), they would hardly succeed in doing that. We couldn't know that the U.K. would become infected!

JB> VIRUSES CAUSE LOSS, DISRUPTION AND DAMAGE

By far the largest loss and disruption suffered by the victims of a virus outbreak is that arising from downtime while installations are checked and disinfected. A recent outbreak of the Pathogen virus in a college in England affected 4 file servers and 90 workstations. The college had to be closed for four days while the systems were checked and cleared. The commercial loss of such a shutdown can be imagined. Official complaints to the Police concerning outbreaks of the Pathogen and Queeg viruses (produced at the trial of the virus author) listed estimated losses approaching 500,000 pounds sterling.

XX> I guess that's pretty much the same story as when a new variant of Digital Death, which TU fixed, closed the computer classes (and more) at a school for over one week. If that cost 500,000 pounds.. Hahahha! Blame their weak security, not TU!

JB> Two other incidents will serve to show just how serious virus outbreaks can be:

1) An old established small family bakery in south-east England was hit by the Casino virus in January. This resulted in the partial loss of both their stock control and their accounts data. Two months later they were hit again, this time by the Michelangelo virus, with the total loss of all data. Backup and protection procedures implemented by a so-called expert after the first incident proved ineffective, and very little data was recoverable. The company, employing nine people, went into receivership, bankrupting the owner.
He lost his business and his home. When I spoke to him, his wife was distraught - not knowing how she was going to care for her two little girls.

XX> A very sad story, indeed. I guess they hate computer viruses now, but I do however think they should hate the 'expert' rather than the virus writer. The wanna-be expert is to be held responsible for this.

JB> 2) A local medical practice in the English Midlands maintained a computer system containing patient records. The system operated by printing a copy of the patient's record in the surgery when the patient arrived in the reception area. While one of the doctors was on holiday, a locum undertook his workload and prescribed a small dose of Penicillin for a regular patient. The computer system was infected with the Nomenklatura virus, and some of the records had been corrupted, with the result that this particular patient's record was shown incorrectly and did not indicate that the patient was strongly allergic to the drug. The database access system did not signal any errors when displaying or printing records, and the locum was unfamiliar with the patient's allergy. These two facts, coupled with the effects of the computer virus, meant that the patient who received Penicillin suffered a nasty and uncomfortable reaction. It does not take a long stretch of imagination to see the life-threatening potential of such an incident.

XX> Jim just listed two very sad stories about what viruses _could_ do. However, if a person died due to a virus, that would hardly be a deliberate attack to kill another person with his creation [the virus]. There are easier ways to kill a man than with a virus. Accidents happen, you know. No further comments; you can be run over by a reindeer as well :-). Just ask Anna Jones...

JB> VIRUSES ARE UNETHICAL

Quite apart from the damage and nuisance that viruses cause - and the fact that many countries have now criminalised them - viruses are just not ethical. It is thoroughly mean and nasty to write computer programs designed to deliberately damage data. Every virus writer that I have met has admitted to the cowardly and craven nature of their activities. It somehow reflects upon all of us that human nature can sink so low as to transfer some of its own baser instincts for destruction into an environment which is arguably the most thrilling development since the invention of the wheel.

XX> I don't think writing viruses is the act of a coward (We want some chicken tonight!). A U.K. viruswriter does risk a lot of things by writing viruses. It doesn't take much guts to do it, but it takes brains to stay away from the NSY. And besides, most viruses aren't designed to deliberately damage data! They just add their code into host files and replicate without being noticed. A few viruses do cause deliberate damage, but those are quite few, so ah well :-).

JB> There are those who will argue passionately for the freedom to write what they like on their own equipment, and I for one am not seeking to prevent them. However, with the ever wider use of computers, the skill to make them do our bidding brings with it a pressing responsibility. There are arguments attempting to liken computer viruses to primitive life-forms. Some people actually believe this, and do not respond favourably when it is pointed out to them that similar arguments have been advanced and defeated for crystalline replication, or even the growth and spread of fire. In most cases, however, these arguments are used simply as a retrospective excuse.

XX> Sorta agreed.
JB> THE FIELD OF PLAY

When the virus problem first arrived and began to grow, several far-sighted individuals saw that some form of defence was going to become absolutely essential if the well-being of computing was to survive. These people could be divided into two groups depending on their original motivation: some rushed to the defence of computing just because it was the right thing to do; others simply saw an opportunity to make lots of money. Sadly, as the problem has grown over the years, the latter group seems to have grown at the expense of the former. We now seem to have a symbiotic relationship between virus writers and anti-virus companies - each feeds on the efforts of the other. It is interesting to speculate that if suddenly all virus writers were to stop their activities, some very large companies might suffer serious financial setbacks. The virus writers (apparently) make no money from their efforts, while the anti-virus companies trade in a market worth many millions each year. There have even been suggestions that some viruses originate from the anti-virus companies themselves. I should add that to date, as far as I am aware, none of these suggestions has been substantiated.

XX> I don't think AVers write viruses themselves, but it would though be quite cool to see viruses written by them.

JB> So we have a situation where virus writers - who often take an anti-establishment stance - are actually feeding the coffers of the very people who they claim to despise. Add to this a few despicable individuals who - in the name of freedom of information - choose to collect and make publicly available whole collections of virus code containing all the intricate technical details, and you have a thoroughly confused and tangled industry in which the opportunities for intrigue and deception are rife. In the midst of this shambles is the computer user - caught in the crossfire, so to speak. The typical user neither needs nor desires to know how computers work - it is sufficient for him that they DO work. It is understandable therefore that when information built by their own efforts is destroyed by a deliberate, indiscriminate act of malice, they should be saddened and angered. Bearing all this in mind, the authorities are right to legislate against the distribution of virus code, and we should all try to help in the fight to bring them to justice.

XX> Well, I don't expect every computer user worldwide to know how to protect themselves against viruses -- that is quite impossible. Not even I can guarantee my computer to be totally clean of viruses. However, let's assume I got my HD totally fucked up by a virus from a writer who lives no more than 200 meters from me.. Well, I wouldn't like to see him in jail anyways! Why would I bring him to justice? He didn't break the law by writing it, and what if he didn't distribute it himself? Hey presto! He cannot be held responsible for an accident. Shit happens, just too bad it happened to me :).

JB> WHO ARE THE VIRUS WRITERS?

XX> We are, of course :-).

JB> My own observations, confirmed by other people similarly engaged in trying to track them down, are that virus writers are generally 'loners'.

XX> The ARCV were hardly loners; they were a group of technically interested youths.

JB> Most right-thinking people reject the idea of indiscriminately damaging something, especially when it does not belong to them. For this reason a virus writer cannot discuss his hobby with friends or acquaintances.
True, he may be able to establish contact with like-minded people across computer communication links, but even there he must maintain a protective anonymity, and so he tends to be a loner.

XX> Ah well, when I meet my IRL friends we don't discuss viruses, but some of them know about my interest in viruses and some even write viruses. It can happen that we discuss computers, but mainly we spend our time together talking about parties and babes.. Confusing chats I tell you! :).

JB> Working alone on something as technically challenging as computer virus code takes an enormous amount of time and concentration. The dedicated virus writer may therefore become obsessive and shun any other activities which may take him away from his obsession. This lack of social interaction makes them withdrawn and uncommunicative, as well as leading to a general tendency to social inadequacy.

XX> Hahahah! Well, I have met quite a few viruswriters, and I would say that we are no more 'unnormal' than people you meet when you're out getting drunk.. I live a perfectly healthy social life, and most of my fellow writers do as well. The only thing that makes us *really* different to the average blonde-bitch is our IQ, our dicks, our knowledge in computers and low-level assembly language :-). (Well, not totally true, but compared to most people, we ain't much different..). Jim Bates.. Come and visit me and go get some facts before writing up this typical bogus-shit. AV-people spend *way* more time in front of their computers than most viruswriters do, and hey.. I've already written a lot about this very topic, so just read that and trust me on this one: You're wrong! I'm right! (hehe XX can't stop thinking about his dirty fantasies :P) -rb

JB> The reasons they become attracted to virus writing in the first place seem to vary widely, but are usually preceded by a technical curiosity and, in some cases, a fascination with the spurious argument that viruses may mimic a simple life-form. As they become more involved, they tend to lose touch with the real world and live almost entirely within the electronic environment in which they work - a cyberspace that may seem more real to them than the normal humdrum of human existence.

XX> Well, it's easy to get too involved in the VX-underground, because it's a very fun place to kill some of your time in (Boy, you should not kill time, pressure it, Seize the day! Uerm, well, could you give me a dollar so I can get loaded??). However, once you spend more time in front of IRC than seeing IRL friends, getting drunk, or just fucking your girlfriend(s), think twice about what you're doing :-)... Just combine it, and don't take the scene too damn serious :-) (words of wizdom, could you please email the money).

JB> The inclusion of damaging payloads and intricate trigger mechanisms becomes just another technical exercise where they can demonstrate their superiority over the rest of us.

XX> Not agreed :-). Payloads are trivial to write.. There's no technical challenge in writing an HD-trasher.. Every programmer worth his salt can make one :-).

JB> Depending upon the degree of their social degeneration, their activities can be described as ranging from irresponsible and stupid to malicious and hateful. I have not yet come across a virus writer who simply wrote and did not distribute his programs. Distribution in this sense meaning loosing them into the computing community in a way that would ensure their survival and growth.
XX> Well, I've written viruses totally motivated by hate (Xxxxx.XXXX), and well, that was just another way to express myself.. If I hadn't written it, I sure as hell would have smashed a lot of things just to calm down. Is that better? Of course not! But that's exactly what other people are doing.. Smashing things, beating up totally unknown persons, raping girls, etc. Heck, it's a violent society we're living in, and viruswriting just isn't that bad..

JB> The intensive and technical nature of virus writing is such that they will usually work from home. A small proportion begin their activities within open computing environments at school or college, but the large majority either start at, or graduate to, a semi-secret existence at home. When they are asked to give their own reasons for writing viruses, they often have difficulty.

XX> Let's talk about a person we all know :). All he wanted was to release an e-zine, and that it turned out to be a VX-zine wasn't the important thing. It just turned out that way because another computer guy he had got in contact with had a keen interest in viruses. Well, that's a reason. Curiosity is another.

JB> Remember that most of those that I have met have been under investigation precisely because their programs have caused damage at large. In these cases they are aware that their activities have had serious effects, and that there are probably serious consequences for them to face. They may attempt to justify themselves by suggesting that they were "only testing" or "researching viruses". One incredible argument went as follows: the virus writer was asked why he wrote and distributed viruses - he replied that he "wanted to be a virus researcher - but no one would give me any, so I thought I'd write my own." His interrogator observed: "it's a good job you weren't interested in brain surgery."

XX> Well, then whose fault is this? The AV-people's, I tell you! For example, take the reason why TridenT was formed. (For those who don't know shit about TridenT, read the interviews in Insane Reality #4, or in Mark Ludwig's book -The Virus Creation Labs-). Wasn't it because John Tardy was denied source codes from (among many) Frans Veldman - author of TBAV?

JB> The latest reasoning, from the Black Baron, was that he used the viruses as a platform to test and advertise his polymorphic engine. He argued that the SMEG polymorphic code had potentially useful capabilities in copy protection. This highlights another of the most common arguments - that virus code might somehow be beneficial. In nine years of work in this field I have yet to see a demonstrably beneficial use for virus code, but still they think that can be a justification for indiscriminate destruction.

XX> Well, hard-ass encryption utilities can be used in useful ways. If you don't want other people to look at or steal your code, encrypt it, so the people who are too lazy to write their own code cannot decrypt it. So, imo, The Black Baron was correct - his program just wasn't very hard to break (breaking one encryption manually, that is, not making a program that breaks all encryptions automatically).

JB> Whether you classify them as thoughtless or malicious, these people are criminals and must be caught and punished. The damage they are doing is incalculable.

XX> Hey, I'm an unpunished writer of viruses, making me a non-criminal! I never broke any Swedish law, so come kiss my arse!

JB> SO HOW DO WE FIND THEM?

Let us look at a possible sequence of events which might lead to catching a virus writer.

XX> It ain't hard :).
("Q: How can you tell there is an old man in the dark?) Write a few groundbreaking viruses, go on #virus show 'em up and ask for membership in Phalcon/Skism or VLAD. Scene-secrets are traded quite open between groups and soon or later, you've figured out enough to caught as good as every group-affiliated (and some independent) writers. But what's stopping you is to write those damn viruses.. Asshole! If you want something done, put your crappy ethics aside, and do something for once.. Words won't help you know! :). Whoops! Just gave him *the hint* of how to catch us.. So, beware dudes! Hehe. JB> First someone suffers loss or damage of a sufficient magnitude to persuade them to make an official complaint. The complaint will need to contain details of the virus-including a sample-and details and costs of the damage or disruption suffered. This can be a problem since the first objectie of the victim is to destroy the virus and resume normal operations as soon as possible. In the Black Baron case I analysed in excess of 57 virus samples from various complainants. The most from any site was 9 samples but the majority sent only one or two. The presence of a "generation number" within the virus code made it possible to identify which samples had actually introduced the virus in some instances but a greater number of samples would have made the job of tracing the infection so much easier. It is this tracing process which is so difficult. Victims are rarely able to pinpoint exactly where the infection came from- particularly is the virus is designed to infect slowly and quietly. Once an origin has been established, enquiries can be made to try to trace further back along the chain. A major problem for virus writers is how to get their code into the computing community. Some may upload infected files to Bulletin Boards or Internet sites. Others may try physically passing infected programs around. Whichever method is chosen it is this initial distribution which is the most dangerous area for the virus writer. In the Black Baron case the Police followed up complaints received and were able to determine that initially infected software had been downloaded from various Bulletin Boards. With help from the BBS operators and the telephone company, activity logs and telephone billing records revealed the source of the original uploads. There were other considerations which need not be discussed here but the net result was that the Police knew exactly when and where the infected programs had come from. XX> That was quite ignorant from TBB. Ignorance ain't no good when being in this kinda business. Try tracing the origin of Petra-Rm.Zip.. (Silence.. ). JB> A number of complainants asked for complete confidentiality and this was respected because there were plenty of others willing to stand up in court. However, users must accept that if they want justics they must be prepared to make their complaints public. Once the originator of the infections has been detected and identified, the Police enquiries can focus in earnest and will eventually result in a search and seizure operation. A search warrant is issued and the suspect's home (and possible his workplace) will be visited and all computing equipment seized for investigation. The suspect is interviewed and questioned about the alleged offences and his equipment is examined in minute detail for any evidence linking him to viruses. 
In one case, the suspect became aware of Police interest and took measures to thoroughly disinfect his computers and hide his virus source code. He also sent his machine to a friend for safe keeping, thinking that this left him safe. When the Police did arrive, no computing equipment of any kind was found, and he denied any involvement in computing. Since the Police were well aware of his recent activities - and could prove them - this denial simply confirmed that he had something to hide. The Police knew about the friend and had visited him too! Several computers were seized, and the hidden source code was rapidly found and identified. Even without the source code charges would still have been brought, but finding it made the case much stronger.

XX> Hint: Encrypt your HD and put all your floppy discs in the microwave oven if you are aware that the police are watching you! :).

JB> HOW YOU CAN HELP

XX> Hey! Send your name in to the NSY :-)... That's not too hard!

JB> For the best chance of conviction, a series of events must be shown to link the virus writer with the damage. You, as a potential victim, should bear this in mind, so that if you do get hit you can provide a solid start for the Police to investigate the chain of events. What new program or disk brought the infection in? Where did it come from - and when? What damage/disruption did it cause, and how much did it cost you as a result? In the U.K., arrangements can be made to take detailed image copies of infected machines, so that as much evidence as possible is preserved with the minimum of disruption to you. You can then proceed with the disinfection process and a rapid resumption of normal working. If you operate a Bulletin Board or Internet site, do you log incoming software and callers? Could you provide the evidence that the Police need? The quicker you can provide this information, especially in the case of hitherto unknown viruses, the better the chance of catching the perpetrator.

If you are a virus writer - remember that you are not completely anonymous. There are big risks, and when you are caught there are heavy consequences. As interesting as you find virus code - are you prepared to go to prison for it?

XX> I know that I'm not completely anonymous. But since I am doing nothing wrong (in the writing itself), I don't care. If Jim Bates wants to try something about (C)opyright violations, he just gotta figure out who wrote this text :) and decided to include it.. Bah. Good luck asshole, you'll need it..

 - (C) 1995 Someone, Anyone, Everyone & Noone INC.