About AV-Checker

Hello!

The topic of AV-checkers has been raised on numerous occasions: there are concepts, half-baked and finished implementations, thoughts and other bullshit. That's why I decided to dump here everything related to how a checker works. I did, however, add something new and leave the unnecessary parts out.

An AV-checker is an online service that checks files/data for viruses/trojans/worms/etc. with the help of several Anti-Virus (AV) scanners prepared in advance. For starters, we will need a powerful multi-core dedicated server (the more cores, clock frequency and cache, the better) with plenty of RAM and support for hardware virtualization (for the hypervisor). In addition, we will need wide network channels and unlimited traffic (specific technical characteristics are not provided because it all depends on what you want and can get). A "simple" PC with a Virtual Machine (VM) installed could do, but that would directly affect the speed of the AV-checker. After all, good performance depends directly on the capacity of the equipment and on its configuration.

As for the programming languages, you can write in any of them and in the way you prefer. For example, C++ (for the system's engine) and PHP + HTML (for the web interface).

Later, you can add the following (popular) features, i.e. types of checks, to your online scanner:

files - statics (checking with local AV scanners, checking traffic on the fly, real-time checking while files are copied/downloaded; the signature analyzer + heuristics + code emulator are used here);
packs (exploit pack / webpage / etc);
url/domains;
files - dynamics (checking that files actually run, behavioral analysis and so on).

Let's take a closer look at each one of those points.

Checking files - statics

A checker is a web admin panel + system engine that "communicate" through a database (DB) and run on seriously powerful hardware, so to speak.

Implementation technology:

we create the required amount of VMs (with guest OS) on a dedicated server;
one of the VMs is turned into a server - it will host the web interface and the DB (files are uploaded through the admin panel and stored for anti-virus checking later);
we install one AV and our "handler" software on every VM. The software:
- [1] takes from the DB a file that must be checked with a particular AV. The DB is polled continuously at a given interval. However, this is where another weak link shows up: under significant load, the database becomes overloaded. The solution is to keep the files in a web directory and store only the download links in the DB;
- [2] launches the AV (a console scanner with the required settings) to scan the file. Here is another nasty bombshell - slow checking speed. The reason is that the scanners of certain AVs start up too slowly (engine initialization, loading of modules and the virus DB and other stuff, on top of the actual checking time).
  So, as an option, you can use the GUI version, which loads the virus databases into memory only once. Moreover, we can decrease the total scanning time by fetching files in batches and invoking the scanner once per batch.
  In addition, we can take advantage of on-the-fly traffic checking and real-time scanning, set up so that it triggers when data is downloaded and/or copied. However, everything must be thoroughly tested and configured first, because in most AVs these features are known for poor performance, especially under heavy load;
- [3] receives the result from the AV. You can catch it by redirecting the console scanner output to a pipe/file, or by reading the AV engine's own log. If we're driving a GUI, then through its window, by sending messages and pressing buttons. Other options include reading it from the event log, decrypting hidden files (not a good method), a DB (for example, there are SQLite fans) etc.;
- [4] sends the result to the DB. It's all simple here - we parse the report received; if the report is empty, we send an "OK", if not, we send the name of the detection;
- [5] updates the AV and its virus DB (at a preset interval).
  We also have to make sure the files being checked are not going to AVs (setting up the local network):
  firstly, we switch off the internet connection on all VMs with AVs. Afterwards, we take one VM prepared in advance, transform it into router and configure it. The next step is to download antivirus updates and save them in shared folders. Finally, we disable all the services of anonymous data sending and specify required paths to shared folders as a mirror. For those not supporting this parameter, we do everything ourselves: download virus DBs with wget/curl and put them into directories.
  In most difficult cases, we use proxy to control the outgoing traffic.

Points [2], [3] and [5] take most of the work, because every AV requires its own algorithm. That's why we can factor all these points out into a single module (handler_n) with a unified interface towards the task manager, while the implementation will be different for every type of AV.
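To make the "one interface, many implementations" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names Handler, ScanResult and DummyHandler, the method names, and the "FOUND" report format are hypothetical and do not correspond to any real AV product. The dummy handler does not run a scanner at all, it only imitates one so the interface can be exercised.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanResult:
    engine: str
    verdict: str  # "OK" or the name of the detection


class Handler(ABC):
    """Hypothetical common interface seen by the task manager; one subclass per AV."""

    name = "generic"

    @abstractmethod
    def scan(self, path: str) -> str:
        """Run the engine on `path` and return its raw report text (points [2] and [3])."""

    @abstractmethod
    def parse(self, report: str) -> Optional[str]:
        """Return the detection name found in `report`, or None if nothing was found."""

    @abstractmethod
    def update(self) -> None:
        """Refresh the engine and its virus DB (point [5])."""

    def check(self, path: str) -> ScanResult:
        # Point [4]: an empty result becomes "OK", anything else is the detection name.
        detection = self.parse(self.scan(path))
        return ScanResult(self.name, detection if detection else "OK")


class DummyHandler(Handler):
    """Stand-in for a real engine: 'detects' any path containing 'eicar'."""

    name = "dummy"

    def scan(self, path: str) -> str:
        # Made-up report format, used only by this sketch.
        return f"{path}: FOUND Test.Signature" if "eicar" in path.lower() else ""

    def parse(self, report: str) -> Optional[str]:
        if not report.strip():
            return None
        return report.rsplit("FOUND", 1)[1].strip()

    def update(self) -> None:
        pass  # nothing to update in the dummy engine


if __name__ == "__main__":
    handler = DummyHandler()
    print(handler.check("clean.bin"))
    print(handler.check("eicar.com"))
```

The task manager only ever calls check() and update(), so adding another engine means writing one more subclass and nothing else changes.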

Here is how the checker works schematically:

                                 ******************************<-***********************************
                                 *                                                                 *
                                 *                           SERVER                                *
                                 *                                                                 *
                                 *                                                                 *
                                 *                                                                 *
                                 *      ************                                               *
                                 *      *          *                                               *
          *         5. profit    *      *          *             4. result                         *
         * *<---------------------------* web-site *<----------------------------------+           *
        *   *                    *      *          *                                   |           *
       *     *      1. file      *      *          *                                   |           *
      *  user *------------------------>*          *                                   |           *
     *         *                 *      ************                                   |           *
    *************                *        |                                            |           *
                                 *        |  2. file\task                              |           *
                                 *        +---------------->********                **********     *
                                 *                          *      *  3. file\task  *        *     *
                                 *                          *  DB  *--------------->* engine *     *
                                 *                          *      *                *        *     *
                                 *                          ********                **********     *
                                 *                                                                 *
                                 *                                                                 *
                                 *                                                                 *
                                 ******************************->***********************************





                                 ******************************<-***********************************
                                 *                                                                 *
                                 *                           ENGINE                                *
                                 *                                                                 *
                                 *                                                                 *
                                 *                                                                 *
                                 *                           *************              ********   *
                                 *                           *           *   <-report   *      *   *
                                 *                   +------>* handler_1 *<------------>* AV_1 *   *
                                 *                   |       *           *     file->   *      *   *
                                 *   ***********     |       *************              ********   *
                                 *   *         *     |                                             *
             3.2 result          *   *         *     |                                             *
    <--------------------------------*         *     |       *************              ********   *
                                 *   *  task   *  <-report   *           *   <-report   *      *   *
             3.1 file/task       *   * manager *<----------->* handler_2 *<------------>* AV_2 *   *
    -------------------------------->*         *    file->   *           *     file->   *      *   *
                                 *   *         *     |       *************              ********   *
                                 *   *         *     |                                             *
                                 *   *         *     |                                             *
                                 *   ***********     |       *************              ********   *
                                 *                   |       *           *   <-report   *      *   *
                                 *                   +------>* handler_n *<------------>* AV_n *   *
                                 *                           *           *     file->   *      *   *
                                 *                           *************              ********   *
                                 *                                                                 *
                                 *                                                                 *
                                 *                                                                 *
                                 ******************************->***********************************

It would be even better if we could add:

return of results in real time;
archive handling (file unpacking, every file is then put into a web folder) / file handling (the type is determined);
performance increase (before the file is checked, the DB is searched for its hash; depending on whether it is found or not, a decision is made to reuse the stored result or to scan);
downloading and checking of large-sized files;
general decrease of load on server/admin panel/DB/engine;
(in case there are several servers) parallel work of the handlers;
for additional services (like domain check etc.) - making one more VM with a required handler for a specific task;
etc :p.
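The hash lookup from the list above could look roughly like this. It is only a sketch under assumptions: the table layout, the ResultCache class and the choice of SHA-256 are mine, not something the scheme prescribes. The file is hashed in chunks so that large files do not have to fit in memory.

```python
import hashlib
import sqlite3


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ResultCache:
    """Hypothetical per-engine verdict cache keyed by file hash."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "sha256 TEXT, engine TEXT, verdict TEXT, "
            "PRIMARY KEY (sha256, engine))"
        )

    def get(self, sha256, engine):
        """Return the stored verdict, or None if this file was never checked."""
        row = self.conn.execute(
            "SELECT verdict FROM results WHERE sha256 = ? AND engine = ?",
            (sha256, engine),
        ).fetchone()
        return row[0] if row else None

    def put(self, sha256, engine, verdict):
        """Store or overwrite the verdict for this (file, engine) pair."""
        self.conn.execute(
            "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
            (sha256, engine, verdict),
        )
        self.conn.commit()
```

One design caveat: a stored verdict reflects the virus DB as it was at scan time, so cached entries should be dropped or re-checked after each update of the corresponding AV.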

Checking packs

In a nutshell, a pack is an exploit pack returned by a rotator (+ there is an admin panel with statistics and lots of other stuff). A rotator is a script that determines a variety of the machine's characteristics (OS, browser and its version, etc.) and returns a suitable script. A pack (bond) is used for testing software for penetrability, likelihood of vulnerable machines being infected with subsequent expansion of its own software etc.

For AV-checker's engine, checking bonds equals to checking files, the differences will be visible only in the admin panel. This is how it works:

a full web-address is given upon entrance; it's used for downloading (the same) page content by several user-agents (the more of them - the better, as that way the bonds will return different exps);
data received is saved in simple files (for example, *.html);
checking files with AV.

You can also think in advance about the kinds of protocols the bond checking will work with, as well as what can be done if an IP block after every entrance is enabled for the pack (however, that's not the checker's problem anymore). That's all there is to it.

Checking Domains

DNSBL (DNS BlockList/BlackList, previously RBL - Real-time Blackhole List) is a blacklist of IPs/domains that send spam, published and queried through the DNS. There are also tons of different DNSBL servers offering their services (lists) to deal with useless information. It means that everything is already built; all we need to do is add the required services to our checker and automate the checks (yeah, right, just fucking add them, it's that simple =)). The truth is, the task itself is kinda easy, and in my opinion, it's even easier to implement it using scripts on just one VM.

The system's framework is going to be almost the same as that of the file checker (with a task manager, handlers etc.), but with some changes, namely (in the order of the previously mentioned points):

[1] instead of the file, the IP/domain is checked (it's crystal clear anyway);
[2] IP/domain checking is performed by the DNSBL services selected in the admin panel. Besides, we have 3 types of checks:
- [a] checking web databases through parsing of the downloaded result page. For example, for google, it's done in the following way:
  - [+] create full web-address (google safebrowsing);
  - [+] download content by that url;
  - [+] on the page received look for specific text (if we find a "NO", the domain is clear);
- [b] downloading (in simplest cases - a text version) a database of 'dirty' ips/domains with subsequent search for required IP (spyeyetracker blocklist etc);
- [c] resolving IP -> writing IP in DNS PTR notation -> including the DNSBL server name into the tail -> receiving (not receiving) the answer. I should also add that there are 2 types of DNSBL: LHSBL (Left Hand Side Blacklist) and RHSBL (Right Hand Side BlackList). The main difference is that LHSBL is used for checking IPs and RHSBL for checking domains.
  Example for LHSBL:
  We have an address: eof-project.net. We get its IP, let's say it's 12.34.56.78. Then we write its octets in reversed order: 78.56.34.12. Then we append the name of any list host: 78.56.34.12.cbl.abuseat.org. In C, the check can be done using the gethostbyname(const char *name) function. If you receive an answer (for this host, it's IP 127.0.0.2), then the address being checked is listed (is on the blocklist).
  The returned IP address can be anything; what matters is whether the query gets an answer at all.
  The example for RHSBL is similar, but there will be a domain instead of an IP (eof-project.host_name);
- [x] checks of other types realized in compliance with the task assigned (for example, for Internet Explorer, we can create an additional VM in which we can emulate a user entering the website etc).
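Check type [c] is easy to sketch in Python. The function names below are hypothetical, and the RHSBL zone used in the usage note is a placeholder, not a real list. Note one limitation of this sketch: a lookup that fails for any other reason (no network, resolver refusing the query) is also reported as "not listed".

```python
import ipaddress
import socket


def lhsbl_query_name(ip, zone):
    """Build the LHSBL query name: reversed IPv4 octets + list zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def rhsbl_query_name(domain, zone):
    """Build the RHSBL query name: the domain itself + list zone."""
    return domain.rstrip(".") + "." + zone


def is_listed(query_name):
    """True if the list answers with any address, False if the name does not resolve."""
    try:
        socket.gethostbyname(query_name)
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__":
    name = lhsbl_query_name("12.34.56.78", "cbl.abuseat.org")
    print(name)  # 78.56.34.12.cbl.abuseat.org
    print(is_listed(name))  # result depends on the live list
```

For an RHSBL the call would be rhsbl_query_name("eof-project.net", "rhsbl.example"), where rhsbl.example stands in for whatever list zone is chosen in the admin panel.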

Checking files - dynamics

This category includes:

checking that files actually run (the most important thing is to check files on many different OSs. In the end, the file either works fine or there is a bug);
"utility" behavior analyzer (monitoring of changes in the file system (FS) and registry, network activity, logging of WinAPI function calls, traffic and keystroke analysis, driver installation and other features. The result is a detailed report of all program actions, based on which the user can deliver a verdict on whether the file is clean or dirty);
"AV" behavior analyzer (in other words - checking files using the AVs' proactive protection. The result comes from the AV, not from the user: "OK" / "name of the detection");
etc etc.

Just like in other cases, there are several ways to create this kind of check (the points listed above can be used separately, combined into one big point or mixed, whatever rocks your socks =)):

[+] the previously described work scheme (file checking - statics) also fits here perfectly. If we build, say, an "AV" behavior analyzer, then clearly it's the proactive modules that will be working instead of the AV scanners. They work according to a similar scheme. The Task Manager (TM) deserves special attention; we will install it on a separate VM. The main functions of the task manager include accepting/processing tasks, handing files to handlers, receiving results from them and sending them back to the database, rolling back to a snapshot after each check with proactive modules, updating OSs/AVs (and other software) with subsequent creation of new snapshots etc.
Besides, the TM monitors the operation of the service engine. If something goes wrong, it takes appropriate action (for example, if a process hangs (which happens quite often), or some tricky software being checked BSODs the OS etc.).
[+] everything else:
- a different framework: an option without a TM, alternating checks, the "Russian doll" principle (other VMs launched within a VM) etc.;
- VM alternative: physical machine (for example, full system back-up used in a sector mode with rollback possibility), sandboxes (with special analyzers) and other stuff.

0x

You get the picture now. In general, everything is pretty straightforward. It's just a lot of donkey work, both obvious and not so obvious, some of which will only come up in the process. But it's nothing to beat your brains about. The most important thing is to find a powerful server :P. If you do need it - go for it, and you will make it work, no doubt.