SCANNING TWO TIMES FASTER

  Warhol? Warhead? - Butthead.

  "The Apocalypse is Postponed" - kaspersky news

Let's imagine some p2p program. It exists both as executable code and as some (quickly changing) number of instances, i.e. machines where it is executed. There also exists some big set of ip addresses to be scanned by these instances.

There is an interesting paradox: a program which scans all the addresses is certainly better than a program which scans none. However, a program which scans 10 addresses instead of 100, with the same result, is much better. This can happen when the 90 remaining addresses simply do not exist (i.e. are not assigned), or are known in advance to be neutral to these scans. Just imagine that there are 10 girls, but two of them are badly infected, and one of them is a masked kaspersky. In such a situation, making the correct choice is critically important. ;-)

I have never seen them, but there may exist (or appear in the future) ids/firewall systems which detect network scans by accesses to multiple unassigned addresses. In other words, sometimes not scanning is harder than scanning.

If the scanning algorithm is distributed and smart enough not to scan each address more than the required number of times, then the minimal scan effort equals sending a single packet of 28 bytes to a single ip address; we do not consider reply packets and the probable following data exchange. So, let's assume that to scan a single ipv4 address, our program sends 28 bytes. Overall traffic means the total traffic generated by all the instances during all the time. We all know that flood is very cool, but sometimes it's better to save some packets for some specific addresses - government, antiviral, or even your friends' sites; there are always some you're in love with.

If each scanning instance doesn't rely on the other instances and so scans randomly chosen addresses (the most stupid algorithm), then the amount of outgoing bytes is multiplied by the number of instances and by the number of redundant scans. On the other side, to avoid sending one packet we only need about 1/256 ≈ 0.004 bits of information, or even less. This is because some (existing or absent) subnet of the form a.b.c.* maps to a single bit (in case of a bitset), and some big ip range maps to just a few bytes. However, these 0.004 bits must be transferred once to each instance - and yep, that uses incoming traffic. ;)

So, let's compare:

  256 * 28 * n_of_instances * algorithm_stupidity  =  7168 * x
   |    |          |                   |
   |    |          |                   +-- ranges from 1 to infinity
   |    |          +-- effective multiplier; equal to 1 if the
   |    |              algorithm is correctly distributed
   |    +-- smallest packet size; really it should be the mean
   |        traffic generated during data exchange between machine
   |        A (scanner) and machine B (being scanned/fucked/whatever)
   +-- number of ip addresses in a small (/24) subnet

i.e. compare 7168 * 8 = 57344 bits of outgoing traffic against 1 bit of incoming traffic.

Also note that the 2**32 address space is very big, but only about half of the addresses are in use; and since propagation is exponential, finding correct ip addresses faster pays off at every generation.

The best ways to keep information about the address space are 1) a bitset, where each bit stands for some subnet, or 2) a list of ip ranges. There are also some simple but less effective solutions: use bitsets for a.*.*.* subnets (32 bytes of data) or for a.b.*.* (8k of data), or just never touch some specific ranges, such as 127.*.*.* or 224-255.*.*.*, and so on. (The ranges 10.*.*.*, 172.16-31.*.* and 192.168.*.* require special processing, because they specify local networks and should not be shared between instances.)
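To make the bitset variant concrete, here is a minimal C sketch, assuming one bit per a.b.c.* subnet (2**24 bits = 2 MB); the names mark_subnet and scan_allowed are mine, not from the original tools:

  /* one bit per /24 subnet: 2**24 bits = 2 MB total */
  static unsigned char subnet_bits[(1 << 24) / 8];

  /* mark the a.b.c.* subnet of the given ip (host byte order)
     as "do not scan" */
  void mark_subnet(unsigned long ip)
  {
      unsigned long s = ip >> 8;              /* 24-bit subnet index */
      subnet_bits[s >> 3] |= 1 << (s & 7);
  }

  /* should this ip be scanned at all? */
  int scan_allowed(unsigned long ip)
  {
      unsigned long s = ip >> 8;
      return !(subnet_bits[s >> 3] & (1 << (s & 7)));
  }

The full bitset costs 2 MB per instance though, which is exactly why the range list representation below, plus compression, is interesting: the same knowledge packs into a few kilobytes.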
In case you're planning to write some cool p2p network which is aimed to work in a specific country (or countries), you want your p2p client to know all the IP addresses belonging to these countries.

Let's now consider the data representation. First, there are four RIRs, and they publish ip <--> country mappings. Note that this information is incomplete; refer to the GeoIP database or something similar if you want more detailed ip information. This is how the text entries look (4 files, delegated-*):

  ripencc|SE|ipv4|193.45.0.0|65536|19930901|allocated
  ripencc|AT|ipv4|193.46.0.0|65536|19930901|allocated
  ripencc|CH|ipv4|193.47.0.0|65536|19930901|allocated

So, now we know all the IP range to country mappings. However, these ranges are sometimes duplicated, sometimes intersecting (yep, obscurity is still there), sometimes they should be merged, and so on; once I wrote a tool (see convert.cpp) to check/fix it all, and it still seems to be useful. After all the text conversions, we get the following:

  c12bf000,c12bffff,193.43.240.0,193.43.255.255,EU
  c12c0000,c12dffff,193.44.0.0,193.45.255.255,SE
  c12e0000,c12effff,193.46.0.0,193.46.255.255,AT
  c12f0000,c12fffff,193.47.0.0,193.47.255.255,CH
  c1300000,c134ffff,193.48.0.0,193.52.255.255,FR

Then this data is converted into the binary format. When the binary data is generated, it is reordered to remove some redundancy before compression. At the first step we have the min/max ip address of each entry, and the entries are sorted by min address in ascending order. We convert each entry into an address base and an address count, and then subtract the previous base from the current one to form an address delta:

  address delta = d1.d2.d3.xx
  address count = c1.c2.c3.yy

Then we skip the lowest bytes (xx and yy) of both delta and count, thus ignoring (or expanding) all subnets having less than 256 addresses, and now we have 6 bytes for each entry:

  d1 d2 d3 c1 c2 c3  }
  d1 d2 d3 c1 c2 c3  }  N entries
  .. .. .. .. .. ..  }

Then we build a matrix of size 12 x N/2 by grouping high and low nibbles (4 bits each): every entry consists of 12 nibbles, and row k of the matrix collects nibble k from all N entries, packed two per byte:

  row  0:  high nibbles of d1, for all entries
  row  1:  low  nibbles of d1, for all entries
  row  2:  high nibbles of d2, for all entries
   ...
  row 11:  low  nibbles of c3, for all entries

This should be done because each nibble's value distribution highly depends on its position: values at the higher positions are almost all zero, while values at the low positions change more randomly. Regrouping the nibbles places all the zeroes near each other, so compression gives a much better ratio.

Then we use a special modification of the upx-for-code-snippets tool to compress the data and add a tiny sfx decryptor. Here are some stats:

  country: FR
  total ip addresses:      61147904
  text entries:            1439
  plain binary file size:  8640 bytes
  packed file:     gzip:   3016 bytes
                   upxs:   3261 bytes (NB: size includes sfx of about 100 bytes)

As such, to know all the addresses of some country at run-time, you only need some kilobytes of data in your program.

  For all assigned ip addresses:
  total ip addresses:      1973368064
  text entries:            3099
  plain binary file size:  18600 bytes
  packed file:     gzip:   7903 bytes

  For all unassigned ip addresses:
  total ip addresses:      2321720320
  text entries:            3181
  plain binary file size:  19092 bytes
  packed file:     gzip:   6300 bytes
                   upxs:   6519 bytes
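To make the encoding step concrete, here is a C sketch which builds the 6-byte entries from a sorted list of min/max addresses. This is my reconstruction from the description above, not the original convert.cpp/compact.cpp code, and encode_ranges is a made-up name:

  typedef unsigned long DWORD;

  /* encode sorted, non-overlapping (min,max) ranges into 6-byte
     entries: 3 bytes of base delta + 3 bytes of count, with the
     lowest byte of each value dropped */
  int encode_ranges(const DWORD* mins, const DWORD* maxs, int n,
                    unsigned char* out)
  {
      DWORD prev = 0;
      int i, ptr = 0;
      for (i = 0; i < n; i++)
      {
          DWORD delta = mins[i] - prev;          /* base delta */
          DWORD count = maxs[i] - mins[i] + 1;   /* number of addresses */
          prev = mins[i];
          /* dropping the lowest byte ignores/expands subnets
             smaller than 256 addresses */
          out[ptr+0] = (delta >> 24) & 0xff;
          out[ptr+1] = (delta >> 16) & 0xff;
          out[ptr+2] = (delta >>  8) & 0xff;
          out[ptr+3] = (count >> 24) & 0xff;
          out[ptr+4] = (count >> 16) & 0xff;
          out[ptr+5] = (count >>  8) & 0xff;
          ptr += 6;
      }
      return ptr;   /* bytes written, before nibble regrouping */
  }

The nibble regrouping and compression are then applied to this output; the run-time code below reverses the whole process.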
Sending 28 bytes to each of these unused addresses, ip by ip, would require 2.3G * 28 = 65 GB of traffic. There are also 1.9G assigned addresses remaining to scan, which is at least another 1.9G * 28 = 55 GB of traffic. So the real scan speed can be multiplied by about 2.

When the ip ranges are compressed and the tiny unpacker is added, the resulting data looks as follows:

        pusha
        mov     edi, [esp+32+4]   ; ptr to output buffer
        call    pop_packed
  pop_packed:
        pop     esi
        ...                       ; depacker body: esi -> edi
        mov     [esp+7*4], edi    ; popa.eax <-- output ptr after unpack
        popa
        sub     eax, [esp+4]      ; return unpacked length
        retn

In C/C++, to unpack the data you just call that generated stuff, specifying the output buffer; the call returns the unpacked length:

  void* buf = new char[ 1<<16 ];
  int len = ((int(__cdecl*)(void*))&data_array)(buf);

After the data is unpacked, we should reorder the nibbles back (inbuf and outbuf are unsigned char pointers):

  int count = len / 6;
  int x, y;
  for(y=0; y<6; y++)
    for(x=0; x<count/2; x++)
    {
      outbuf[y+(x*2+0)*6] = (inbuf[(y*2+0)*(count/2)+x] & 0xF0) | (inbuf[(y*2+1)*(count/2)+x] >> 4);
      outbuf[y+(x*2+1)*6] = (inbuf[(y*2+0)*(count/2)+x] << 4)   | (inbuf[(y*2+1)*(count/2)+x] & 15);
    }

Then there are 6-byte entries in outbuf; let's use 'em as follows:

  DWORD a = 0, d, ptr = 0;
  while(count--)
  {
    a += (outbuf[ptr+0]<<24) | (outbuf[ptr+1]<<16) | (outbuf[ptr+2]<<8);
    d  = (outbuf[ptr+3]<<24) | (outbuf[ptr+4]<<16) | (outbuf[ptr+5]<<8);
    d += a - 1;
    if (ptr > 0 && a == 0) break;
    fprintf(f, "min ip = %u.%u.%u.%u, max ip = %u.%u.%u.%u\n",
            a>>24, (a>>16)&0xff, (a>>8)&0xff, a&0xff,
            d>>24, (d>>16)&0xff, (d>>8)&0xff, d&0xff);
    ptr += 6;
  }

Refer to sfx.cpp to see how the nibble reordering can be done on-the-fly. There are also some old, badly coded tools: convert.cpp merges all the databases into one and fixes all the errors it finds; compact.cpp is a kind of grep - it searches the generated database for the given countries and produces a binary file containing ip range lists, ready to be compressed; expand.cpp performs the reverse operation, producing a text file from a given binary file with a range list.
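As a usage sketch: once the entries are decoded into sorted min/max arrays, the run-time "do we touch this address?" test is a simple binary search over the ranges; ip_in_ranges is my name, not something from the original sources:

  typedef unsigned long DWORD;

  /* return 1 if ip falls into one of the decoded ranges;
     mins[]/maxs[] are sorted by min and do not overlap */
  int ip_in_ranges(DWORD ip, const DWORD* mins, const DWORD* maxs, int n)
  {
      int lo = 0, hi = n - 1;
      while (lo <= hi)
      {
          int mid = (lo + hi) / 2;
          if      (ip < mins[mid]) hi = mid - 1;
          else if (ip > maxs[mid]) lo = mid + 1;
          else                     return 1;   /* min <= ip <= max */
      }
      return 0;
  }

With about 3000 entries this is a dozen comparisons per candidate address, so the check costs nothing compared to sending even a single 28-byte packet.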