ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ Xine - issue #5 - Phile 101 ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ elements of X86 assembly (First step) +------------------------+ This text was originally written to help by friend billy in his (crazy) path of understanding the intel processor. Here we will try tp analyze deeply the binaries of intel instructions themself. So, how are things done ? +-------------------------+ The intel instructions set is a couple of instruction, represented by binary value of 8, 16, 24 bits (may be 32, couldn't find any, what can't we except from intel crazyness) ? There's no logical links between all these instruction to their binary code. So, you have to do as everybody. But, we could make classes of instruction, depending their arguments. The binary have fixed value, and then we can add some value to switch the processing of the instruction. They are instruction flags: ie: ADC: 0001001woorrrmmm There a table you can get at the following adress: http://www.imada.ou.dk/~jews/PInfo/intel.html (*) (*) Look errata over this table The instruction himself +-----------------------+ The instruction himself can have 6 different ""flags"" that will be handled, often just 2 or 3 are used simultaneously. These flags are directly put into the instruction binary code. Each instruction have them stored differently, the big majority have it done the same way (as exemple, xor sub add have the same prefix) but there some little bastards. the w flag +----------+ The w flag is important for the instruction operand, he define if the register argument is a byte or a word. (0 for 8 bit, 1 for 16/32 bit) ** word_value may be a 16 bit value, in fact it depends the codec you are working on, as exemple, the word value will be in a 16 bit segment a 16 bit value, but on a 32 bit segment it will be a 32 bit value. (** Also, see prefixes and utilisation) The o flag +----------+ The o flag set the primary function of argument, it's a double bit flag, possibility are: * 00 = dword ptr [memory_register] (note that if memory_register = 110 (ebp code) ,it become dword ptr [hard_coded_address](*) * 01 = dword ptr [memory_register+8_bit_value] * 10 = dword ptr [memory_register+word_value] * 11 = directly register ** word_value may be a 16 bit value ? No, it depends the codec you are working on, as exemple, the word value will be in a 16 bit segment a 16 bit value, but on a 32 bit segment it will a 32 bit value. (** Also, see prefixes and utilisation) so, to have a mov eax,[ebx] for exemple, you have to set the oo flag to 00. The register flag +-----------------+ The register flag represent the register operands, it's 3 bit they can be: eax = 000 ecx = 001 edx = 010 ebx = 011 esp = 100 esi = 101 edi = 110 ebp = 111 The memory flag +---------------+ The memory flag is quite the same as the register flag, except: if the memory flag equal esp, and the o flag is not 11, it means you use an advanced register. When you are using an advanced register, the instruction is directly followed by a byte, here's the decryption of the byte AA BBB CCC dword ptr [ccc + bbb * (aa^2)] CCC is eqaul to the adressing register BBB is equal to the evaluation register AA is equal to the multiplication byte if AA = 00 and BBB = 100, then there's no evaluation register. The S flags +-----------+ The S flag is not very common, but appears sometime in instruction like push ds or push es, it's a 3 bit value that have the following value: ES: 000 CS: 001 SS: 010 DS: 011 FS: 100 GS: 101 an exemple: push Segment is equal to 0000111110sss000 you remplace sss by a given value, easy as bonjour :) The Conditional flags +---------------------+ The conditional flag is a 4 bit value, here's the handled conditions by the processors: 0000 = O 0100 = E/Z 1000 = S 1100 = L/NGE 0001 = NO 0101 = NE/NZ 1001 = NS 1101 = GE/NL 0010 = C/B/NAE 0110 = BE/NA 1010 = P/PE 1110 = LE/NG 0011 = NC/AE/NB 0111 = A/NBE 1011 = NP/PO 1111 = G/NLE The Prefixes themself +---------------------+ before some instruction, you can specify the destination and/or the way the instruction have to be processed. Here's a small description: 066h you switch from word to dword and from dword to word 067h you switch from word to dword in memory accompagnment ie: mov eax,[ebx+memory16] 02Eh destination is to CS segment 03Eh destination is to DS segment 026h destination is to ES segment 036h destination is to SS segment 064h destination is to FS segment 065h destination is to GS segment The following value +-------------------+ Now the following value may be: 1ø) fixed by the instruction himself, it can be a Immediate in 8 16 or 32, 2ø) it not specified, you have to look the operand to find the instruction size and to check the prefix We want an exemple ! +--------------------+ All this theorical shit overload your brain, okay, we will make a small exemple to show you that's not that difficult. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Exemple, you take this: 8B 4C 24 04 1ø) Prefix analisis, no prefix 2ø) Instruction opcode detection 8B 4C = 10001011 01 001 100 ^ ^ ^ ^ w o r m but MOV Reg, Mem is equal 1000101000000000b what you can notice is that bit 8 if the w flag, there's also the o the r and the m flag. so if you zero them it from 8B4C, you finally obtain 8A00h, and it's the MOV Reg, Mem opcode now, if you isolate the w flag, you notice it's 1 -> we deal with 32 bit register now, if you isolate the o flag, you notice it's 01 -> we deal with a memory destination liek [reg+8_bit] now, if you isolate the r, you notice it's 001 -> the instruction is MOV ECX, blabla now, if you isolate the m, you notice it's 100 -> extended value! so you take the next byte, it's 24h it's basically 00100100h so it should be [esp+esp*(2^0] but this doesn't exist for intel, he interprete it as [esp]. Remember, you have to add the 8_bit, it's equal 04 now, you look if there was a value to add, the operand are a register and a memory location, no Imediate, so you pass this point. so, the instruction was mov ecx, [esp+4] ---------------- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ An other exemple: 8D 5D D4 8D is not a prefix, so 8D is equal 10001101b and lea is 10001101oorrrmmm clean the 8D5D of the o r m flags and then compare, you will have 8D00 so, take a look on the oo rrr mmm 5D = 01 011 101 o = 01, 8 bit displacement r = 011, it's equal EBX s = 101, it's equal EBP the 8 bit displacement is equal to D4 so, the instruction was: lea ebx,[ebp+d4] +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ As last exemple, we will take this: 66 81 03 65 65 1ø) Analyzing the prefix, the instruction are muted from 32 to 16 bit type, what previously was a dword is now a word! 2ø) 81 03 = 1000 0001 0000 0011 , if you look the ADD instruction in ^ ^ ^ w o m the table, and then clean the wom flags of 8103, you will finally find this: 1000 0000 0000 0000, the ADD Mem, Imm8 3ø) the w flag is equal to 1 that we deal with word 4ø) the o flag is equal to 00 , we are dealing with a memory destination [register] 5ø) the m flag is equal to 011 , the memory destination is [ebx] 6ø) checking the size of the operand that follow the opcode it's 1ø) a word (coz o flag is 00) 2ø) word mute from 32 bit into 16 bit value now, we discussed about operand size, it's 16 bit, equal 6565 7ø) that's all folks finally we find that the instruction was: add word ptr [ebx],6565 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ so, for the rest, I let you experience by yourself, remember: scienca vincere tenebrae! Now you wanna know the instruction size ? +-----------------------------------------+ Here's how you have to proceed: 1ø) Check if prefix exist 2ø) Recognize instruction, get his size 3ø) Check Operand, if there's a destination 3 bisø) Extended register ? Yes, Increment instruction size 3 tricø) Followed by a value ? Yes, calculate his size (Pay attention to the prefix) 4ø) With operand check and prefix analyzed, and instruction recognized you can proceed to add the size of the following value. so, look how is stored an instruction [Prefix?] [Instruction] [Extended memory?] [Add to memory pointer?] [value?] all the ? are to check, they may exist, may be not. It's your job to do that, now, go ahead! Errata over the table +---------------------+ This table is really usefull for a programmer, so I processed it into an include file almost readable by an assember (good for vx!) so, if you wanna use this, no problem, look the 8086.inc to see how are stored the datas. This table had a few bugs, a big one for exemple is the fact of having set that operand reg,reg are the same as mem,reg. It's in fact true when compiling, it's totally false when decompiling. Anyway, enjoy! Check the table and you can see there's 2 different manner to do a push edx. push register and push memory (switched to register)