+--------------------------------------------------+
|      Microsoft .NET Common Language Runtime      |
|                     Overview                     |
|                   by Benny/29A                   |
+--------------------------------------------------+

The whole computer world talx about the new platform from Microsoft, called .NET. The current implementation (January 2002) is known as the .NET Framework SDK, which provides us with the base services. There is also the 2nd beta version of Microsoft Visual Studio.NET, a new visual environment where you can create your own .NET applicationz. I wrote this article becoz I couldn't sit and wait until someone did the research instead of me. Well, you know me, I'm a pretty curious guy and so I investigated on my own...

Well, what's MS.NET?
......................

Microsoft .NET is a new component-based, platform-independent architecture. The definition of .NET could take some kilobytez of plain text, so I will try to shorten it as much as possible and show you only the most important thingz.

The .NET Framework SDK is a set of applicationz that provide the functional part of .NET on our non-dotnet operating systemz. A full .NET OS will be released as the second successor of WinXP. However, Microsoft triez to propagate its new platform the same way it did with Win32 - in the past they released Win32s, a limited Win32 environment for Win16 systemz. Today we have the .NET Framework SDK - compared to the full .NET architecture, its functionality is on the same level as Win32s compared to full Win32.

The .NET Framework SDK provides us with the pearl - the .NET Common Language Runtime environment. It's the heart of .NET and the subject of this article.

Let's look at the history
...........................

In the past the application component creation process was fucking hard.
If you coded something in C++, the component couldn't be used from VisualBasic and vice versa - that's becoz an application written in one language used a different runtime environment than an application written in another language. Microsoft created many technologiez to break this problem down - DDE, OLE, COM (1.0, DCOM, COM+) ...

Let's stop at COM - in its time it was a very effective and successful technology, widely used among all HLL programmerz. But there was a small problem: in C++ it was (and still is :-) pretty difficult to use COM components. In VisualBasic it was very easy, but VB couldn't use all the benefitz of the COM technology. The co-working of applicationz written in different languagez was always problematic. Until now.

"Brand-new" architecture
..........................

Microsoft again stole know-how from otherz, advanced its own technologiez and here it is - MS.NET ;-) In the .NET environment you have only ONE type of code, written in MS's own intermediate language (metacode, pseudo-code, something like Java classez), ONE runtime environment (CLR) and ONE library (Framework).

Microsoft .NET Common Language Runtime
.......................................

All servicez are provided via a common model which you can access from all .NET languagez. Conversely, you can write your own servicez, which can then be used from all .NET langz. That's becoz you can use whatever .NET language you like, but in the end it will be compiled to language-independent meta-code. Such programz are then compiled just-in-time to native executable code by the CLR - in fact the whole .NET basicz are something like a Java-OS, nothing more.

Native Languagez
..................

I won't surprise anyone if I say that both C++ and VisualBasic can be used in the .NET architecture. JScript can also be used as a native language of .NET.
Microsoft also created two NEW languagez, C# and J# (pronounced "C sharp" and "J sharp"):

1) C# is a new, very effective native .NET language for coding .NET applicationz. When I studied this lang., I had real fun. Microsoft tried to create some kind of "super-language" - it's very similar to C++ and includez the benefitz of many other languagez (VisualBasic, Java, etc.). I have to say that, if I leave aside its source-code-unreadability "featurez" (nowadayz a very widespread feature of many langz, such as C++, Java, Perl...), Microsoft created one of the best languagez ever. Imho.

2) J# is a language created for Java programmerz. J# is VERY similar to Java and is designed to help Java programmerz easily port their applicationz to the .NET environment.

Some other already known languagez (e.g. Perl) should also be ported to .NET in time.

Assemblies
............

Well, "assembly" is now the new name for a program. It seemz Microsoft instinctively triez to fight against all standardz, remember? : Program->Application, Directory->Folder, Link->Shortcut, etc. etc. ;-) But an assembly ain't just a normal EXE file - it's the whole, complete application. An assembly containz all the info the program needz. You can imagine it as an EXEcutable file, COMponent DLL/OCX, TypeLiBrary, RESources etc. glued together into one executable file. It containz all the meta-code, data and other info needed for execution of the program. Every assembly also containz a list of all componentz stored inside it - called the MANIFEST.

Microsoft Intermediate Language
................................

So far I have briefly described the needed internal info about the .NET CLR. Now I will try to show you something from the meta-code (aka MSIL).
I created this very simple program in C# (sample.cs):

<------------------ cut ------------------>
using System;

class Sample
{
    public static void Main ()
    {
        Console.WriteLine("Hello world");
    }
}
<------------------ cut ------------------>

Then I compiled it using the command:

CSC sample.cs

The compiler created a new file, sample.exe. If you open it in some viewer, you will recognize that it ain't a standard PE file. There's no native code and it containz some "weird" structure. Yeah, it's compiled to an MSIL PE file. Now we will use the ILDASM.EXE utility to disassemble it back to IL source code. In the tree view we will see its structure:

<------------------ cut ------------------>
___[MOD] D:\!!work\.net\sample.exe
   |      M A N I F E S T
   |___[CLS] Sample
   |     |     .class private auto ansi beforefieldinit
   |     |___[MET] method .ctor : void()
   |     |___[STM] method Main : void()
   |
<------------------ cut ------------------>

If you remove all the not-needed info (commentz, resourcez, junk code...) you will get this (sample.il):

<------------------ cut ------------------>
.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 1:0:2411:0
}
.assembly sample
{
  .hash algorithm 0x00008004
  .ver 0:0:0:0
}
.module sample.exe
.imagebase 0x00400000
.subsystem 0x00000003
.file alignment 512
.corflags 0x00000001

.class private auto ansi beforefieldinit Sample
       extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    // Method begins at RVA 0x2050
    // Code size       11 (0xb)
    .maxstack  8
    IL_0000:  /* 72   | (70)000001 */  ldstr      "Hello world"
    IL_0005:  /* 28   | (0A)000001 */  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000a:  /* 2A   |            */  ret
  }
}
<------------------ cut ------------------>

Yeah, you see that? This is how the meta-code loox. It loox pretty simple and I'm sure that a normal advanced asm-coder doesn't need any comment.
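
For those who want to see how the byte columns of the listing map to the mnemonics, here is a minimal Python sketch. It is only an illustration, not how ILDASM works internally: the `OPCODES` table and the `disassemble` helper are hypothetical names made up for this example, and the table coverz just the three opcodez that occur in the sample. The method body bytez are taken from the comment columns of the listing, with the 4-byte tokenz stored little-endian.

```python
import struct

# Method body bytes copied from the ILDASM listing above (code size 11).
IL_BODY = bytes.fromhex("7201000070" "280100000A" "2A")

# Only the three opcodes used by the sample; the real opcode set is much larger.
OPCODES = {
    0x72: ("ldstr", 4),  # operand: 4-byte metadata token
    0x28: ("call", 4),   # operand: 4-byte metadata token
    0x2A: ("ret", 0),    # no operand
}

def disassemble(body):
    """Return a list of (offset, mnemonic, token-or-None) tuples."""
    out = []
    pos = 0
    while pos < len(body):
        mnemonic, operand_size = OPCODES[body[pos]]
        token = None
        if operand_size == 4:
            # Tokens are stored as little-endian 32-bit values.
            (token,) = struct.unpack_from("<I", body, pos + 1)
        out.append((pos, mnemonic, token))
        pos += 1 + operand_size
    return out

for offset, mnemonic, token in disassemble(IL_BODY):
    if token is None:
        print(f"IL_{offset:04x}: {mnemonic}")
    else:
        # The high byte of a token selects the table, the low 3 bytes are the index.
        print(f"IL_{offset:04x}: {mnemonic} ({token >> 24:02X}){token & 0xFFFFFF:06X}")
```

The printed offsetz and tokenz should line up with the IL_0000 / IL_0005 / IL_000a rowz of the listing: 70h markz a user string token and 0Ah a member reference token.
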
Maybe someone will ask what that "mscorlib" thing is - in fact it's a DLL stored somewhere on your disk which containz all the base class librariez. Now that we have removed all the un-needed stuff added by MS's compiler, we will use the ILASM.EXE utility to build the MSIL EXE file again:

ILASM sample.il

Now we have 2 kilobytez of pure "Hello World" program :) What can we do now? Let's explore its structure. So, what turns a normal PE EXE file into an MSIL one? Let's look at that:

<------------------ cut ------------------>
File type: WINDOWS EXECUTABLE

File Header:
   Machine:                   0x014C (Intel x86)
   Number of Sections:        2
   Time Date Stamp:           0x3C3886C7 (6. 1. 2002 18:17:59)
   Pointer to Symbol Table:   0x00000000
   Number of Symbols:         0
   Size of Optional Header:   0x00E0 (224)
   Characteristics:           0x010E
      File is executable.
      Line numbers stripped from file.
      Local symbols stripped from file.
      32 bit word machine.

Optional Header:
   Magic:                       0x010B
   Linker Version:              6.00
   Size of Code:                0x00000400 (1024)
   Size of Initialized Data:    0x00000200 (512)
   Size of Uninitialized Data:  0x00000000 (0)
   Address of Entry Point:      0x0000229E
   Base of Code:                0x00002000
   Base of Data:                0x00004000
   Image Base:                  0x00400000
   Section Align:               0x00002000
   File Align:                  0x00000200
   Operating System Version:    4.00
   Image Version:               0.00
   Subsystem Version:           4.00
   Size of Image:               0x00006000 (24576)
   Size of Headers:             0x00000200 (512)
   Checksum:                    0x00000000
   Subsystem:                   0x0003 (Windows character subsystem)
   DLL Characteristics:         0x0000
   Size of Stack Reserve:       0x00100000 (1048576)
   Size of Stack Commit:        0x00001000 (4096)
   Size of Heap Reserve:        0x00100000 (1048576)
   Size of Heap Commit:         0x00001000 (4096)
   Loader Flags:                0x00000000
   Number of Rva and Sizes:     16
<------------------ cut ------------------>

The header loox normal. Let's go down...

<------------------ cut ------------------>
Data Directory:
   Export directory:                VA: 0           Size: 0
   Import directory:                VA: 0x00002250  Size: 0x0000004B (75)
   Resource directory:              VA: 0           Size: 0
   Exception directory:             VA: 0           Size: 0
   Security directory:              VA: 0           Size: 0
   Base relocation table:           VA: 0x00004000  Size: 0x0000000C (12)
   Debug directory:                 VA: 0           Size: 0
   Architecture-specific data:      VA: 0           Size: 0
   RVA of global pointer:           VA: 0           Size: 0
   Thread local storage directory:  VA: 0           Size: 0
   Load configuration directory:    VA: 0           Size: 0
   Bound import directory:          VA: 0           Size: 0
   Import address table:            VA: 0x00002000  Size: 0x00000008 (8)
   Delay load import descriptors:   VA: 0           Size: 0
   COM run-time descriptor:         VA: 0x00002008  Size: 0x00000048 (72)
   (unknown directory entry):       VA: 0           Size: 0

Import Table:
   mscoree.dll
      Import Address Table:                0x00002000
      Import Name Table:                   0x00002278
      Time Date Stamp:                     0x00000000
      Index of first forwarder reference:  0x00000000

      0x00002280      0  _CorExeMain

Section Table:
   Section Header #1
      Name:                              .text
      Virtual Size:                      0x000002A4 (676)
      Virtual Address:                   0x00002000
      Size of Raw Data:                  0x00000400 (1024)
      File Pointer to Raw Data:          0x00000200
      File Pointer to Relocation Table:  0x00000000
      File Pointer to Line Numbers:      0x00000000
      Number of Relocations:             0
      Number of Line Numbers:            0
      Characteristics:                   0x60000020
         Section contains code.
         Section is executable.
         Section is readable.

   Section Header #2
      Name:                              .reloc
      Virtual Size:                      0x0000000C (12)
      Virtual Address:                   0x00004000
      Size of Raw Data:                  0x00000200 (512)
      File Pointer to Raw Data:          0x00000600
      File Pointer to Relocation Table:  0x00000000
      File Pointer to Line Numbers:      0x00000000
      Number of Relocations:             0
      Number of Line Numbers:            0
      Characteristics:                   0x42000040
         Section contains initialized data.
         Section can be discarded.
         Section is readable.
<------------------ cut ------------------>

We got it! A brief look at the structure tells us that an MSIL PE file in fact consists of just ONE needed section, ".text".
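
To find these thingz in the file on disk, the RVAz from the dump have to be converted to file offsetz with the help of the section table. Here is a minimal Python sketch of that arithmetic. The `SECTIONS` list and the `rva_to_offset` helper are hypothetical names used only for this example, and the numberz are simply copied from the dump above rather than parsed from a real file.

```python
# Section table values copied from the dump above:
# (name, virtual address, virtual size, file pointer to raw data)
SECTIONS = [
    (".text", 0x2000, 0x2A4, 0x200),
    (".reloc", 0x4000, 0x00C, 0x600),
]

def rva_to_offset(rva):
    """Map an RVA to a file offset, or return None if no section covers it."""
    for name, vaddr, vsize, raw_ptr in SECTIONS:
        if vaddr <= rva < vaddr + vsize:
            return raw_ptr + (rva - vaddr)
    return None

# RVAs taken from the optional header and the data directory above.
for label, rva in [
    ("entry point", 0x229E),
    ("import address table", 0x2000),
    ("COM run-time descriptor", 0x2008),
    ("import directory", 0x2250),
    ("base relocation table", 0x4000),
]:
    print(f"{label}: RVA {rva:#06x} -> file offset {rva_to_offset(rva):#06x}")
```

Everything interesting in this sample lies in ".text", so each offset is just the RVA minus 2000h plus 200h. For example, the COM run-time descriptor at RVA 2008h sits at file offset 208h.
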
The program importz only one API called "_CorExeMain" and the entrypoint is set to 40229Eh:

:0040229E FF2500204000    jmp [SAMPLE.00402000]

and that's the jump to that function, stored inside MSCOREE.DLL. We can consider it the .NET execution engine dispatcher, which is called on every start of the program and executez the meta-code.

The first 8 bytez of the ".text" section are reserved for the virtual address of the _CorExeMain API and the next 72 bytez for the CLR header. The first dword of the CLR header is its size (72 bytez). Then follow some other data recordz (runtime version, flagz, entrypoint token) together with a data directory, much like the one in the PE header. Valid entriez are the Metadata directory, Resources directory, Strong Name signature, CodeManager table, VTableFixups directory, Export Address directory and Precompile header. Each directory entry holdz the RVA of the item and its size. The metadata usually startz at the 80th byte of the section (402050h) - the first record in the data directory (402010h) belongz to the metadata and holdz its real address.

Now follow speculationz and my personal guesses. I'm not sure about these thingz and only my experience tellz me what they could probably be. If you have more exact info, pleaz LET ME KNOW!

It seemz that the first byte(z) (402050h) are some method/module specific recordz and then the meta-data follow. I figured out that the meta-data recordz are alwayz stored below the meta-code. 424A5342h ("BSJB") seemz to be the magic number of the metadata directory, followed by some flagz and internal recordz (namez and tokenz of class itemz - methodz, attributez, propertiez etc...).

I hope that some documentation of the MSIL/CLR internal formatz will appear somewhere on the net and give us a better look at this thing.

Closing
.........

Well, that's all I could research for now. What can we do with this info? Imho Microsoft won't change the shapez of the opcodez, so we can research more on this.
Also, with a little help from MSDN and the .NET utilitiez, we are able to code some kind of overwriting MSIL PE file virus, based on the info we gathered here. If someone mapz the CLR header structure, I'm sure we won't wait long before the first MSIL .NET appending virus appearz - if there are Java infectorz, why shouldn't there be a CLR virus? :-)

As a bonus, in this 29A zine (6th issue) you can find the world's 1st .NET virus, coded by me and called .NET.dotNET aka Win32.Donut.

Ok, that's all for now. If you have better info than me, I hope you will let me know. Thnx!

....................................................
. Benny / 29A
. benny@post.cz
. http://benny29a.cjb.net
. ... perfectionist, maximalist, idealist, dreamer ...