What is an Assembly?

04/02/2005

Every time you create an application or service using Visual Basic .NET, you are building an Assembly, because Assemblies are the basic building blocks of the .NET Framework. They help to define versioning, code reuse, scoping rules, and security permissions for all .NET applications. The simplest way to think of an Assembly is as a collection of types that form a logical unit of functionality and are built to work together efficiently. If you search your hard drive, you will find Assemblies as executable (.exe) or dynamic link library (.dll) files that were compiled using the .NET Framework. In this article we will explore some of the basic concepts and architecture of Assemblies and how they can be used within the .NET Framework.

What makes an Assembly unique is that a set of additional descriptive information is included during compilation. This metadata contains both the type library and the additional information that other .NET applications need in order to access and use the Assembly. It describes what components the Assembly contains and how these elements relate to one another. Much like a book’s table of contents, this information is stored in an Assembly Manifest. The Assembly Manifest can be stored either in the executable file (.exe or .dll) together with the Microsoft Intermediate Language (MSIL) code, or in a standalone Portable Executable (PE) file.

Exploring Metadata

Every .NET compiler is required to both generate and embed a standardized set of metadata in the output file. A review of the metadata shows the types that are available within an Assembly, including the classes, interfaces, and their containing namespaces. The compiler generates the Assembly metadata from the source files and embeds it in the target output file: an .exe, a .dll, or a .netmodule in the case of a multi-module assembly. In a multi-module assembly, every module that contains IL must have metadata embedded in it to describe the types in that module.
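As a rough illustration of how this metadata can be inspected, the following sketch uses reflection to list a few of the public types in an assembly together with their namespaces. The module name ListTypes is hypothetical, and the assembly that defines String is used only so that the example has something to enumerate.

```vbnet
Imports System

Module ListTypes
    Sub Main()
        ' Any loaded assembly would do; this one defines System.String.
        Dim asm As System.Reflection.Assembly = GetType(String).Assembly
        Console.WriteLine("Assembly: " & asm.FullName)

        ' GetExportedTypes reads the public type information from the
        ' metadata that the compiler embedded in the assembly.
        Dim shown As Integer = 0
        For Each t As Type In asm.GetExportedTypes()
            Dim kind As String = "Class/other"
            If t.IsInterface Then kind = "Interface"
            Console.WriteLine("{0,-12} {1}.{2}", kind, t.Namespace, t.Name)

            ' Stop after ten types to keep the output short.
            shown += 1
            If shown >= 10 Then Exit For
        Next
    End Sub
End Module
```

The same calls should work against any assembly you load yourself; only the first line of Main would need to change.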

Figure 1: The Solution Explorer for a Windows application.

For example, within a Windows application the AssemblyInfo.vb file, as shown in Figure 1, contains information about the assembly. This file includes the version number, dependencies, and a variety of Assembly-specific identification information. When the application is compiled, this information is added to the Assembly. .NET reflection relies heavily on this metadata to discover type information at run time. For example, the following code can be used to display a message box that contains the current application's copyright information.

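' At the top of the file:
Imports System.Reflection

' Inside a method: read the copyright attribute from the assembly metadata.
Dim objCopyright As AssemblyCopyrightAttribute = _
    DirectCast(AssemblyCopyrightAttribute.GetCustomAttribute( _
        System.Reflection.Assembly.GetExecutingAssembly(), _
        GetType(AssemblyCopyrightAttribute)), AssemblyCopyrightAttribute)
Dim appCopyright As String = objCopyright.Copyright
MsgBox(appCopyright)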

It is important to remember that metadata describes the contents of the assembly, whereas the Assembly manifest describes the assembly itself, including the logical attributes shared by all the modules and components in the assembly. The manifest contains a cryptographic hash of the modules in the assembly. When the CLR loads an assembly, it recalculates the hash of each module and compares it to the embedded hash. If the hash generated at run time differs from the one found in the manifest, the .NET Framework refuses to load the assembly and throws an exception.

The manifest is automatically generated from all source files of all modules within the assembly. The actual manifest is then embedded in only one physical file, because it is common to all modules in an assembly. The metadata itself is embedded within each of the included modules. The .NET compiler automatically generates the manifest in a standard format. The manifest also allows .NET to capture information about other referenced assemblies. This supports versioning and ensures that the assembly interacts only with the exact, trusted set of other assemblies it expects. When an assembly is loaded, .NET guarantees that only those specific assemblies are used and that only compatible versions are loaded. Table 1 shows the information stored in the Assembly Manifest. It is important to remember that the assembly name, version number, culture, and strong name information comprise the assembly’s unique identity.

Table 1: Information Stored in the Assembly Manifest

| Information | Description |
| --- | --- |
| Assembly name | A text string that identifies the assembly. |
| Culture | Information on the culture or language the assembly supports. This information should be used only to designate an assembly as a satellite assembly containing culture- or language-specific information. (An assembly with culture information is automatically assumed to be a satellite assembly.) |
| Version number | A major and minor version number, and a revision and build number. The common language runtime uses these numbers to enforce version policy. |
| Strong name information | The public key from the publisher, if the assembly has been given a strong name. |
| List of all files in the assembly | A hash of each file contained in the assembly and a file name. All files that make up the assembly must be in the same directory as the file containing the assembly manifest. |
| Type reference information | Information used by the runtime to map a type reference to the file that contains its declaration and implementation. This is used for types that are exported from the assembly. |
| Assembly reference list | A list of other assemblies that are statically referenced by the assembly. Each reference includes the dependent assembly’s name, assembly metadata (version, culture, operating system), and public key if the assembly is strong-named. |
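Several of the manifest entries in Table 1 can be read back at run time. The sketch below, which assumes a console application and uses the hypothetical module name ManifestInfo, prints the identity of the executing assembly and the identities in its assembly reference list.

```vbnet
Imports System
Imports System.Reflection

Module ManifestInfo
    Sub Main()
        Dim asm As System.Reflection.Assembly = _
            System.Reflection.Assembly.GetExecutingAssembly()

        ' AssemblyName exposes the identity recorded in the manifest.
        Dim identity As AssemblyName = asm.GetName()
        Console.WriteLine("Name:     " & identity.Name)
        Console.WriteLine("Version:  " & identity.Version.ToString())
        ' FullName combines the name, version, culture, and public key token.
        Console.WriteLine("Identity: " & identity.FullName)

        ' The assembly reference list: assemblies this one statically references.
        For Each dependency As AssemblyName In asm.GetReferencedAssemblies()
            Console.WriteLine("References: " & dependency.FullName)
        Next
    End Sub
End Module
```

The exact output depends on the project and the runtime it is built against, so none is shown here.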

Metadata is stored in one section of a .NET Framework PE file, while the Microsoft Intermediate Language (MSIL) is stored in another section of the PE file. The metadata portion of the file contains a series of tables and heap data structures. The MSIL portion contains metadata tokens that reference the metadata portion of the PE file. Each of the metadata tables holds information about the elements of a program. For example, one metadata table describes the classes in your code, and another might describe fields. If you have five classes in your code, the class table will contain five rows, one for each class. Metadata tables also reference other tables and heaps. For example, the metadata table for classes references the table for methods.

Metadata also stores information in four heap structures: string, blob, user string, and GUID. All the strings used to name types and members are stored in the string heap. For example, a method table does not directly store the name of a particular method, but points to the method’s name stored in the string heap. Each row of each metadata table is uniquely identified in the MSIL portion of the PE file by a metadata token. These metadata tokens are conceptually similar to pointers, persisted in MSIL, that reference a particular metadata table.

A metadata token is a four-byte number. The top byte denotes the metadata table to which the token refers. The remaining three bytes specify the row in that table that corresponds to the programming element being described. When a program is compiled for the common language runtime, it is converted to a PE file that consists of three parts, as shown in Table 2.

Table 2: The three parts of a PE file

| PE Section | Contents of PE Section |
| --- | --- |
| PE header | The index of the PE file’s main sections and the address of the entry point. The runtime uses this information to identify the file as a PE file and to determine where execution starts when loading the program into memory. |
| MSIL instructions | The MSIL instructions that make up your code. Many MSIL instructions are accompanied by metadata tokens. |
| Metadata | Metadata tables and heaps. The runtime uses this section to record information about every type and member in your code. This section also includes custom attributes and security information. |
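The four-byte token layout described above can be made concrete with a little bit arithmetic. The following is a minimal sketch: the module and function names are hypothetical, and the first token value is made up for illustration rather than taken from a real assembly.

```vbnet
Imports System

Module TokenDemo
    ' The top byte of a token selects the metadata table.
    Function TableIndex(ByVal token As Integer) As Integer
        Return (token >> 24) And &HFF
    End Function

    ' The remaining three bytes hold the row number within that table.
    Function RowIndex(ByVal token As Integer) As Integer
        Return token And &HFFFFFF
    End Function

    Sub Main()
        ' Hypothetical token value chosen for illustration.
        Dim sample As Integer = &H6000004
        Console.WriteLine("Table: 0x{0:X2}  Row: {1}", _
            TableIndex(sample), RowIndex(sample))

        ' A real token can be obtained through reflection.
        Dim real As Integer = GetType(String).MetadataToken
        Console.WriteLine("Table: 0x{0:X2}  Row: {1}", _
            TableIndex(real), RowIndex(real))
    End Sub
End Module
```

For the sample value 0x06000004, the sketch reports table 0x06 and row 4. The second line of output varies with the runtime version, because the row assigned to a type is an implementation detail of the assembly that defines it.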

One of the most important advantages of Assemblies is that they resolve many of the traditional issues associated with deployment. For example, a .NET-based application can often be deployed simply by copying it to the target computer and running it.

What is a Namespace?

By definition, a namespace is a unique identifier that allows the definition of an element to be unambiguously identified. One of the key uses of namespaces is to prevent ambiguity and simplify references when using a large group of objects, such as a class library. Without namespaces there is the possibility of something known as namespace pollution, a scenario in which the developer of a class library runs into ambiguous references or name collisions within a single class library. As a physical unit of deployment, an Assembly can contain namespaces, which are used to organize the objects it defines. Any assembly can contain multiple namespaces, which in turn can contain other namespaces. For example, the .NET Framework defines a Button class in the System.Windows.Forms namespace. We can declare a reference to this class using the fully qualified name, as shown below.

Dim myButton As System.Windows.Forms.Button

Fully qualified names are object references that are prefixed with the name of the namespace where the object is defined. We can use objects defined in other projects by creating a reference to the project and then using the fully qualified name of the object in our code.

Fully qualified names automatically prevent naming conflicts because the compiler can always determine which object is being used. However, the names themselves can get rather long and cumbersome. To get around this, we can use the Imports statement to define an alias. An alias is an abbreviated name that can be used in place of a fully qualified name. For example, the following code creates aliases for two qualified names and uses these aliases to define two objects.

Imports LBControl = System.Windows.Forms.ListBox
Imports MyListBox = ListBoxProject.Form1.ListBox

Dim LBC As LBControl
Dim MyLB As MyListBox

When using the Imports statement without an alias, you can use all the names in that namespace without qualification as long as they are unique in the project. If a project contains Imports statements for namespaces that contain items with the same name, you must fully qualify that name when you use it. Suppose, for example, that your project contains the following two Imports statements.

Imports MyProj1 ' this namespace contains a class called Class1
Imports MyProj2 ' this namespace also contains a class called Class1

Any time you attempt to use Class1 without full qualification, the compiler reports an error stating that the name Class1 is ambiguous.
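The ambiguity and its resolution can be reproduced in a single file. The sketch below is an illustration only: it assumes the file is compiled without a project root namespace (for example, from the command line), so that MyProj1 and MyProj2 are top-level namespaces, and the Describe method and AmbiguityDemo module are hypothetical.

```vbnet
Imports System
Imports MyProj1
Imports MyProj2

Namespace MyProj1
    Public Class Class1
        Public Function Describe() As String
            Return "Class1 from MyProj1"
        End Function
    End Class
End Namespace

Namespace MyProj2
    Public Class Class1
        Public Function Describe() As String
            Return "Class1 from MyProj2"
        End Function
    End Class
End Namespace

Module AmbiguityDemo
    Sub Main()
        ' Dim c As New Class1() would not compile here, because both
        ' imported namespaces define a type named Class1.

        ' Fully qualified names remove the ambiguity.
        Dim first As New MyProj1.Class1()
        Dim second As New MyProj2.Class1()
        Console.WriteLine(first.Describe())
        Console.WriteLine(second.Describe())
    End Sub
End Module
```

An Imports alias for one of the two classes would work equally well and keeps the call sites shorter.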

**Note**

Always keep in mind that you cannot define properties, procedures, variables, or events at the namespace level. These items must be declared within containers such as modules, structures, or classes.

For example, if we created a new class named Button, we would be able to use it inside this project without qualification. However, if we wanted to use the standard Windows Forms Button class in the same project, we would be required to use a fully qualified reference to make the reference unique. If for any reason the reference is not unique, the compiler reports an error because the name is ambiguous. For example, the following code declares two objects, one based on each Button class.

' Define a new object based on our Button class.
Dim myBut As New Button
' Define a new Windows Forms Button control.
Dim myWinButton As New System.Windows.Forms.Button

By default, every executable file that is created contains a namespace with the same name as your project. For example, if you create a project named ButtonProject, the executable file ButtonProject.exe contains a namespace called ButtonProject.

Multiple assemblies can use the same namespace, in which case they are automatically treated as a single set of names. For example, you can define classes for a namespace called TestProjectSpace in an assembly named Assembly1, and define additional classes for the same namespace in an assembly called Assembly2.

Exploring the GAC

The Assembly manifest stores an Assembly reference list. This is a list of external dependencies that includes references to both global and private objects. Within the .NET Framework, a global object resides in the Global Assembly Cache (GAC). The GAC is a central repository of assemblies that is specifically designated to be shared by several applications on the local computer, much like the traditional System32 directory that was used to store Windows system files. For example, the Microsoft.VisualBasic assembly is stored in the GAC. Private objects, on the other hand, must be stored in a directory at or below the level of the application installation directory.

A private Assembly is used only by a single application and is stored in that application's directories. A shared Assembly, on the other hand, is one that can be referenced by more than one application. In order to share an assembly, the assembly must be explicitly built for this purpose by giving it a cryptographically strong name (referred to as a strong name). By contrast, a private assembly name need only be unique within the application that uses it. The distinction between private and shared assemblies makes sharing an explicit development decision rather than an implicit one. Simply by deploying private assemblies to an application directory, you can guarantee that the application will run only with the bits it was built and deployed with. This also means that references to private assemblies are resolved only locally, within the private application directory.

There are many different reasons to build and share assemblies, including the ability to implement a version policy. Because shared assemblies have a cryptographically strong name, only the author of the assembly has the key needed to produce a new version of that assembly. If you make a policy statement that says you want to accept a new version of an assembly, the strong name provides confidence that version updates are controlled and verified by the author. Otherwise, you do not have to accept them.

For locally installed applications, a shared assembly is typically explicitly installed into the Global Assembly Cache. Key to the version management features of the .NET Framework is that downloaded code does not affect the execution of locally installed applications. Downloaded code is put in a special download cache and is not globally available on the machine, even if some of the downloaded components are built as shared assemblies.
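One way to see the difference between global and private assemblies is to ask a loaded assembly where it came from. The following sketch assumes the .NET Framework, where the Assembly.GlobalAssemblyCache property reports whether an assembly was loaded from the GAC. The module name GacCheck and the Report helper are hypothetical.

```vbnet
Imports System

Module GacCheck
    ' Prints the assembly's simple name and whether it was loaded from the GAC.
    Sub Report(ByVal asm As System.Reflection.Assembly)
        Console.WriteLine("{0}: loaded from GAC = {1}", _
            asm.GetName().Name, asm.GlobalAssemblyCache)
    End Sub

    Sub Main()
        ' The assembly that defines the Visual Basic Collection class.
        Report(GetType(Microsoft.VisualBasic.Collection).Assembly)

        ' The application's own assembly, normally a private assembly.
        Report(System.Reflection.Assembly.GetExecutingAssembly())
    End Sub
End Module
```

The results depend on how the runtime and the application are installed, so treat the output as diagnostic information rather than something to rely on in program logic.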

**Note**

One thing to remember is that the base classes that ship with the .NET Framework are all built as shared assemblies.

As a general rule, it is a good idea to keep assembly dependencies private and store them in the application directory. Additionally, it isn’t necessary to install assemblies into the GAC to make them accessible to COM interop or unmanaged code. There are three basic methods for deploying an Assembly into the GAC.

Use an installer designed to deploy assemblies to the Global Assembly Cache.
Use the Global Assembly Cache command-line tool (Gacutil.exe) provided with the .NET Framework SDK.
Use Windows Explorer to drag assemblies into the cache.

**Note**

When deploying into a production environment, it is a best practice to use an installer built specifically to install into the GAC. Neither Windows Explorer nor Gacutil.exe provides assembly reference counting and the other features provided by the Windows Installer.

System Administrators often protect the WINNT directory using an access control list (ACL) to control write and execute access. By default the GAC is installed under the WINNT directory and inherits that directory's ACL. Especially on developer machines, it is a good idea to give Administrators the ability to delete files from the Global Assembly Cache.

Exploring Strong Name Assemblies

Assemblies can have either a strong name or a simple name. Assemblies deployed into the GAC must always have a strong name. When an assembly is added to the GAC, integrity checks are performed on all files that make up the assembly to ensure that the assembly has not been tampered with. A strong name consists of the assembly's identity (its simple text name, version number, and culture information) plus a public key and a digital signature. The signature is generated from the assembly file using the corresponding private key. Both Visual Studio and the .NET Framework SDK enable the creation of a strong name using the Sn.exe tool. This command-line tool provides options for key management, signature generation, and signature verification. Assemblies with the same strong name are expected to be identical. Signing an assembly also guarantees that its name is globally unique.
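Whether a given assembly carries a strong name can be checked at run time through its public key token, a shortened form of the public key that appears in the assembly's full name. The sketch below is illustrative only. The module name StrongNameInfo and the DescribeToken helper are hypothetical.

```vbnet
Imports System

Module StrongNameInfo
    ' Returns the public key token as hexadecimal text, or a placeholder
    ' when the assembly has only a simple name.
    Function DescribeToken(ByVal asm As System.Reflection.Assembly) As String
        Dim token As Byte() = asm.GetName().GetPublicKeyToken()
        If token Is Nothing OrElse token.Length = 0 Then
            Return "(no strong name)"
        End If
        Return BitConverter.ToString(token).Replace("-", "").ToLower()
    End Function

    Sub Main()
        ' A framework assembly, which is strong-named.
        Console.WriteLine(DescribeToken(GetType(String).Assembly))

        ' The application's own assembly, which has a token only if it was signed.
        Console.WriteLine(DescribeToken( _
            System.Reflection.Assembly.GetExecutingAssembly()))
    End Sub
End Module
```

An unsigned project would be expected to show the placeholder on the second line.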

Signing an assembly with a strong name provides the following benefits:

Strong names guarantee uniqueness by relying on unique key pairs. No one is able to generate the same assembly name, because an assembly generated with one private key has a different name than an assembly generated with another private key.
Strong names protect the version lineage of an assembly. A strong name can ensure that no one else is able to produce a subsequent version of your assembly. Application users are assured that the version of the assembly they are loading comes from the same publisher that created the version the application was built with.
Strong names provide a strong integrity check. This check guarantees that the contents of the assembly have not changed since it was built.

**Note**

It is important to remember that strong names do not imply a level of trust like that provided by a digital signature and supporting certificate.

A variety of benefits are gained when referencing a strong-named assembly. The most important, as discussed above, are versioning and naming protection. One thing to remember is that if a strong-named assembly references an assembly with a simple name, you lose the benefits that are derived from a strong-named assembly. Therefore, strong-named assemblies can reference only other strong-named assemblies.

Assemblies are one of the most important concepts within the .NET Framework and something that all developers should understand. This article was designed to provide a basic overview of Assemblies and their basic functionality.
