Extracting data from multiple Active Directory Domains

This posting is provided "AS IS" with no warranties, and confers no rights. Use of included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm

In this blog I will describe the steps which will simplify SSIS package design, when dealing with environments where data needs to be extracted from multiple Active Directory domains.  The approach assumes the use of SSIS Active Directory Source component, which I described here.

Since the AD Source component only extracts data from one domain at a time, the logical thing to do is to utilize Foreach Loop Container in order to iterate through all domains in a forest. One can easily adopt the proposed approach to deal with multiple forests if required.

Create table which will hold the names of the Domains to extract data from

In my case I will be iterating through the domains belonging to the same forest, hence I only will need to store the names of the domains. If I had a requirement to extract data from multiple forests, I would have to store additional information in this table (ex. credentials per forest).

SNAGHTML10ccddd9

As I am writing this, I am thinking that a more elegant approach might be to create a script task which would enumerate all domains in the forest and store them in an array variable which could later to be used by the Foreach Loop Container.

Create 2 Package Scope variables

domainNames will be populated with the names of the domains selected from the tblADDomains. Note that it must be of type System.Object.

domainName will hold a domain name during a loop iteration. Set the initial value for this variable to the root domain in your forest. I will explain the reason for this later.

SNAGHTML1103805c

Configure SQL Execution Task

The objective of this task is to select domainNames from the table we created and populated in the previous step and place the results into an array variable (domainNames).

image

Make sure to select “Full result set” to get all rows from the table.

SNAGHTML10dd7fe8

Under the “Result Set” section map result set 0 to the domainNames variable. Ensure to name the result set as 0 to avoid error message during the build process.

Configure Foreach Loop Container

image

image

 

image

Since tblADDomains contains only one column (ADDomain) the index for the variable mapping is 0.

In summary, the SQL Execute Task will populate the array domainNames, and the Foreach Loop Container will loop through each member of the array.  It would look something like this in C# foreach(string domainName in domainNames)

Configure Data Flow Task inside the Foreach Loop Container

1. Drag new Data Flow Task into the Foreach Loop Container.

image

 

2. Add Active Directory Source and ADO.NET destination components

3. Initially “hard code” DomainName property value. This is required in order for the component to validate itself and properly build its output columns by enumerating AD schema.

SNAGHTML10e737a3

4. Once the component is the validated state, switch to the Data Flow Task Properties and create a new expression

image

image

Now during each iteration of the Foreach Loop the AD Domain Source component will be re-configured with a new DomainName value.