Importing & processing data in SQL Server Tables
To completely understand the next phase of import process we must first have some additional information. There are a few stored procedures that are registered in the server’s registry that are invoked during various stages of importing data from the gatherer pipeline. These can be found in the following registry hive.
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Applications\<GUID>\Gatherer Archival Plugin\ProfileImport
The most important of them are the following:
Refer to Step by step process explained below which explains job done by these Stored Procedures.
This phase is where we “crawl” the data. During this phase, the data is pushed into various temporary tables in the SSP database. The following tables are written to, or updated during this phase.
1. The stored procedure profile_pluginOnStartCrawl is invoked. This stored procedure basically updates the Profile_Stats table with the start time, import status & import type (Full/Incremental) which is used in updating the UI with information such as count of profiles imported and whether we are in a full crawl or not. Once we start profile import the UI status show as “Enumerating”
2. Profile import uses protocol handler used to query data from Active directory & feed the Gatherer Pipeline.
3. The data currently in the gatherer pipe has been processed by the iFilter. It is now handed over to the Archival and Retrieval plug-in. The ARPI plug-in reads this data and writes in chunks to ProfileImport and ProfileImportAlt temporary tables alternatively. Both these tables have the same structure. The table being currently written is called Active Buffer.
4. Profile_PlugginDataImport stored procedure truncates the ProfileImport_Copy table . It then copies active buffer into ProfileImport_Copy . Thus this table holds a subset of data stored in ProfileImport or ProfileImportAlt tables and is used to perform various operations such as evaluating the properties of existing user profiles, updating the signatures (Signatures hold specific user property information in a hashed format).
5. Once we have the records in ProfileImport_Copy we try to find which records need to be updated. If a profile exists, a subset of the profile property is matched to decide if there are any changes.
a) If matched, then only the LastImported field in UserProfile_Full table is updated.
b) If not we move on to Update the record.
6. We continue using ProfileImport_copy to process the properties and occasionally write into Profile_DataImport table. The Profile_DataImport is just another temporary table that we use to process changes with the imported data before we finally write to the UserProfile_Full table. This way we also minimize write lock times while we update the final data store for user profiles – UserProfile_Full
7. The next step is the final step in the profile import cycle. After we have finished processing, we will truncate the active buffer, ProfileImport_copy, and Profile_DataImport so that they are clear for the next run.
8. At this stage, the stored procedure Profile_PluginOnEndCrawl is executed, which updates the Profile_Stats table which indicates the crawl is complete. In the UI, this will reflect as Idle against “Profile import status” setting and the final statistics of the import is displayed in the bottom of the Web page. At this point, profile import is completed and the process to update Membership statistics will start. You should see this as “Enumerating” in the user interface against the “Membership & BDC import status” setting.
Key SQL Server Tables Used in Import Process
As discussed above, all the user profile information that is retrieved from the Active Directory store is ultimately written into the UserProfile_Full table located in the SSP database. Each row in the UserProfile_Full table represents a user. The following are some of the useful information which is written into the UserProfile_Full table:
Primary Key. The record identifier of the user profile
The SharePoint document identifier associated with the user profile.
GUID which provides a universal identifier for the user profile.
SAM account name of the user’s logon account.
Display name of the user
Session Initiated Protocol address
Last Date & time when this record was updated.
Last date & time user updated data in this record.
Last Date & Time user profile data was updated during import process
Flag to indicate if the user has been marked for deletion & is no longer active
A MD5 hash that represents properties of the user.
* NOTE: The Signature is checked to see if the value matches one that we already stored. If a match is found, we do not update the properties for that user.
There is another table that also stores the user profile information. This table is UserProfileValue. This table contains all the user profile property values. The following is the structure of the UserProfileValue table
A 64-bit integer which specifies the record id of the user profile this property is associated with
A 64-bit integer which specifies the identifier of the user profile property
A variant value which is the value of the user profile property.
A string associated with the user profile.
A string that specifies a database operation for a property in a user profile record Add/delete etc
An integer which specifies the order of display of the user profile property in the user interface
A privacy type value of the user profile property.
The RecordID links the UserProfile_Full table and UserProfileValue tables together. The UserProfile_Full holds the basic user profile information while the UserProfileValue holds additional information such as Image etc.
Timer Jobs in User Profile Import
The user profile import begins with a timer job that is part of the Shared Services. Therefore the user profile import starts within the OWSTimer.exe process running on the host which is also the indexer (Remember that User Profiles is dependent on the crawler component). As soon as a full import is scheduled, a one-time, timer job is created. You cannot view this timer job in the Central Administration’s Timer Job Status page because it is hidden. Shared Services scoped timer jobs are usually hidden because there is nothing you can configure for these timer jobs. To view the names of the SSP scoped timer jobs, you can use the following command:
Stsadm –o enumssptimerjobs -title “Your Shared Service Name”
The output of this however, will not provide you with the job schedule as you can see from the above. To view the job schedule for these timer jobs, you can use the SQL Query as given below against the SSP Database.
SELECT DisplayName, Recurrence, NextDueTime, Disabled FROM MIPScheduledJob (NO LOCK)
CAUTION: It is unsupported to directly query the database or make any modifications. Refer this link:
Note: A 0 in the Disabled column indicates False and 1 indicates true.
For more information on the SSP time jobs, please refer this MSDN link and the section SSP timer jobs
The import log contains information about the success or failures that occurred during the user profile import. As mentioned earlier, we use the crawler component for the user profile import. Thus to view the import logs, we use the Crawl log. You can access it as follows:
· Shared Services Administration: Shared Service Name > User Profile and Properties
· Click on View Import Log
The following is a picture of a successful user profile import log. From the highlighted portions, you can see that the content source is PEOPLE_DL_IMPORT. In the URL section, you also see the domain for which the crawl occurred.
How is user profile information updated? For eg, if a user is removed from active directory when does it get updated in SharePoint databases?
Answer: You must schedule a full import at regular intervals so that users who have been deleted or updated in the data source are removed from the user profile database. Incremental import only adds those user accounts that have been added since the last full import.
What happens to the MySite of a user after that user has been removed from the Active Directory? Is the site removed automatically or does it continue to exist.
Answer: The ownership of the Mysite will be assigned to the manager of the user. A timer job called Mysite Cleanup will later remove this site from the database.
What account is used to access the content from the data store – Active Directory?
Answer: This is a configurable setting. By default, The default account will be used for connections that do not specify an import access account. When choosing Use Default Content Access Account verify that the account has access to the source. It is recommended that you specify an account rather than relying on the default content access account.
What happens if we remove a user profile from SharePoint? Does it import it again automatically or does it need to be an explicit import?
Answer: It will be automatically imported on the next run.