Importing data into Gemini

This time around we will take a closer look at ways of getting your data into Gemini model. In particular we will into importing data using “conventional” ways. Using variety of OLEDB/ODBC/.Net client libraries.

Choosing Data Source

To import your data click on the “From Database ” in the Gemini Client Window and enter Import Wizard.

clip_image002

Here you see long pre-populated list of data sources you can import from. If you don’t see your favorite data source, you can click choose “Others” and Import Wizard will let you build your own OLEDB connection string.

Click on the “Build” button in the “Specify a Connection String” page and Import Wizard would launch Data Link Properties dialog. Click on the “Provider” tab in the dialog and you will see even longer list of OLEDB providers registered on your machine:

clip_image003

You can then choose any of the providers, but for the interest sake choose a “Microsoft OLE DB Provider for ODBC Drivers”.

Using this option you could then import data into Gemini model using any of the ODBC drivers you have installed on your machine.

clip_image004

But what about .Net providers? There isn’t a link or any other high level selection you make. The choice for particular provider type is accessible as advanced option once you’ve selected the type of data source.

Let’s go back to the first page of the Import wizard and choose “Microsoft SQL Server” as a source. Click next and in the next page click on the “Advanced” button right below the Database name. clip_image005

The Advanced properties dialog is available for you to control fine details of your connection. You can pick and choose particular provider type from the Providers drop down box. You can change any of the provider properties in the properties window below. You can also see the complete connection string used to establish connection in the box below. If you see empty property box like this:

clip_image006

This means that you don’t have particular provider installed. Go back and install the provider you would like to use.

Choosing data to import

Once you’ve selected a datasource you would see a choice to import data by ether selecting list of tables or by writing your own SQL query.

In CTP2 Gemini Import Wizard does not give you a SQL Query builder. You simply get a text box where you can either write or paste a query you’ve constructed elsewhere. Gemini Import Wizard will try to validate the query and will not let you proceed if you have a syntax error.

If you choose to go with Tables selection, you would see a “Select Tables and Views” page

clip_image008

Here you can pick one or many tables. “Select Related Tables” will cause Gemini Import Wizard to select all tables related to the once you’ve already selected if this information is present in the data source. Import Wizard will also remember table relationships and create these between tables in the Gemini model.

You can also preview, explicitly select the columns, and filter content of particular table

clip_image010

Don’t worry if you don’t see entire content of your table- Preview dialog is only showing the first 50 rows using something like “Select Top 50 <columns> from <table>”.

This dialog also lets you to delete any columns you don’t need and set filters to limit amount of data you’re bringing into Gemini.

It is very good idea to restrict the import to just the data you need. Large number of unnecessary columns or rows will take up unnecessary space in the Gemini model.

Once you’ve made your choice, the last page – Import Summary, lets you review your choices. Just in case you forgot something and need to go back before import starts.

clip_image012

Clicking on the finish button would bring up the Progress Dialog and would start actual import of the data.

clip_image014

Here you see number of rows imported per table and if any errors occurred during the import you could see an error message by following a link that appears in the Message column. All tables are imported serially and speed of data import largely depends on the ability of your data source to deliver the data. While importing large tables you would see progress showing count of rows imported (in increments of 10,000).

The last row in the progress report – “Data preparation” is the stage where Gemini creates relationships between tables.

If you forgot something or import takes too long, you can cancel import. In this case all tables that show “Success” will be imported and any following tables would show error. Also the ability of Gemini import wizard to cancel import operation might be dependent on ability of the relational backend to cancel the query Gemini sends to it. Imagine you’ve constructed complex SQL query to bring data into Gemini and it takes awhile for relational database to process it. If you try and cancel out of import operation, the Gemini Import Wizard would forward cancel request to the relational database but that might take longer to respond, keeping you waiting.

Conclusion.

We discussed various details on how you can import data from various data sources into the Gemini model. If you’ve worked with Excel or other data modeling tools – data import in Gemini should be quite intuitive. Some of the concepts of the importing data are still little rough. But this is the first CTP and we will try to improve on data import experiences further. In the meantime, if you have any specific data import questions or observations, please give us feedback on the Gemini newsgroups.