Case Study: Import data from XML to SQL (Part-2)

Hello again…

In Case Study: Import data from XML to SQL (Part-1) we tried to save XML data to SQL using c# .net code

In part one we took care of creating tables, columns, assigning appropriate data type, creating xsd file at run time and bulk insert data in SQL

 

In this part we are going to take care of

1)      Duplicate nodes in XML file

2)      Creating new XML file with replaced node names

3)      Enumerate through all XML files in given folder

4)      Get how many rows are there in database and decide next ID value accordingly

 

-> Resolve Duplicate nodes in XML file by renaming nodes.

If you have multiple nodes with same name then we will get error

{"A column named '<column name>' already belongs to this DataTable: cannot set a nested table name to the same name."}

 

So to resolve this we have to rename this node in XML file and append name of it’s parent to actual nod name, now some time we have to deeper in hierarchy as parent along with child might also not be unique and in such case we have to go one more level deep.

 

So code will look something like,

 

 var nodes_dates = xd.Descendants("Dates"); //find all nodes with name Dates
 foreach (XElement n in nodes_dates)
 { 
 n.Name = n.Parent.Parent.Name + "_" + n.Parent.Name + "_" + n.Name; //change name with parent_parent_node (as it is unique and parent_node is not unique)
 }
 var nodes = xd.Descendants("Contacts"); //find all nodes with name contacts
 foreach (XElement n in nodes)
 {
 n.Name = n.Parent.Parent.Parent.Parent.Name + "_" + n.Parent.Parent.Parent.Name + "_" + n.Parent.Parent.Name + "_" + n.Parent.Name + "_" + n.Name; //change name with parent_parent_node (as it is unique and parent_node is not unique)
 }
 xr = xd.CreateReader(); //Update xmlreader
 
 

So here name of node will get changed so that we don’t have any duplicate XML nodes. Only trick is finding which all nodes are duplicate. For that we have to rely on errors, so we have to test code enough number of times to make sure we have handled almost all duplicate nodes.

 

 

-> Enumerating through folder to process all XML files

This is easiest part, it just needs one for each loop

 

 foreach (var file in Directory.EnumerateFiles(@"D:\XML\FAC_SSIS_Identities\", "*.xml"))

 

Then you can use this file variable wherever you want to mention file name along with path.

 File variable will have complete path of XML files.

 

-> Find number of rows in table and decide next identity value

With enumeration comes next problem reseeding identity value.

When you process new XML file identity value gets reseeded to 1 so you have to change that so that it gets right identity value and it does not create any duplicates.

For this you have to go through all tables and find how many rows are already present and then change default value of Dataset to this new values.

 

Code for that will look something like,

 

 foreach (DataRelation dr in ds1.Relations)
 {
 SqlConnection conn = new SqlConnection("Server=dil\\denali;Initial Catalog=Ken_XML_data;Trusted_Connection=True;");
 conn.Open();
 SqlCommand cmd = new SqlCommand("SELECT * FROM sysobjects where name = '" + dr.ParentTable.TableName + "'", conn);
 var val = cmd.ExecuteScalar();
 if (val != null)
 {
 cmd.CommandText = "select max([" + dr.ParentColumns[0].ColumnName + "]) from [" + dr.ParentTable.TableName + "]";
 long max = 0;
 try
 {
 long.TryParse(cmd.ExecuteScalar().ToString(), out max);
 }catch (Exception ex) { max = 0; }
 dr.ParentColumns[0].AutoIncrementSeed = max + 1;
 }
 cmd.CommandText = "SELECT * FROM sysobjects where name = '" + dr.ChildTable.TableName + "'";
 val = cmd.ExecuteScalar();
 if (val != null)
 {
 cmd.CommandText = "select max([" + dr.ChildColumns[0].ColumnName + "]) from [" + dr.ChildTable.TableName + "]";
 long max = 0;
 try
 {
 long.TryParse(cmd.ExecuteScalar().ToString(), out max);
 }catch (Exception ex) { max = 0; }
 dr.ChildColumns[0].AutoIncrementSeed = max + 1;
 }
 conn.Close();
 }
 //constraints will be ignored as it was giving some error
 ds1.EnforceConstraints = false;
 

If you look at code carefully you will see that I’m enumerating through each relation and running query to find max value of parent table and child table and then reseeding table identity column with this max value.

This is as dynamic as possible so it can handle as many relations as we have.

 

So with above changes we should be able to take care of most of the difficulties that we face with complex XML files.

 

With this, this case study ends…

 

I hope this is helpful.

 

Link for solution file:

https://sdrv.ms/17vlQLL

 

It does not have schema and xml files as XML file has some sensitive data and replacing all will take good amount of time.

 

Let me know if you have any question.