PowerPivot Data Refresh

In this posting we will take a look at the data refresh facility in PowerPivot for SharePoint and show you how to use it for maximum benefit. To start off, let’s begin with a simple question: So . . . what is data refresh? And why should I care? Data refresh is all about keeping your data current. It ensures that your users are making decisions based on the most current and up-to-date information possible. This is done with as little investment of your time as possible. We want to make it easy and straightforward to have your data be as current as it can be. Set it up once and it runs until you tell it to stop.

Example: Suppose you have spent several hours, maybe days, on an important workbook. You have gather data from five sources: three corporate IT databases, a local spreadsheet provided by a colleague with market share information, and finally a local table embedded in a worksheet that you have typed in by-hand. The local table drives a market penetration forecast that is the whole point of the workbook. The business problem at hand is this forecast. And you publish the workbook to your local PowerPivot for SharePoint site --- and all of your fellow team members LOVE IT! Excellent work. Your boss walks in the door and says “Nice work. But we’ve just published this month’s data, can you update the numbers?” Aww, Oh. Now what do you do? Updating the data wasn’t in your original plan.

Sure you could copy down the whole workbook and update the data on your local machine and re-publish, but there should be a way for you to automatically request that the data be updated for you. After all, all of the data is on the server. Why not? What about the fact that some of the corporate databases are updated weekly, some are monthly – do you have to manage that process yourself also? Why couldn’t the system update the data and then it is just there when you came into work the next day?

Well, the answer is that the system can – and that is what the PowerPivot data refresh facility is all about.

Before we get too deep into it, let’s talk a bit about what the data refresh facility isn’t. The word ‘refresh’ and ‘data’ is used a lot in SharePoint; all over Excel and all over most IT systems. In our case, we are talking about a specific kind of data refresh. We are talking about the data that is stored inside the PowerPivot workbook. When you have the workbook open, it is the data stored in the PowerPivot Client Window. It is the thousands and thousands (possibly even millions) rows of data that you have imported into the workbook.

clip_image002

It is that data. . . To update that data on the server, the following two restrictions have to hold:

1. The data has to be available to the server. This seems obvious, but sometimes its not. Files located on your local hard drive, or within the workbook itself (such as the local table) cannot be refreshed on the server. The data has to be located somewhere the server can get to it.

2. You have to ask that the data be refreshed. You have to setup a refresh schedule. You have to indicate what data sources to be included. The system does not do the data refresh by itself – well, it does – but only if you request it.

This raises an interesting question. Just because you have a workbook, does it have to be refreshed? Well, the answer is NO. It is up to you and your business problem. It is quite reasonable that you might consider it more important that you (as the expert) review the workbook before the data is published. It might be that assumptions and business rules that applied in the past are not longer valid. And a person would only know that if they saw the report prior to others using it. If this is the case then by all means DO NOT SETUP DATA REFRESH! Only use the facility if it solves your business problem.

However, for many classes of problems, the automatic daily and weekly refreshing of the data is as important to its use. Data can only be acted upon if it is fresh and relevant to the business. Stale data is misleading and could harm the very business processes that are being developed. Only you the expert knows the difference.

How to setup a data refresh schedule

There are two ways of setting up a data refresh schedule. First, you can use the standard document library option menu: the item is called “Manage data refresh”. The other option is to use the Gallery Silverlight control and click on the “Manage data refresh” icon.

clip_image004

To setup the schedule, click on “Enable” and provide the following information:

  • When to run the schedule? Typically you would click on daily or weekly, but there are lots of options here. Many of the options are only displayed when the relevant time period is selected. For example if you click on “Monthly” then the following options appear.

clip_image006

  • When should the refresh schedule run during the day? Normally you should enter “During after business hours” – which the administrator establishes as properties of the PowerPivot service application. If you don’t want to run the schedule then, you can enter in the starting time for when the job will run. This has two interesting side-effects: (1) the time specified is the ‘starting’ time – meaning that given system usage and available resources, the actual job may run later, this is just the earliest that it could run; and (2) that after a successful run, the schedule is automatically disabled. To be totally honest, the second side-effect is done on purpose to discourage users from not using ‘after business hours’. We think that that the ‘after business hours’ approach makes the most effective use of resources.
  • To whom should failure email notifications be sent to? You can enter several email addresses/users if you wish. To be notified when workbooks are refreshed successfully, use the normal SharePoint alerting facility to send emails out when new content on a document library arrives. This is normal SharePoint. This option is for failures.

Now let’s take a look at the Windows credentials to run the schedule under:

clip_image008

  • Who should the user be? There is one Windows user regardless of how many schedules are specified in this job. In this field you specify “Who” the user is that will run at 2am in the morning. It must be a Windows user. You have several options here, the end result of each one is an NT account:
    • “Connect using the credentials already stored inside the workbook” – in this case, the system will use the unattended execution account that is specified for the PowerPivot for SharePoint service application. Typically this is a low permissions account that has no trusted access to the various data sources. In this case, the data refresh facility will use the non-Windows authentication (i.e. sql logins) that have been specified on the connection string in the data sources.
    • “Connect using the following Windows user credentials” – The user can enter in a Windows username and password. We will store these credentials in the SharePoint Secure Store facility for the user. At 2am the data refresh facility will pull the credentials out and do a Windows logon using them – and the data sources can then make trusted connections.
    • “Connect using the credentials saved in Secure Store Service” – if the user has the ability to enter in their own Secure Credentials the data refresh facility will use them at 2am for the logon. If using this approach, the application ID must be a Secure Store group that includes the service account used by the PowerPivot service application (so we can read the credentials).

NOTE: These options are important because they establish the Windows environment for the job. If there are any trusted connections used in the workbook’s data sources, then this is the Windows user that will be logged on for the job; the credentials in the data source are used for non-Windows authentication..

Finally let’s take a look at the data sources contained in the workbook.

clip_image010

  • Lastly you will see a list of data sources. You can select all data sources, or data source by data source within this job. For each data source you specify if you want it included in the data refresh schedule or if you would like to schedule it separately. For a data source you can specify:
    • If you want this data source to be included in the default schedule, or you can specify a custom schedule. All of the same options for the workbook can be ran on each data source, e g. the third Tuesday of the month.
    • For each data source you can specify if you want the data source included in the default schedule with the workbook or separately. If separately you can control which tables are updated and when. The system knows what tables have been imported from what data sources. When you select a specific data source, then only those tables that are built upon it are updated using this schedule. If you have two tables that you would like refreshed at different periods but they use the same data source, then you have to enter the same data source twice so you can schedule the two tables independently.
    • The username and passwords that are specified for the data source can either be (1) from the embedded workbook (i.e. you entered the credentials in the client when the workbook was created), (2) custom username and password that you enter in by-hand (and we will store for you in Secure Store), or (3) they come from a Secure Store Application ID that you have created. In any case, these credentials are for non-Windows authentication. They are not used to establish a trusted connection – that is what the other Windows credentials are for up above in the dialog box.
    • If you do not want the data source to be updated, such as if the data source is contained on your C:\ drive, then uncheck the data source and it will be ignored when the schedule runs.

Once you have established a data refresh schedule then you will find that the “Manage data refresh” option first points to the history page of the workbook. The history page shows you when the data refresh schedule(s) were run and what the outcome of the job was (i.e. success or failure).

clip_image012

Conclusion

In summary, use the data refresh facility to keep your data up-to-date. Just setup your schedule and the system runs in the background automatically. Your data just arrives when asked for. The facility is designed as a simple, straightforward way to have end users schedule their own data refresh requests. The system runs the schedule automatically (typically ‘after business hours’) and then posts the updated workbooks back into SharePoint in the same document library where the workbooks came from.

Finally, we wouldn’t be honest if we said that the data refresh facility was the right solution for everyone. Clearly it isn’t. It is oriented to you, the end user; not to your IT colleagues. As a scheduler, it lacks many of characteristics that IT systems typically have. For example, there is no real-time or on-demand access to data; or the ability to issue frequent updates, such as hourly or every 5-10 minutes. These capabilities are important in their own way, and some systems demand them, but in the managed self-service world they come with a management overhead and complexity that is just not appropriate for our users.