Lots of similar errors during processing
Analysis Services processing (on Adventureworks DW cube in my example) can fail with this error
"OLE DB error: OLE DB or ODBC error: Operation canceled; HY008."
In SQL OLEDB terms HY008 means DB_E_CANCELED (aka your query was cancelled purposefully by the caller)
If you get lucky, you can see a better error from Management Studio
Internal error: The operation terminated unsuccessfully.
OLE DB error: OLE DB or ODBC error: Query timeout expired; HYT00.
Errors in the OLAP storage engine: An error occurred while the dimension, with the ID of 'Market Basket ~MC-Order Number', Name of 'Market Basket ~MC-Order Number' was being processed.
HYT00 means DB_E_ABORTLIMITREACHED / 0x80040E31 or a timeout expired, so the timeout expired due to the SQL_QUERY_TIMEOUT setting, meaning the command timeout or query timeout kicked in to kill the running query and cancel the work.
Similar failure seen by Processing from XMLA results messages
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"> <Parallel> <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300"> <Object> <DatabaseID>AdventureWorksDW2012Multidimensional-EE</DatabaseID> </Object> <Type>ProcessFull</Type> <WriteBackTableCreation>UseExisting</WriteBackTableCreation> </Process> </Parallel> </Batch>
You get a ton of different errors appended to a long string. The first connection probably got a timeout, but you may not noticed, because all the other connections get a cancellation notification, so Analysis Services reports them in a seemingly random order in a long string. Good luck finding the timeout indicator in this text…
Internal error: The operation terminated unsuccessfully. Internal error: The operation terminated unsuccessfully. Server: The current operation was cancelled because another operation in the transaction failed. Internal error: The operation terminated unsuccessfully. OLE DB error: OLE DB or ODBC error: Communication link failure; 08S01; Shared Memory Provider: No process is on the other end of the pipe.
; 08S01. Errors in the OLAP storage engine: An error occurred while the dimension, with the ID of 'Dim Time', Name of 'Date' was being processed. Errors in the OLAP storage engine: An error occurred while the 'Fiscal Year' attribute of the 'Date' dimension from the 'AdventureWorksDW2012Multidimensional-EE' database was being processed. OLE DB error: OLE DB or ODBC error: Communication link failure; 08S01; Shared Memory Provider: No process is on the other end of the pipe.
08S01 means DB_E_CANNOTCONNECT from the provider, so this is a bit of a misnomer. It could be that it can’t connect, or its been disconnected / cancelled by the provider or the server if the query was cancelled.
Always check the OLAP\Log\Msmdsrv.log file too. You might get the error message in case your application didn't log it.
(6/12/2012 4:52:21 PM) Message: (Source: \\?\C:\OLAP\Log\msmdsrv.log, Type: 3, Category: 289, Event ID: 0xC1210003)
(6/12/2012 4:52:21 PM) Message: OLE DB error: OLE DB or ODBC error: Operation canceled; HY008. (Source: \\?\C:\OLAP\Log\msmdsrv.log, Type: 3, Category: 289, Event ID: 0xC1210003)
(6/12/2012 4:52:22 PM) Message: OLE DB error: OLE DB or ODBC error: Operation canceled; HY008. (Source: \\?\C:\OLAP\Log\msmdsrv.log, Type: 3, Category: 289, Event ID: 0xC1210003)
(6/12/2012 4:52:24 PM) Message: OLE DB error: OLE DB or ODBC error: Operation canceled; HY008. (Source: \\?\C:\OLAP\Log\msmdsrv.log, Type: 3, Category: 289, Event ID: 0xC1210003)
(6/12/2012 4:45:33 AM) Message: OLE DB error: OLE DB or ODBC error: Operation canceled; HY008. (Source: \\?\C:\OLAP\Log\msmdsrv.log, Type: 3, Category: 289, Event ID: 0xC1210003)
That indicates that The OLE DB provider reported an error. hex code 0xC1210003.
What does all this mean in context?
When Analysis Services process a cube (or a smaller object like a dimension or measure group), it sends many large sql queries to the relational database engine through an OLEDB provider, such as (SELECT * FROM DimTABLE1, SELECT * FROM FactTable1, etc)
These queries can take minutes to hours, depending on how many joins there are and how big the tables (and partitions) are. It is dependent entirely on your cube design, and your dimension and measure group relationships in the design.
To connect to the relational data source, there are connection strings stored in the cube design to point to the database server.
This is a connection string that gets saved into the AS database design. It can point to SQL Server, or it can point to other 3rd party relational databases (Teradata, Oracle, etc) In the below screenshot I am using the SQL Server 2012 OLE DB provider named SQLNCLI11.1
Whenever a command (a TSQL query) is issues to this data source, the command timeout property is set by the analysis services server.
This is some ADO pseudo code to show how a command timeout is set by the code that runs Analysis Services internally…
conn1.Open(); command = conn1.CreateCommand(); command.CommandText = "Select * from DimTable"; command.CommandTimeout = 15;
So in the above example, if 15 seconds passes and the query hasn’t yet finished, the OLEDB provider will cancel the query on behalf of the caller. The caller doesn’t have to keep any timer because the timeout is set in the provider layer, therefore the caller doesn’t really know if the query fails how long it took and if it was a timeout or not.
In OLEDB terms, this property is called DBPROP_COMMANDTIMEOUT on DBPROPSET_ROWSET object. This property lets you run queries for a certain amount of time, and if the command doesn’t finish it will be cancelled. In SQL Server you can see such timeouts with an Attention event in the profiler trace, and the command’s duration will exactly match the duration of the command timeout.
Note that command timeout setting is not set on the Connection or the connection string itself, so it has to be set later per command. The similar connection timeout is DBPROP_INIT_TIMEOUT on DBPROPSET_DBINIT. In Analysis Services the connection timeout is a separate property ExternalConnectionTimeout. This would be applicable for making initial contact with the server, checking the accounts logging in, and such, but not very applicable for running long queries.
So how do you then set this OLE DB command timeout in the Analysis Services caller?
There is a setting (happens to be hidden behind the advanced options) on the Analysis Services instance that is set to a default of 60 mins = 1 hour. That’s a pretty high timeout, meaning that if any one TSQL query to the relational database lasts 1 hour or more, it will be cancelled by the OLEDB provider used to connect to that system, and the Analysis services command (processing most likely) will fail.
An integer property that defines the timeout, in seconds, for commands issued to external servers, including relational data sources and external Analysis Services servers.
The default value for this property is 3600 (seconds).
If you expect the processing queries to take more than 1 hour, then you might raise the timeout even higher than 1 hours. For example, I was working on a cube this week and the processing join queries take around 9 hours to complete on a 2TB database with some very large complex joins.
Right click on the server name in Management Studio > Properties. Then check “Show Advanced (All) Properties”. Then adjust the ExternalCommandTimeout setting.
Now when it runs external queries to talk to the relational database, it will set the Command Timeout to the value specified so that it can run a long time without failure.
Should my processing queries really run this long?
Probably not! Maybe you should invest time to tune the joins that AS does when it runs all those processing queries in the background on your behalf, or partition your measure groups so that the unit of work done by processing is a smaller chunk of data rather than all data at once.
Partitioning requires a lot of thought and cube design work, so I won’t go into it here, but if you need to read more see this article: http://www.sqlservercentral.com/articles/Analysis+Services+(SSAS)/70282/ They say if you have more than 20 million rows in a table, and you are having AS processing performance problems, then consider partitioning.
After running the processing once or twice on the AS cubes you can look for missing indexes easily like this if you use SQL Server as your relational data warehouse system. Take a few minutes to tune - add some indexes to the relational data warehouse tables to help tune those huge joins. This is some SQL code I borrowed from our support tool PSSDiag which we use to identify the most helpful missing indexes, that works on SQL Server 2005, 2008, R2, and 2012. Find the indexes on the fact and dimension tables that help improve the performance the most.
PRINT 'Missing Indexes: ' PRINT 'The "improvement_measure" column is an indicator of the (estimated) improvement that might ' PRINT 'be seen if the index was created. This is a unitless number, and has meaning only relative ' PRINT 'the same number for other indexes. The measure is a combination of the avg_total_user_cost, ' PRINT 'avg_user_impact, user_seeks, and user_scans columns in sys.dm_db_missing_index_group_stats.' PRINT '' PRINT '-- Missing Indexes --' SELECT CONVERT (varchar, getdate(), 126) AS runtime, mig.index_group_handle, mid.index_handle, CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) AS improvement_measure, 'CREATE INDEX missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle) + ' ON ' + mid.statement + ' (' + ISNULL (mid.equality_columns,'') + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END + ISNULL (mid.inequality_columns, '') + ')' + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement, migs.*, mid.database_id, mid.[object_id] FROM sys.dm_db_missing_index_groups mig INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle WHERE CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) > 10 ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC PRINT '' GO
Sometimes queries are cancelled or fail for other reasons besides timeouts…
We have many calls to support with these error for non-timeout cases too. The most common other cause of a processing TSQL query being cancelled is out-of-memory kinds of concerns.
There can be competition for memory between SQL Server Database Engine (SQLServr.exe), Analysis Services(MsMdsrv.exe), Integration Services packages (DTExec.exe / ISExec.exe), Reporting Services running on the same box. Maybe you need to throttle back the other services. The most common being to lower the SQL Server ‘maximum server memory’.
Remember that processing is the most intensive time for a normal SQL Server, since the Analysis Services throws several large queries with plenty of joins to the SQL relational database engine at the same time.
exec sp_configure 'show advanced',1; reconfigure; exec sp_configure 'min server memory'; exec sp_configure 'max server memory'; -- look at config_value in the results for the current MB setting configured
The ETL processes that typically run (SSIS packages to import large sets of data from a transactional system into a data warehouse system) rarely benefit from the wonderful buffering of the SQL Server database Engine’s buffer pool, because BULK INSERTs simply don’t require much memory. The SELECT and UPDATE and JOIN parts of the ETL processing (such as Lookups and slowly changing dimension updates) during the ETL phase of building a data warehouse certainly could benefit from SQL’s large buffer pool, so lowering the SQL Engine’s memory may have a side effect on those parts of the ETL imports that usually go on just before cube processing. That is, reading data from RAM is 1000-1million times faster than reading from your average spinning disk drive, therefore shrinking the SQL buffer pool means more disk reads, and unless you have high end SSD solid state disks or a high end SAN you may wait a little more.
Another obvious thing worth calling out, if you are on a 32-bit server, one simple answer is to upgrade to a x64 bit processor, so that Analysis Services can address more memory at any one time and make use of more of the RAM in the computer.
In general processing is a complicated thing.
There are many other “dimensions” of processing I can’t fit into the blog post. Pardon the pun.
- For processing best practices, this is a great resource http://technet.microsoft.com/library/Cc966525
- See this whitepaper article for an architecture overview of processing http://msdn.microsoft.com/en-US/library/ms345142(v=SQL.90).aspx
- On the processing command are you specifying to run in Sequential order, or running Parallel tasks? Check the SSIS package or XMLA job that runs the processing.
XMLA – this example show its running up to 8 tasks in parallel<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"> <Parallel MaxParallel="8"> <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300"> <Object> <DatabaseID>AdventureWorksDW2012Multidimensional-EE</DatabaseID> </Object> <Type>ProcessFull</Type> <WriteBackTableCreation>UseExisting</WriteBackTableCreation> </Process> </Parallel> </Batch>
Maybe you can scale back the number of parallel tasks (if you manually override “Let the Server decide” default), or run in Sequential mode to see if the errors go away, since it will take less memory to run a sequence of one task at a time rather than many at once. The tradeoff may be that it runs longer because you can’t push the hardware to the same throughput limits.
- Advanced setting AggregationMemoryLimitMax See this one blog post http://geekswithblogs.net/ManicArchitect/archive/2010/11/02/142558.aspx
AS uses memory quota to control the number of concurrent jobs. Each job will calculate how much memory it needs to finish the job and request the memory quota based on its estimate. The job will only proceed when the memory quota is granted. We estimate the quota for aggregation job, the configuration setting that controls the memory usage estimates are <AggregationMemoryLimitMin> and <AggregationMemoryLimitMax>
To achieve more parallelism for processing you could take this advice to tune the settings…
Advanced scenario… what if it is none of the above?
In support sometimes the simple solutions just don’t work. When this happens we result to advanced measurements and eventually debugging.
If memory is the culprit I would gather a profiler trace and these performance counters to better investigate the cause next time when it happens.
A. Set up Windows performance monitor (Start > Run > perfmon) to produce a trace of resource consumption.
B. Right click on Counter Logs icon in the tree under Performance Logs, and begin a new counter log. Name the log.
C. Add the counter for the following Objects, ALL counters for each object, ALL instances for each object.
-MSAS* --- all objects (for default instances of AS)
-MSOLAP$InstanceName* --- all objects (for named instances of AS)
-MSSQL* --- all objects (for SQL Engine)
D. Sample every 15 seconds.
E. On Log tab, specify the directory and file name strategy, as a Binary File.
F. To get Perfmon to rollover to a new file once a day, on the Schedule tab, choose
G. Stop log after "1 day", and when the log file closes "Start a new log file"
Reviewing the performance monitor results
I would look at SQL Server engine’s counter to see if the SQL Memory > Total Server Memory was growing out of control.
I would look at Memory > Available MBytes counter to see how much free memory was available to the processes running in Windows.
I would also look at Process > Private Bytes for the various executable processes to see how much each takes in comparison.
I would look for signs in the MSAS/MSOLAP counters. If the usage amount goes above the High KB amount, then AS would have to trim some of the buffers in memory.
- Memory Usage KB
- Memory Limit High KB
- Memory Limit Low KB
- Memory Limit Hard KB
If the usage KB amount exceeds the Hard KB limit, then Analysis services may cancel all current work and go into “panic mode” to kill off the memory consumers since they are heeding the throttling fast enough. This may manifest itself in similar errors, but usually the error is more descript such as "The Operation Has been Cancelled" or “The session was cancelled because it exceeded a timeout setting (session orphaned timeout or session idle timeout) or it exceeded the session memory limit.”
The advanced debug route to find the code which saw the abort
The abort error text Internal error: The operation terminated unsuccessfully. translates into hex code hresult 0xC1000007
The next most common error The OLE DB provider reported an error. translates into hex code 0xC1210003
I can add these hresults into the minidumperrorlist tag of the msmdsrv.ini file to get a minidump from the time of the failure. Following this KB article http://support.microsoft.com/kb/919711
- Open notepad as administrator (elevated UAC)
- Open the msmdsrv.ini file, which lives in a default folder such as C:\Program Files\Microsoft SQL Server\MSAS11.Instancename\OLAP\Config
- Add a comma and the the hresult of the error to the list of errors in the tag
<MinidumpErrorList>0xC1000000, 0xC1000001, 0xC102003F, 0xC1360054,
0xC1360055, 0xC1000007, 0xC1210003</MinidumpErrorList>
- Next time the error occurs a SQLDmpr0001.mdmp minidump will be produced into the OLAP\Log folder. Internally in Microsoft we review the dump file to look for context clues about why the failure occurred.
- If we need to get a full memory dump to fully see the insides of the Analysis Services process, we can raise two more flags 0x3f4 and 0x4 to get that. Then we can see the query syntax and other processing commands that might be running from the dump alone. Caution: the file can be very big – as big as the Memory (RAM) private bytes consumed by the MsMdSrv.exe process as see in Task manager.
<SQLDumperFlagsOn>0x3f4</SQLDumperFlagsOn> <SQLDumperFlagsOff>0x0</SQLDumperFlagsOff> <MiniDumpFlagsOn>0x4</MiniDumpFlagsOn> <MiniDumpFlagsOff>0x0</MiniDumpFlagsOff>
- I don’t expect you will analyze the dump file by yourself, so most folks would send it into the Microsoft support team (ie. my team) for support. Open a support request here if needed http://support.microsoft.com/select/default.aspx?target=assistance
Are there other timeouts in Analysis Services for long queries?
Query Timeout is another setting on the Data Source
As far as I can tell this setting that seems not to apply readily to processing.
Perhaps this setting applies to the connection pool and will help expire idle connections that are no longer needed, but I don’t think this setting applies to the commands that are run during processing or ROLAP commands.
In that regard, I think our docs might be wrong on this point. http://msdn.microsoft.com/en-us/library/ms189027
Specifies how long a connection request will be attempted before it is dropped.
Type the query timeout in the following format:
This property can be overruled by the DatabaseConnectionPoolTimeoutConnection server property. If the server property is smaller, it will be used instead of Query Timeout.
There are many other timeouts in Analysis Services, such a ForceCommitTimeout (for processing to kill user queries should MDX queries hold locks that block processing commit), CommitTimeout (for processing to give up if it gets blocked at commit phase), ServerTimeout, (for queries to timeout after some time) and the connection pool settings sucha s IdleConnectionTimeout, IdleOrphanSessionTimeout, MaxIdleSessionTimeout, MinIdleSessionTimeout, DatabaseConnectionPoolConnectTimeout, and the ones we discussed ExternalConnectionTimeout and ExternalCommandTimeout.. We’ll go into those other ones later!