Recovering database with memory-optimized tables involves hydrating the contents of checkpoint files (data/delta files) into memory and then replaying the tail of the log (see this link for more details). One of the important difference between disk based tables and memory-optimized store is frequency of checkpoints. Automatic checkpointing for in-memory tables occurs every 1.5GB of log records unlike traditional or indirect checkpoints (where checkpointing is done more often) leading to longer tail of log for in-memory tables. The 1.5 GB log flush is chosen to strike the right balance between flooding the IO subsystem with too many small inefficient IO operations and too few large IOPs. In most scenarios observed by our CSS teams, long recovery times for memory optimized databases is caused by the long tail of log which needs to be recovered for in-memory tables in the database. For these scenarios, running a manual checkpoint before a restart can reduce recovery times as manual checkpoint forces the checkpoint for memory optimized tables in addition to disk based tables.
Another important consideration to keep in mind during recovery of memory optimized tables is the table size in memory is larger than size on disk as only the data rows are persisted and not the indexes. During recovery, all the indexes of the tables are reconstructed which would consume more memory. Hence if you estimate the memory required to restore and recover a memory optimized database with size of the memory optimized filegroup on disk you may see OOM error. Let's say the size of memory optimized filegroup on disk is 300GB (with no traditional SQL data in the database) and if the max server memory is set to 325GB while you may expect the database to recover successfully but instead you might see OOM error. With an OOM error, your database will come up as suspect. To resolve this, we need to increase the memory available.
Here's a memory sizing calculation to keep in mind while planning and sizing the restore and recovery of memory optimized database.
The first thing to note is that in case of to see how much memory is occupied by XTP, use the DMV sys. dm_db_xtp_memory_consumers on an online database:
SELECT ISNULL(SUM(allocated_bytes)/1024/1024, 0) as XTP_MEMORY_IN_MB
This shows the memory consumed by tables and for other system operations.
In the above case, memory consumed may be around 340GB – which is already greater than 325GB and hence leading to OOM error.
Secondly, all of the max server memory is not available to be used by memory optimized tables. The percentage of memory depends on MAX SERVER MEMORY. See this link to find out what share your memory optimized tables would get. Here's a snippet from the docs:
In the example above, since the target committed memory is >96GB, the memory available is 90% of 325GB which is 292.5GB.
Lastly, it is recommended to use dedicated resource pools for databases with memory optimized tables. This is because SQL memory manager behaves differently for a default resource pool (which is used by most memory consumers and therefore is more generic in response to memory pressure) compared to a dedicated resource pool which the user has more control over. To bind a resource pool to a database, see this link.
In the example scenario if you bind the database to a resource pool with a MAX_PERCENT of 95%, we would get 95% of 90% of max server memory. To recover a 340GB database successfully, we would need ((340/0.95)/0.9) which is approximately 400GB of max server memory. Anything lesser would result the database recovering into a suspect mode.
For more detailed information on memory requirements and how to deal with memory issues, please refer to the BOL link here.
SQL Server Tiger team