Important Information: In a live database with active users connected, changing an object multiple times or compiling all objects can cause data loss in NAV 2013 R2


You may experience data loss in Microsoft Dynamics NAV 2013 R2 in the following situations, separately or in combination:

  • Changing an application object more than once, for example by two different developers, in the same database connected to the same Microsoft Dynamics NAV Server instance while users are working in the system.
  • Compiling all application objects, and thereby potentially changing objects more than once, in a database that is connected to a Microsoft Dynamics NAV Server instance that users are accessing.

To avoid the problem, we advise that you work according to the following best practices:

  • Application developers must work on their own database and connect to their own Microsoft Dynamics NAV Server instance. When you deploy changes to the live production database, make sure that no users are working in the system.
  • You must compile objects only when no users are working in the system, including users connecting through NAS. 
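As a rough illustration of the rule above, the following Python sketch checks a session list before a compile and refuses to proceed if any session other than the developer's own is connected. The `Session` record, its field names, and the client-type strings are hypothetical; how you obtain the list of active sessions from your server is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Session:
    """Hypothetical session record; not a NAV API type."""
    user: str
    client_type: str  # illustrative values: "Windows", "Web", "WebService", "NAS", "Development"


def blocking_sessions(sessions: Iterable[Session], own_user: str) -> List[Session]:
    """Return every session except the caller's own development session.

    RTC, web client, web service, NAS and other developers' sessions all
    count, so any result means it is not safe to import or compile.
    """
    return [
        s for s in sessions
        if not (s.user == own_user and s.client_type == "Development")
    ]


def safe_to_compile(sessions: Iterable[Session], own_user: str) -> bool:
    """True only when nothing but our own development session is connected."""
    return not blocking_sessions(sessions, own_user)


if __name__ == "__main__":
    active = [Session("DEV1", "Development"), Session("NASUSER", "NAS")]
    print(safe_to_compile(active, "DEV1"))  # False: the NAS session blocks the compile
```

The check is deliberately strict: a second developer counts as a blocking session, matching the advice that developers should not share one database and server instance.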

With update rollup 5 for Microsoft Dynamics NAV 2013 R2 (KB 2937999), this issue has been fixed and you no longer have to take the precautions described above. However, we still advise that you separate development databases from production databases.

Please note that implementing update rollup 5 will require a database conversion.

 

 

Comments (38)

  1. perftech@sbb.co.yu says:

    This information comes too late. We are running four projects on NAV2013R2 this year. We lost data on two of them on PRODUCTION databases in the first days after go-live. Customers are losing trust in NAV. This was the first time in 10 years I have experienced this.

  2. glathe@gmx.de says:

    It feels like it's getting worse and worse. Really. Have you considered taking measures to mitigate this unhealthy mixture of bugs, bad design decisions, and lack of experience? From the RTC on (NAV2009! Released in 2008! 5 years ago!) it has been a real risk to sell this product to customers. It is dangerous to your reputation and definitely a business risk. The risk ranges from performance issues, through business logic errors, to half-baked design decisions that leave you with unusable functionality and demands from your customer to fix this mess, which could easily bankrupt you. There is a reason that the RTC-based product isn't selling well. Add to this really bad fuckups like this one…

    Until NAV2009R2 we at least had a Classic Client whose quirks were known (and weren't so catastrophic). You could create good, workable, easy-to-maintain solutions with it. And you could fence in the dangerous parts of the product so that you had a pretty stable production system. Now, everything appears to be shot. All the talk about Scrum, Kanban, and rapid release cycles doesn't appear to improve things. So what are you doing to clean up this mess?

  3. Natalie K. says:

    Can we really be sure this severe bug does not apply to earlier RTC versions as well? Did you test that?

  4. Daniele Rebussi says:

    In projects with web services activated, do the web services need to be stopped when compiling?

  5. Martin Honoré says:

    Hello

    What happens if you import a new object set?

    Is it the same problem?

  6. Martin Nielander says:

    @Daniele Rebussi: Web Services are also considered users.

    @Natalie K: Yes we can be sure. This only applies to NAV 2013 R2 and only if you are doing development in a live production environment.

    @Aleksandar Totovic: I am sorry you had to experience this. We are doing all we can to fix this issue as quickly as possible.

    @Martin Honoré: If you need to compile the imported objects in a live production environment with users working, then there is a risk that you will experience the problem.

    Best regards Martin

  7. Capone says:

    Let's say I'm the only developer and I only work on document reports. Will there be an issue then?

  8. Martin Nielander says:

    @Capone: If you are the only user connected to the database, then there will be no issue.

    Best regards Martin

  9. anfinnur@formula.fo says:

    Quote: "With the next update rollup for Microsoft Dynamics NAV 2013 R2, this issue will be fixed and you do not have to take the precautions described above"

    Sure?

    Quite sure?

  10. Robert de Bath says:

    Quote: "… and only if you are doing development in a live production environment."

    WRONG.

    As you say this also applies if you compile objects in a live environment.

    This happens if the build number you do your development on isn't EXACTLY the same version as the version in live.

    So we now have to make sure the objects are EXPORTED from the customer's test database; DON'T use the same .fob to import into live.

    Because we aren't actively developing in that database, we haven't lost any data yet; but it's been close!

    Overheard with a deaf ear … "WTH is this? Did they do any testing?"

  11. Chris "WIC" says:

    What can I say? Much the same: this information comes too late. We are currently running two projects on NAV2013R2. We lost data on both of them on PRODUCTION databases in the first days after go-live. Customers are losing trust in NAV. This was the first time in 14 years I have experienced this.

  12. Martin Nielander says:

    @Robert de Bath: Quote "This happens if the build number you do your development on isn't EXACTLY the same version as the version in live. So we have now have to make sure the objects are EXPORTED from the customer test, DONT use the same fob to import into live".

    We have not identified any issues in relation to .fob import as described here. If you know something that we don't then please contact MS Support ASAP.

  13. glathe@gmx.de says:

    @Martin Nielander: I would assume that there is an issue with different build numbers between dev and live. The NAV development environment automatically triggers a recompile on import in this case, doesn't it? When you do this on the live system, it comes close to the scenario where you're saying it can cause data loss.

    with best regards

    Jens

  14. Andreas Hargassner says:

    What is the extent of the data loss?

    Are whole records lost, or only some data in fields?

    Are inserted records affected, modified records, or both?

    How much data can be lost? Everything since the last compile, since the last login, …?

    We would be glad to get some feeling for our risk at go-live in production.

  15. Thomas Hejlsberg (Microsoft) says:

    Folks,

    Let me see if I can clarify this a bit.

    Until we have the fix out, the safe way to isolate yourself from any risk is to avoid changes to a live system. If you need to make a change, make sure all users (or, to be precise, all sessions) are logged out. This includes any type of session, whether RTC, web client, web service, NAS sessions, or other developers.

    I would like to share a more detailed picture of what was happening.

    The bug is an unhandled race condition combined with an update of one or more metadata records (which C/SIDE updates by deleting the metadata record followed by an insert of the new version).

    If all of the following are true:

    1) You do a change in C/SIDE, which in turn will delete a row in metadata table 2000000071 followed by an insert of the new row in 2000000071

    2) The change of table 2000000071 is detected by the server which flags all tenants as “need to be checked at next access”

    3) An active user requests some data from the tenant tables, which triggers a sync operation

    4) The sync operation now reads all rows from table 2000000071

    The worst-case scenario is when the read in step (4) happens to hit the exact point between the delete and the insert in step (1), of either this change or another change. In that case we might falsely detect that a table has been deleted and hence drop the actual data table. A few milliseconds later we discover that the table has been "recreated", and we re-create the data table, but of course as an empty table.
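To make the interleaving concrete, here is a deterministic Python toy model of steps (1) to (4). It is a sketch under stated assumptions, not NAV's actual sync code: metadata table 2000000071 is reduced to a dictionary of table IDs, each data table is a list of rows, and the hypothetical `sync_in_gap` flag forces the read from step (4) to land between the delete and the insert.

```python
from typing import Dict, List, Set


class TenantDatabase:
    """Toy model of one tenant: object metadata rows and SQL data tables."""

    def __init__(self, metadata: Dict[int, str], data: Dict[int, List[str]]):
        self.metadata = metadata  # table id -> metadata version (stand-in for table 2000000071)
        self.data = data          # table id -> rows in the tenant's data table


def sync(db: TenantDatabase, snapshot: Set[int]) -> None:
    """Naive sync that trusts a single read of the metadata table."""
    # A table missing from the snapshot is treated as deleted: its data table is dropped.
    for table_id in list(db.data):
        if table_id not in snapshot:
            del db.data[table_id]
    # A table in the snapshot without a data table gets a new, empty one.
    for table_id in snapshot:
        db.data.setdefault(table_id, [])


def update_object(db: TenantDatabase, table_id: int, version: str,
                  sync_in_gap: bool = False) -> None:
    """Store an object the way the comment describes: delete, then insert."""
    del db.metadata[table_id]
    if sync_in_gap:
        # Worst case: a user request triggers a sync exactly between delete and insert.
        sync(db, set(db.metadata))
    db.metadata[table_id] = version
    sync(db, set(db.metadata))


if __name__ == "__main__":
    db = TenantDatabase({18: "v1"}, {18: ["C0001", "C0002"]})
    update_object(db, 18, "v2", sync_in_gap=True)
    print(db.data[18])  # [] : the table exists again, but its rows are gone
```

Without the forced interleaving the same update leaves both rows intact, which matches the point that the window is only a split second wide.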

    I would say that the chance of hitting this split second is very slim. Actually, it took quite an effort to write test cases provoking this situation (to show that it could occur at all).

    In other less severe conditions the metadata might be updated while a sync operation of a previous change is taking place which might leave the database “unmountable” but in this case all data is intact when the situation has been cleared up.

    The majority of reported issues have been related to "index out of sync" situations, which could be a result of the above bug but also of earlier situations or even manually created indexes. The problem arises when a field is included in an index that is not known to NAV. If you then try to delete this field, the operation fails because the field is used in an index.
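The "index not known to NAV" condition can be expressed as a set comparison. The Python sketch below assumes that NAV keys and SQL Server indexes can both be reduced to ordered tuples of column names, which is a simplification; the key and index names in the example are made up.

```python
from typing import Dict, List, Sequence, Tuple

Columns = Tuple[str, ...]


def unknown_indexes(nav_keys: Sequence[Columns],
                    sql_indexes: Dict[str, Columns]) -> Dict[str, Columns]:
    """SQL indexes whose column list matches no key defined in NAV."""
    known = set(nav_keys)
    return {name: cols for name, cols in sql_indexes.items() if cols not in known}


def indexes_blocking_field_delete(field: str, nav_keys: Sequence[Columns],
                                  sql_indexes: Dict[str, Columns]) -> List[str]:
    """Unknown indexes that include the field.

    SQL Server refuses to drop a column while an index depends on it, and
    NAV will not drop an index it does not know about.
    """
    return sorted(name for name, cols in unknown_indexes(nav_keys, sql_indexes).items()
                  if field in cols)


if __name__ == "__main__":
    keys = [("No_",), ("Name", "No_")]
    indexes = {"PK": ("No_",), "$1": ("Name", "No_"), "IX_Manual": ("City", "No_")}
    print(indexes_blocking_field_delete("City", keys, indexes))  # ['IX_Manual']
```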

    The hotfix we are just about to release will fix the race condition scenario described above and recreate missing or wrong indexes. It also allows certain inconsistencies to exist during a sync operation (e.g. we are just about to delete an index, and we discover it is not there).
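One way to picture "allows certain inconsistencies" is a sync that computes its plan from one snapshot and then tolerates the plan being stale when it runs. This Python sketch is an assumption about the general idea, not the hotfix's code: dropping an index that is already gone is treated as done, and a missing or wrong index is simply (re)created from the desired definition.

```python
from typing import Dict, List, Tuple

Columns = Tuple[str, ...]


def plan_sync(actual: Dict[str, Columns],
              desired: Dict[str, Columns]) -> Tuple[List[str], List[str]]:
    """Compute which indexes to drop and which to (re)create."""
    drops = sorted(name for name in actual if name not in desired)
    creates = sorted(name for name, cols in desired.items() if actual.get(name) != cols)
    return drops, creates


def apply_sync(current: Dict[str, Columns], desired: Dict[str, Columns],
               drops: List[str], creates: List[str]) -> None:
    """Apply a possibly stale plan without failing on already-applied steps."""
    for name in drops:
        current.pop(name, None)  # already gone? tolerate it instead of failing
    for name in creates:
        current[name] = desired[name]  # create a missing index or replace a wrong one


if __name__ == "__main__":
    actual = {"$1": ("Name",), "$2": ("City",)}
    desired = {"$1": ("Name", "No_"), "$3": ("Post Code",)}
    drops, creates = plan_sync(actual, desired)
    current = dict(actual)
    del current["$2"]  # something else dropped $2 after the plan was made
    apply_sync(current, desired, drops, creates)
    print(current == desired)  # True
```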

    Thomas Hejlsberg

    CTO – Architect

    Microsoft Dynamics NAV

  16. Thomas Hejlsberg (Microsoft) says:

    I'm happy to announce that the fix for this bug has now been released to the Microsoft Support Team.  Through the usual channels you can request the fix.

    We plan to release the fix to PartnerSource tomorrow.

    Thanks,

    Thomas

  17. glathe@gmx.de says:

    Now that's good news.

    @Thomas Hejlsberg: Thank you for sharing the details. According to this, importing with the wrong build number would be enough to run into this problem. Better to know beforehand. What makes me a little nervous is what you write about the hotfix: I assume that it doesn't fix the whole problem, right? It sounds like it doesn't.

    with best regards

    Jens Glathe

  18. thomas.hejlsberg@live.dk says:

    @Jens: Yes, doing an import on a live system is one of the scenarios that the hotfix addresses.

    The hotfix should fix the problem… I'm not sure what you are getting at.

    Thomas

  19. glathe@gmx.de says:

    @Thomas: My impression was that it fixes the race condition, and it is more tolerant on the indexing problem… still allowing for rapid successive changes of the object metadata while the NST is still syncing the database structure / other tenants. Shouldn't storing an object be deferred until the database / all mounted tenants are synced for this object?

    with best regards

    Jens
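Jens's suggestion amounts to serializing object storage and sync. A minimal Python sketch of that idea uses a single `threading.Lock` that both the delete-plus-insert update and a whole sync pass must hold, so an update waits until a running sync has finished and a sync can never see the gap between delete and insert. The `MetadataStore` class is hypothetical, and the thread does not say whether NAV's fix works this way.

```python
import threading
from typing import Callable, Dict, List, Set


class MetadataStore:
    """Hypothetical metadata table guarded by one lock."""

    def __init__(self, rows: Dict[int, str]):
        self._rows = dict(rows)
        self._lock = threading.Lock()

    def update_object(self, table_id: int, version: str) -> None:
        # Delete and insert happen under the lock, so no sync sees the gap.
        with self._lock:
            self._rows.pop(table_id, None)
            self._rows[table_id] = version

    def run_sync(self, apply: Callable[[Set[int]], None]) -> None:
        # The lock is held for the whole sync, so updates are deferred until it ends.
        with self._lock:
            apply(set(self._rows))


if __name__ == "__main__":
    store = MetadataStore({18: "v0"})
    seen: List[bool] = []

    def writer() -> None:
        for i in range(1000):
            store.update_object(18, f"v{i}")

    def reader() -> None:
        for _ in range(1000):
            store.run_sync(lambda ids: seen.append(18 in ids))

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(all(seen))  # True: table 18 was present in every sync snapshot
```

The trade-off is that a long sync blocks developers from saving objects, which is exactly the deferral Jens asks about.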

  20. Klaus Fander says:

    We have just applied the fix, but now RTC does not start anymore after importing an object. Hopefully support can fix this (a request has been created), and perhaps it was our fault (I hope!!). But I would like to underline what Chris (WIC) said: I love RTC and I find it a beautiful product with a lot of opportunities. But historically NAV was always a reliable system. Now, after 18 years of working with NAV, I am losing trust in the EXE: what a pity! And this only because Microsoft added multitenant functionality (for whatever reason!). And if you add functionality and there is a "delete option" in it: why did you not test that? This is really catastrophic for our customers' trust in NAV, in us, and in Microsoft!! Please, please do it better next time!

  21. Phil Finch says:

    Please advise – what is the fix ID?

  22. Phil Finch says:

    Answered my own question.

    ID is 2934571, Build no. is 7.1.36281

    Title:

    Data loss can occur when you make changes to an object multiple times or compile all objects in a live database with active users connected.

    mbs.microsoft.com/…/KBDisplay.aspx

  23. guido robben says:

    After applying the hotfix, issue ID 356335 is still happening!

  24. Guido Robben says:

    Part2

    "The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state" error when you import or compile a table.

  25. Thomas Hansen (Microsoft) says:

    @Guido Robben 17 Mar 2014 3:18 AM "Part2"

    The error "The communication object, System.ServiceModel.Channels.ServiceChannel…" is not related to this fix. We are aware of this error and have a fix on its way (bug 358256 / KB 2934572). The problem occurs when the Classic Client is not running on the NST (Service Tier) machine and the network does not support the use of SPNs, so the connection can't be established correctly.

    The current workaround is to run the Classic Client on the same machine as the NST (Service Tier).

  26. Gunther Gebauer says:

    I have the same problem "The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state".

    The database is running on a local server as a CRONUS NAV2013R2 release database. I want to import finished table objects with new fields, and it does not work.

    Service is running on the same machine as development environment and build 36281 KB2934571.

  27. Gunther Gebauer says:

    OK, now it works:

    In the Classic Client, under Options, I added the Server Name, Server Instance Name, Server Port and Management Port (these were not the default values; our management port is 7056), and now importing the objects works.

  28. acls@live.dk says:

    Just tried to deploy KB 2934571 at a customer.

    Got this error when logging on to RTC:

    You cannot sign in due to a technical issue

    Event log:

    Server instance: XXXXXXX

    Type: System.InvalidOperationException

    Message: ExecuteNonQuery: CommandText property has not been initialized

    StackTrace:

        at System.Data.SqlClient.SqlCommand.ValidateCommand(String method, Boolean async)

        at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite)

        at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()

        at Microsoft.Dynamics.Nav.Runtime.NavSqlConnection.ExecuteFunction[T](Func`1 function, NavSqlCommand command)

        at Microsoft.Dynamics.Nav.Runtime.NavSqlConnection.ExecuteFunctionWithTrace[T](EventTask task, Func`1 function, NavSqlCommand command)

        at Microsoft.Dynamics.Nav.Runtime.NavSqlCommand.ExecuteNonQueryImp()

        at Microsoft.Dynamics.Nav.Runtime.ReindexTablesWithIndexMismatch.ReindexSingleTable(NavSqlConnectionScope tenantDatabaseScope, NCLMetaTable table, IEnumerable`1 companyTokens)

        at Microsoft.Dynamics.Nav.Runtime.ReindexTablesWithIndexMismatch.ReindexAllTablesWithIndexMismatch(NavSqlConnectionScope tenantDatabaseScope, IEnumerable`1 companyTokens)

        at Microsoft.Dynamics.Nav.Runtime.NavSqlDatabaseSync.SynchronizeTenantDatabase(NavDatabase tenantDatabase, Boolean enableLockTimeout)

        at Microsoft.Dynamics.Nav.Runtime.NavDatabase.EnsureDatabaseInSync(Boolean enableLockTimeout)

        at Microsoft.Dynamics.Nav.Runtime.NavUser.GetAllUsers(NavDatabase database)

        at Microsoft.Dynamics.Nav.Runtime.NavUserCache.RefreshList()

        at Microsoft.Dynamics.Nav.Runtime.NavUserCache.GetUser(Func`2 match)

        at Microsoft.Dynamics.Nav.Runtime.NavUserAuthentication.InternalAuthenticate()

        at Microsoft.Dynamics.Nav.Service.NavUserPasswordValidator.Validate(NavTenant tenant, UserNameSecurityToken securityToken)

        at Microsoft.Dynamics.Nav.Service.NavCustomValidator.ValidateCore(String userName, String password)

        at Microsoft.Dynamics.Nav.Service.NavCustomValidator.Validate(String userName, String password)

    Source: System.Data

    HResult: -2146233079

    The customer uses NAVUserPassword authentication.

    We had to roll the customer back. Any ideas what causes this error?

  29. Thomas Hansen (Microsoft) says:

    @Anders C. Lund

    This is indeed a bug; we are in the process of fixing it and hope to have a fix ready very soon. The problem can be that the indexes on SQL Server are out of sync with the keys in NAV. NAV should detect this, but in some cases it does not. One case we have detected is when a table has only the clustered index. A workaround is to manually add an index to the table from SQL Server Management Studio. The trick is to find the table that only has a clustered index.
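Finding "the table that only has a clustered index" can be scripted. In this hedged Python sketch, the T-SQL string queries the standard SQL Server catalog views `sys.tables` and `sys.indexes` (where `type` 0 is a heap), and the function then lists the tables whose only index is the clustered one. Running the query is left to you, and the table names in the example are made up; the result is only a list of candidates for the workaround, not a diagnosis.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Query to run in SQL Server Management Studio against the NAV database;
# it returns one row per index (heaps, type 0, are excluded).
INDEX_QUERY = """
SELECT t.name AS table_name, i.type_desc
FROM sys.tables AS t
JOIN sys.indexes AS i ON i.object_id = t.object_id
WHERE i.type > 0;
"""


def tables_with_only_clustered_index(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Group (table_name, type_desc) rows and keep tables with just a CLUSTERED index."""
    kinds: Dict[str, Set[str]] = defaultdict(set)
    for table_name, type_desc in rows:
        kinds[table_name].add(type_desc)
    return sorted(name for name, k in kinds.items() if k == {"CLUSTERED"})


if __name__ == "__main__":
    rows = [
        ("CRONUS$Customer", "CLUSTERED"),
        ("CRONUS$Customer", "NONCLUSTERED"),
        ("CRONUS$Item Unit of Measure", "CLUSTERED"),
    ]
    print(tables_with_only_clustered_index(rows))  # ['CRONUS$Item Unit of Measure']
```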

  30. acls@live.dk says:

    @Thomas Hansen, thanks, do you know if it is fixed in rollup 5 (KB2937999)?

  31. Marc Breuer says:

    Rollup 5 is the same RTC build as the hotfix. So this was not fixed yet.

    We are having exactly the same problem, so please provide a new hotfix for this SQL index issue.

  32. jtorres says:

    I recommend that you deploy KB2934572 (build 36310), as it addresses more issues than the original fix provided in build 36281. Also, since the snapshot version has changed, I would also recommend that you sync the snapshot (Sync-NAVTenant) before you continue development (but after upgrading the database using the development environment build 36281 or higher).
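Checking the "build 36281 or higher" condition is easy to get wrong with plain string comparison, because "7.1.9999" sorts after "7.1.36281" as text. Here is a small Python sketch that compares the dotted build numbers quoted in this thread numerically; the function names and the default minimum are illustrative.

```python
from typing import Tuple


def parse_build(version: str) -> Tuple[int, ...]:
    """Turn a dotted build number such as "7.1.36281" into (7, 1, 36281)."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(version: str, minimum: str = "7.1.36281") -> bool:
    """True if `version` is the same build as `minimum` or a later one."""
    return parse_build(version) >= parse_build(minimum)


if __name__ == "__main__":
    for build in ("7.1.35822", "7.1.36281", "7.1.36310"):
        print(build, meets_minimum(build))  # False, True, True
```

Tuples compare element by element, so the major and minor parts are checked before the build number itself.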

  33. Marc Breuer says:

    The unsynched SQL index error is not fixed with build 36310 (KB2934572)!

  34. jtorres says:

    @Marc – Can you please contact support to address your specific issue? What is the error reported on the event log?

  35. Marc Breuer says:

    @Jorge: the event log of the server shows the following error:

    Type: System.Data.SqlTypes.SqlNullValueException

    Message: The value of a null SQL record cannot be returned.

    StackTrace:

        at System.Data.SqlClient.SqlBuffer.get_String()

        at Microsoft.Dynamics.Nav.Runtime.ReindexTablesWithIndexMismatch.ReindexSingleTable(NavSqlConnectionScope tenantDatabaseScope, NCLMetaTable table, IEnumerable`1 companyTokens)

        at Microsoft.Dynamics.Nav.Runtime.ReindexTablesWithIndexMismatch.ReindexAllTablesWithIndexMismatch(NavSqlConnectionScope tenantDatabaseScope, IEnumerable`1 companyTokens)

        at Microsoft.Dynamics.Nav.Runtime.NavSqlDatabaseSync.SynchronizeTenantDatabase(NavDatabase tenantDatabase, Boolean enableLockTimeout)

        at Microsoft.Dynamics.Nav.Runtime.NavDatabase.EnsureDatabaseInSync(Boolean enableLockTimeout)

        at Microsoft.Dynamics.Nav.Runtime.NavUser.GetAllUsers(NavDatabase database)

        at Microsoft.Dynamics.Nav.Runtime.NavUserCache.RefreshList()

        at Microsoft.Dynamics.Nav.Runtime.NavUserCache.TryGetNavUser(Func`2 match, NavUser& user)

        at Microsoft.Dynamics.Nav.Runtime.NavUserAuthentication.InternalAuthenticate()

        at Microsoft.Dynamics.Nav.Runtime.NavUserAuthentication..ctor(NavTenant tenant, NavClientCredentialType credentialType, Object token)

        at Microsoft.Dynamics.Nav.Service.NSServiceBase.ValidateAndCreateSession(ConnectionRequest connectionRequest, Boolean requireNavUser)

        at Microsoft.Dynamics.Nav.Service.ServiceOperationInvoker.CreateNewSessionCombinator(ServiceOperation innerOperation, NSServiceBase serviceInstance, Boolean requireNavUser, Object[] inputs, Object[]& outputs)

    Source: System.Data

    HResult: -2146232015

    The client logs this:

    Type: Microsoft.Dynamics.Nav.Client.NavClientClosingException

    Fatal: False

    ShowError: True

    Message: The request could not be processed by the server "NAV2013R2_Henschke". The application will now close.

    StackTrace:

        at Microsoft.Dynamics.Nav.Client.WinClient.ExceptionHandler.DoExecute(Func`1 execute)

        at Microsoft.Dynamics.Nav.Client.WinClient.StartWinFormsClient.RunCore()

    Source: Microsoft.Dynamics.Nav.Client.WinClient

    HResult: -2146233088

  36. acls@live.dk says:

    @Thomas Hansen – any news on this issue?

  37. msjunk9@hotmail.com says:

    @Martin Nielander

    Sorry, all I had was broken databases: development databases have had entire tables of data go missing, usually with a message beforehand that a column has data in it when it obviously doesn't, so the user _has_to_ clear the 'don't corrupt my data' checkbox.

    It sometimes kinda looks like the object designer is looking at the wrong service tier or database but that would be silly.

    We don't do direct changes in live databases and I have never seen a table get deleted in live. But I have repeatedly seen renamed or added columns not correctly updated.

    It appears to be related to having multiple service tiers and users running. Just logging out users is not enough to stop it breaking. Mitigations have now got to the level of having a service tier just for updates (normally not running) and making sure all other service tiers are stopped before the update service tier is started and objects can be loaded.
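The mitigation described above is essentially an ordering constraint. This Python sketch only produces the ordered checklist; the tier names are hypothetical, and actually stopping or starting services is left to whatever tooling you use.

```python
from typing import List


def update_plan(running_tiers: List[str], update_tier: str) -> List[str]:
    """Order of steps for loading objects through a dedicated update tier."""
    others = [t for t in running_tiers if t != update_tier]
    steps = [f"stop {t}" for t in others]            # every other tier goes down first
    steps += [f"start {update_tier}",
              "import and compile objects",
              f"stop {update_tier}"]                 # the update tier is normally not running
    steps += [f"start {t}" for t in others]          # then the regular tiers come back
    return steps


if __name__ == "__main__":
    for step in update_plan(["NAV-Users", "NAV-NAS"], "NAV-Update"):
        print(step)
```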

    Changing the collation of a database from the very unfriendly "_CS_AS" used by the default CRONUS database is a nightmare. The database has to be put into single-user mode. Then, with the service tier stopped, the change to the object table is completed; then the database has to be taken out of single-user mode to actually change the table layouts … and then it doesn't change the collation at the database level, so that has to be done manually … sigh.

    The very fact that this corruption can happen at all means that the database transactions have been seriously misapplied with COMMITs being forced to SQL at times where the state of the database is known to be inconsistent.

    IMO: Several people have made some very serious mistakes with this update and it should never have got past testing.

    Robert de Bath (What's up with "live.com" recently!)

    PS: MS Support: Given up on them, they don't listen, I never want "support", I don't want a dumb workaround (I probably have a better one) and I'm not interested in paying them to say they won't fix a problem. In the event that I can (eventually) convince them of all this it still leaves a bad taste that I have to pay to talk to them even though they (usually) give it back.

  38. Matthias Brahm says:

    After applying build 7.1.36610, we get this error when starting the service tier.

    Build 7.1.36366 has the same problem when starting the service tier.

    Build 7.1.35822 works fine and has no problem starting the service tier. So it does not seem to be a problem with a config file but a problem with the build.

    Has anyone seen this problem?

    Server instance: test2

    User:

    Type: System.Xml.XmlException

    LineNumber: 0

    LinePosition: 0

    Message: Root element is missing.

    StackTrace:

    at System.Xml.XmlTextReaderImpl.ThrowWithoutLineInfo(String res)

    at System.Xml.XmlTextReaderImpl.ParseDocumentContent()

    at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)

    at System.Xml.XmlDocument.Load(XmlReader reader)

    at Microsoft.Dynamics.Nav.Runtime.NavSqlMetadata.GetTableMetadataFromTenant(NavDatabase tenantDatabase, Int32 tableId, NCLMetaTable& table, Boolean databaseConvertedFromNav2013, Boolean applyLock)

    at Microsoft.Dynamics.Nav.Runtime.ReindexTablesWithIndexMismatch.ReindexAllTablesWithIndexMismatch(NavSqlConnectionScope tenantDatabaseScope, IEnumerable`1 companyTokens)

    at Microsoft.Dynamics.Nav.Runtime.NavSqlDatabaseSync.SynchronizeTenantDatabase(NavDatabase tenantDatabase, Boolean enableLockTimeout)

    at Microsoft.Dynamics.Nav.Runtime.NavDatabase.EnsureDatabaseInSync(Boolean enableLockTimeout)

    at Microsoft.Dynamics.Nav.Runtime.NavUser.GetAllUsers(NavDatabase database)

    at Microsoft.Dynamics.Nav.Runtime.NavUserCache.RefreshList()

    at Microsoft.Dynamics.Nav.Runtime.NavUserCache.TryGetNavUser(Func`2 match, NavUser& user)

    at Microsoft.Dynamics.Nav.Runtime.NavUserAuthentication.InternalAuthenticate()

    at Microsoft.Dynamics.Nav.Runtime.NavSession..ctor(NavTenant tenant, TimeZoneInfo clientTimeZone)

    at Microsoft.Dynamics.Nav.Runtime.NavApplicationServer.<RunOnceAsync>d__b.MoveNext()

    --- End of stack trace from previous location where exception was thrown ---

    at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

    at Microsoft.Dynamics.Nav.Runtime.NavApplicationServer.<StartAsync>d__7.MoveNext()

    Source: System.Xml

    HResult: -2146232000