Back in August I worked with a customer to build a workflow solution using Workflow 3.5. We put together a simple workflow that receives a message through the receive activity and persists the call in the database, giving the caller a ticket that he can use later for reference. We could then do some processing in the background, and when we finished the caller could get information based on the ticket. In my opinion this is a very common scenario that you will probably run into with your solutions as well.
To make this scenario more reliable, we used a transaction scope activity to guard the database transaction and make sure the workflow gets persisted in the same transaction.
A simplified version of the workflow we used is shown below.
This appears to work perfectly, but we discovered that if the transaction fails you get the following error:
"Workflow service unexpectedly unloaded from memory while executing a ReceiveActivity. Make sure that the the workflow does not contain any blocking activities within a ReceiveActivity."
After this error the workflow is terminated, since it hit an unexpected exception.
Things get even worse from the messaging perspective: the caller receives a timeout on the call, since no reply is sent back. It appeared that this happened when you set UnloadOnIdle to false.
To make sure I did not do anything wrong I called a friend who is also very knowledgeable about workflow and asked him what his thoughts were. He also wrote a blog post about this issue and spent quite some time figuring out what was going wrong here. (http://msmvps.com/blogs/theproblemsolver/archive/2008/08/06/using-a-transactionscopeactivity-with-a-wcf-receiveactivity.aspx)
Finally I decided that this looked like a bug in the activities, so I opened a PSS call to ask for a hot fix for this problem. After working with the support department and some developers on a simple repro of the problem, we finally got to the point where it was recognized as a bug and they could build a hot fix to solve it.
So back in November I received a private hot fix that solved the problem, but I wanted to make sure everybody else could leverage the fix as well, since this is such a common scenario.
I wrote some tests to show the problem, which I could then easily use to verify whether the hot fix solved it. You can see below that after applying the hot fix all tests showed green again 🙂
Last week I got the official hot fix and the analysis back from the development team that states the following:
Root cause of the problem:
WorkflowRequestContext, a member of ReceiveActivity, is not serialized as part of the checkpoint. Consequently, when we restore the checkpoint upon TSA fault, we are unable to send a reply message to the ReceiveActivity caller.
We produced a fix to address this issue and delivered to our customer. This fix will be part of the next major releases of WF 3.5 (if any). KB Article 959362 will be published soon
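To make that root cause a bit more concrete, here is a toy model in Python. It is not WF code and not how the hot fix is implemented; RequestContext, take_checkpoint and handle_call are hypothetical names I made up for illustration. It only shows the general idea: if the checkpoint taken before the transaction does not include the context that knows how to reply to the caller, then restoring that checkpoint after a fault leaves nobody able to answer the call.

```python
class RequestContext:
    """Hypothetical stand-in for the object that can reply to the caller."""

    def __init__(self):
        self.replies = []

    def send_reply(self, message):
        self.replies.append(message)


def take_checkpoint(state, include_context):
    """Copy the activity state; optionally leave the request context out."""
    snapshot = dict(state)
    if not include_context:
        snapshot.pop("context")
    return snapshot


def handle_call(include_context):
    """Simulate one call whose transaction fails; return the replies sent."""
    caller = RequestContext()
    state = {"ticket": None, "context": caller}
    checkpoint = take_checkpoint(state, include_context)
    try:
        state["ticket"] = "T-1"
        # Simulated transaction failure, e.g. the database was stopped.
        raise RuntimeError("database unavailable")
    except RuntimeError as error:
        # Roll back to the checkpoint taken before the transaction.
        state = checkpoint
        context = state.get("context")
        if context is not None:
            context.send_reply(f"fault: {error}")
    return caller.replies


print(handle_call(include_context=False))  # [] (caller is left waiting)
print(handle_call(include_context=True))   # ['fault: database unavailable']
```

In the first call the caller never gets an answer, which corresponds to the timeout described earlier. In the second call the restored state can still reach the caller, so a fault message goes back instead.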
So if you have this same scenario in your solution, check what happens when your transaction fails, e.g. by stopping the database so the call will fail, and see if you run into the same issue. The trouble with this bug is that it only surfaces when you have good tests in place that also cover failure of the transaction, and that is probably not always the case. You might not even have noticed you have a problem, but at least you know you can get a hot fix now. You can refer to KB 959362 when you contact support, and they can hand you the hot fix for your environment.