Today I needed to do some performance tuning on a code base we have been working on for the past few weeks: a set of extensions to SharePoint 2010. I wanted to see what the performance impact of our feature is on a default SharePoint publishing site.
For this I used the Visual Studio Web Performance test and Load Test tools.
First of all, I started with a clean site collection with one top-level site that is a publishing site. Next I created a new Visual Studio 2010 Test project and, to start with, added a Web Performance test.
Now your browser will start and you can enter the URL you want to test. For me that was http://localhost/pages/default.aspx.
After browsing to this page I just hit the stop recording button, and this gave me a starting point for my test scenario. I must note that the feature we built is activated on each page request, and I wanted to know what the impact is of a single page request. So I created a scenario that includes one page in the site or one of the sub sites I created.
Now that I have a web test, I can also add a Load Test. This test takes other test types in your project as the driver of the load, so for this scenario I only added the single web test I just recorded. Now I could create a baseline load test to see how everything performs on my simple laptop machine. I set the load to be just one user, ran it for a period of 5 minutes, and since the feature we built is not influenced by the browser type that comes along, I just kept the browser mix to one browser.
Once you have run this test, you can see the average request/response times and get some information on the load this generates on your machine. You must be careful when interpreting these numbers when you run everything on one box, as I did in this particular case, but it gave me enough information. I validated these results to make sure they did not contain any test errors, and I did a second run to verify that the results were consistent.
That being the case, I deployed the SharePoint solution to the farm and verified by hand that the feature did what it was supposed to do. Next I ran the performance test again, and this gave me the results I was looking for. Our feature had a performance impact of an order of magnitude: without actually using the feature, just by activating it, average response times went from 0.3 seconds to 1.3 seconds per request. That was clearly not our intention; I would have settled for a 10% performance degradation, but this was unacceptable.
So what next?
The great thing about the Visual Studio toolset is that you can now just use the performance profiling tools with the already created load test in place. So I fired up a second instance of Visual Studio, picked Profiler from the Analyze menu, and then New Performance Session…
To get some first results I just used the sampling method, which can attach to a running instance of the website. So I started the load test again for a 5-minute run and used Task Manager to see which w3wp.exe process was eating up all the CPU cycles. After finding that process ID I attached the profiler to that w3wp process and waited for about one minute. Then you stop the profiler and it will start generating a report for you. The first report I got gave the following results:
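As an aside, instead of eyeballing Task Manager you can list the worker processes and their application pools from an elevated command prompt. Both commands below are standard Windows/IIS 7 tooling, not part of my setup:

```shell
REM List all running IIS worker processes with their process IDs
tasklist /fi "imagename eq w3wp.exe"

REM Show which application pool each worker process (PID) belongs to
%windir%\system32\inetsrv\appcmd list wp
```

The appcmd output maps each PID to an application pool name, which makes it easy to pick the right w3wp process to attach the profiler to.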
Unfortunately the profiler did not find any code written by me that could have caused an issue, but I was pretty certain our code was the cause. So I clicked Show All Code, and then you get a nice summary result and a hot path for the code that might cause the problem. This looked as follows:
So interestingly enough, the issues seemed to be caused by some CLRStubUnknownAddress. Because I was certain this was my code, it seemed like the profiler just could not resolve the entry points to my codebase. During analysis of the report you also get warnings and errors in the task list of Visual Studio. One of the errors in that list was the following:
DA0002: It appears that the file was collected without properly setting the environment variables with VSPerfCLREnv.cmd. Symbols for managed binaries may not resolve.
So what I did to resolve this was start a Visual Studio Command Prompt in elevated mode (it needs to modify the registry, so it needs admin privileges). Next I ran the following command: vsperfclrenv /globalsampleon
Next I needed to reset the IIS server, so each new worker process we attach to has these environment settings enabled.
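Put together, the two steps above look like this in the elevated Visual Studio Command Prompt:

```shell
REM Enable the CLR profiling environment variables machine-wide
REM (requires admin privileges; this is what fixes the DA0002 warning)
vsperfclrenv /globalsampleon

REM Restart IIS so every new worker process picks up the new environment
iisreset
```

When you are done profiling, vsperfclrenv /globaloff (followed by another iisreset) turns the settings back off.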
I ran the load test again, attached the profiler again, waited for about one minute and voilà, there I had my report showing me where our code misbehaves!
Next you can dive into the functions that are marked with a flame symbol and see what is going on in that part of the code. I added two arrows to show that you can drill down into the code that is causing the problems; you can see that accessing the property PermanentRedirects is causing the problem here, not the TryGetValue().
Digging somewhat deeper, I found that we constantly hit a code block that calls the SharePoint function SPSecurity.RunWithElevatedPrivileges()
The delegate we pass there is executed on every request, while its result should have been cached; we only need to get that data once. All in all something we can fix easily, but with these tools I was able to find it in a relatively short amount of time. I spent most of the time figuring out why I did not get the detailed results and finding the command line to make it work. All in all I spent about 2 hours investigating.
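The shape of the fix is straightforward: lazily cache the result of the elevated call. Here is a minimal sketch; the class, property, and loader names are hypothetical stand-ins for our actual code, and only SPSecurity.RunWithElevatedPrivileges is the real SharePoint API:

```csharp
using System.Collections.Generic;
using Microsoft.SharePoint; // SharePoint 2010 server object model

public static class RedirectSettings
{
    private static volatile IDictionary<string, string> _permanentRedirects;
    private static readonly object _syncRoot = new object();

    public static IDictionary<string, string> PermanentRedirects
    {
        get
        {
            // Double-checked locking: the expensive elevated call now runs
            // only once per worker process instead of on every page request.
            if (_permanentRedirects == null)
            {
                lock (_syncRoot)
                {
                    if (_permanentRedirects == null)
                    {
                        SPSecurity.RunWithElevatedPrivileges(delegate()
                        {
                            _permanentRedirects = LoadRedirectsElevated();
                        });
                    }
                }
            }
            return _permanentRedirects;
        }
    }

    private static IDictionary<string, string> LoadRedirectsElevated()
    {
        // Placeholder for the actual (expensive) lookup against SharePoint.
        return new Dictionary<string, string>();
    }
}
```

Depending on how often the underlying data changes, you may want a cache expiry instead of caching for the lifetime of the worker process, but the principle stays the same.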
So I hope that with this post you can see how easy it can be to do some basic performance analysis on a real-world environment like SharePoint, instead of the demos you get at conferences showing you how to profile a Mandelbrot application.