If I had to pick my favorite feature in Katmai, it would have to be the Resource Governor. This is something I had wanted for a long time for various reasons and was sorely disappointed when it was cut from Yukon. I’m sure Richard & Euan remember the amount of grief I gave them though it wasn’t really them who made the final cut decision (but they did own the feature back then).

Prior to SQL Server 2008 Resource Governor, there really was very little you could do to manage resources within an instance. The Windows Server Resource Manager controls Windows processes so all you’re doing is restricting what the sqlservr.exe process could consume. Max/min memory settings locked things down at the instance level and so did CPU and I/O affinity. There really wasn’t much that you could do to specific users or groups. Sure, there was the query governor but all you can really do is set an “execution time threshold” for queries. I.E. If the estimated cost of a query exceeds the threshold, it won’t be allowed to run. Not bad for preventing very heavy queries from running but not really controlling resources either and the side effect is users might start opening support tickets thinking something’s wrong with the system because they can’t run their reports. Also, if your statistics are out of date and not automatically updated, queries may be incorrectly allowed or disallowed to run.

With Resource Governor, Admins can now set controls on how resources are consumed in SQL Server. At this time, the RG can control CPU and memory. Disk I/O and network I/O are not covered in this release. Still, this is already a giant step forward.

Let’s try a few things. First, the old way using the Query Governor cost limit. You can set this at the server instance (sp_configure) or on a per connection basis (set query_governor_cost_limit). If your goal is really to prevent long running queries during peak hours, you probably set this at the instance level. After all, how many users do you know will voluntarily not run their heavy queries? ;-) If you’re wondering what self-respecting DBA would allow reporting, DSS and other BI type queries run against a production OLTP system during business hours, well, DBAs don’t write the checks or run the business. It is reality today that the business users want/need to analyze data rapidly, not after a day, week or month. While you can create separate reporting servers, there are some genuine needs where a reporting server’s latency is beyond what the business users require. It’s the new world my fellow traditional DBAs - embrace and control it!

Now, let’s try this example. The following script (written by my colleague Suresh) scans the order tables in the Sales schema in the Adventureworks database and offers a discount for purchase quantities greater than 3. A cursor is used to scroll through and evaluate each row, taking action where appropriate. This query takes anywhere from 20-25 minutes to run on a workstation consuming a notable amount of CPU and memory.

USE AdventureWorks;
GO

DECLARE @l_SalesOrderID INT
SET NOCOUNT ON
DECLARE update_cursor SCROLL CURSOR
FOR
 SELECT SalesOrderID FROM Sales.SalesOrderHeader

OPEN update_cursor

FETCH NEXT FROM update_cursor INTO @l_SalesOrderID

WHILE (@@fetch_status = 0)
 BEGIN
  UPDATE Sales.SalesOrderDetail
  SET UnitPriceDiscount = 0.01
  WHERE SalesOrderID = @l_SalesOrderID
  AND OrderQty >= 3
  FETCH NEXT FROM update_cursor INTO @l_SalesOrderID
 END

CLOSE update_cursor
DEALLOCATE update_cursor
SET NOCOUNT OFF
GO

SQL Server 2005 (and earlier)
With SQL Server 2005 and earlier, you would do one of the following:

Instance wide: sp_configure ‘query governor cost limit’, 300

Per connection: SET QUERY_GOVERNOR_COST_LIMIT 300

Either of these will evaluate queries and disallow execution if they exceed 300 seconds. If you tried to execute the sample query from above, you will get a message that looks like:

Error 8649, Severity 17, State 1, Procedure iduSalesOrderDetail, Line 33
The query has been canceled because the estimated cost of this query (1157) exceeds the configured threshold of 300. Contact the system administrator.
Msg 3609, Level 16, State 1, Line 14
The transaction ended in the trigger. The batch has been aborted.

So that accomplishes your goal of not letting heavy queries execute. These settings can be changed at any time so you can setup SQL Agent to set the appropriate value at different times of the day. However, the big downside here is that your “heavy” queries do not get executed at all. This probably won’t go down very well with your end users. How about letting the query run but not allowing it to dominate server resources?

With SQL Server 2008, you have MUCH better control and can keep both regular OLTP and batch/reporting users happy.

There are a few terms we need to be familiar with before diving into Resource Governor. The following are brief descriptions of key terms. BOL has the gory details which I encourage you to check out.

  • Workload group: a logical way to classify different types of workloads you want to govern
  • Internal group: a “special” group created by and for SQL Server for system processes. This is not user configurable.
  • Default group: another special group created by SQL Server to be the catch-all bucket for any workload that doesn’t get allocated into a user defined group for whatever reason.
  • Classification function: a UDF that determines with group a submitted workload belongs to
  • Resource pools: a logical container for the amount of CPU and memory that members of the pool can consume
  • Internal pool: a “special” resource pool created by and for SQL Server to be consumed by system processes (in the internal group). This is not user configurable and will always have higher priority than all other pools.
  • Default pool: another special resource pool created by SQL Server to be the catch-all bucket for any group (stars with the default group) that doesn’t get allocated into other user defined pools. While it cannot be dropped, its configuration can be altered.

Getting started with Resource Governor.
Before we can start bossing workloads around with RG, there are a few things we need to consider.

Rules and environment prep:

  1. How many logical workgroups do you have in your database? E.g. Regular OLTP users, reports users, batch processors, administrators, security team, etc…. Can you standardize across your organization?
  2. What is the typical CPU and RAM utilization of your server? While limits can be adjusted easily, knowing the typical utilization will help avoid setting initial limits that are too low and causing performance problems.
  3. Is there a order of priority among your users?
  4. Is there an SLA (implicit or written) on query response time?

Once you figure out what you need from a user and administration perspective, you can then start implementing the controls.

Basic setup:

  1. Create workload groups in your environment that you want to have specific resource allocations for (others will fall into the default group)
  2. Create a classification function to place workloads in appropriate groups
  3. Register the classification function with Resource Governor
  4. Enable Resource Governor

The syntax itself is easy enough so I won’t elaborate here. The example below uses explicit transactions but it does not have to be that way. However, this is a recommended practice;  in case you make a mistake, it is easily undone (rolled back).

First, let’s create the buckets (known as groups) that we want to place our users into:

BEGIN TRAN
CREATE WORKLOAD GROUP ResGroupAdhoc
CREATE WORKLOAD GROUP ResGroupReport
CREATE WORKLOAD GROUP ResGroupAdmin
CREATE WORKLOAD GROUP ResGroupExecs

Next we need to create a classification function. This is the “filter” that decides which group each query submitted to SQL Server belongs to. Note that any workload that does not get classified by your function goes into the ‘DEFAULT’ group

CREATE FUNCTION ClassifierFn_Basic() RETURNS SYSNAME WITH SCHEMABINDING AS
 BEGIN
   DECLARE @ResGroup AS SYSNAME
   IF (SUSER_NAME() = ’sa’) OR (SUSER_NAME() = ‘REDMOND\v-jyong’)
      SET @ResGroup = ‘ResGroupAdmin’
   IF (SUSER_NAME() = ‘BillG’) OR (SUSER_NAME() = ‘CORP\DaBoss’)
      SET @ResGroup =  ‘ResGroupExecs’
   IF (APP_NAME() LIKE ‘%Management Studio%’) OR (APP_NAME() LIKE ‘%SQL Query Analyzer%’) OR
   (APP_NAME() LIKE ‘SQLCMD’)
      SET @ResGroup = ‘ResGroupAdhoc’
   IF (APP_NAME() LIKE ‘%REPORT SERVER%’) OR (APP_NAME() LIKE ‘%Microsoft Office%’)
      SET @ResGroup = ‘ResGroupReport’
   RETURN @ResGroup
END

COMMIT TRAN
GO

If you aren’t sure what App_Name to filter with, run Profiler for a while just to capture the ApplicationName column.

You can set classification conditions by a number of different connection properties including user name (both SQL Server and domain users), application name, hostname (source machine name), server role, etc… Check out BOL for details.

This function is just a sample of what you can do to place each user connection or application into specific groups. This is how you classify/categorize your users and it is with these groups that you will set resource utilization controls on later.

Next, we need to register the function with Resource Governor and enable RG.

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION=dbo.ClassifierFn_Basic)
GO

Now that we have the function registered and RG enabled, we can set resource usage allocations for each group that we created.

First thing we want to do is to limit the amount of CPU resources ad hoc queries can consume. Let’s say we do not want ad hoc queries to consume more than 20% of CPU cycles  and no more than 15% of available memory if there are other workloads running. To do this, we need to create a resource pool, place the ad hoc group in that resource pool and set limits on that pool.

CREATE RESOURCE POOL ResPoolAdhoc WITH (MAX_CPU_PERCENT = 20,
     MAX_MEMORY_PERCENT = 15)

Next, we will assign the Adhoc group to the resource pool we just created so it will be governed by the policies we set for that pool. We then need to notify Resource Governor (kinda like what we already do when making a change with sp_configure).

ALTER WORKLOAD GROUP ResGroupAdhoc USING ResPoolAdhoc
GO
ALTER RESOURCE GOVERNOR RECONFIGURE
GO

Now all queries that are submitted via SQL Server Management Studio, Query Analyzer or SQLCMD (from a command prompt) will not be allowed to consume more than 20% CPU time or 15% of available memory if there are other workloads running. If the server isn’t very busy, RG is smart enough to let the queries use free cycles. For more details on how it works, check out BOL.

You can now repeat this process for other groups to set the max and/or min values for each group by creating and assigning resource pools for the groups. Try it out by submitting the long running query at the start of this article but use different UserIDs that belong to different resource groups. Run several instances of them and you’ll see the effect of the Resource Governor. It becomes very obvious when you run a lot of instances of that query.

Note that if you forget to assign a particular group to a resource pool, it will remain in the default pool. New applications or applications that are not defined in your function will fall in the default group which also falls into the default pool. (For example: another team may decide to use Toad as their query tool instead of Query Analyzer). That’s why it is always a good idea to set the default pool limits to values that you are comfortable with so that queries that are not caught by your classifier function will not hog server resources.

A few other things you should probably be aware of:

  • Dedicated Admin Connection runs in the “internal group”. That means you do not need to place it in a group to ensure it will always be able to run. SQL Server already takes care of that for you.
  • Don’t create vastly complicated classifier functions. All connections will be filtered through the function so if you have some convoluted code that performs 27000 checks, the connection may timeout before getting clasffied.
  • Multiple groups can belong to the same pool (e.g. default pool) but you can still impose some priority for allocation within that pool using the “IMPORTANCE” setting (LOW, MED, HIGH).
  • You can do all this in Management Studio, it doesn’t have to be via TSQL

Ok, so this is quite a bit more involved than the Query Governor Cost Limit in SQL Server 2005 and earlier but it also provides a lot more control and flexibility. This is one of those features that requires some work to take advantage of but it really isn’t much work plus the returns are huge. We covered the basics here which should enough to get you started. There are quite a few more things you can do with RG including setting up filters that just log events when queries run for a specific period (want an easy way to track offenders with long running queries?). BOL documents this pretty well and look for whitepapers (to come) on www.microsoft.com/sql/

So what should you do after upgrading to 2008 if you currently use Query Governor Cost Limit? I suggest creating the appropriate rules in Resource Governor (sorry, no such thing as an upgrade/migrate for this) starting with a close to 1-1 match for what you had before. Get used to its behavior then expand with more rules. This will work for probably >95% of the users of the Query Governor Cost Limit. There are some scenarios where you truly do not want a query to run if it is expected to last longer than x minutes. In which case, you can keep both; they are compatible.

Btw, if you’re wondering about how to monitor activity and how the allocations are actually being used, monitoring data is exposed via System Monitor (aka perfmon) and DMVs. BOL documents these well or you could just do what most DBAs do; fire it up and look around.

Beyond workload governance
Performance management and runaway queries aren’t the only things the Resource Governor is good for. With the ability to set resource utilization limits within a database, it becomes harder for malicious acts against your database such as a DDOS attack to cause widespread problems. It also provides greater flexibility and resource usage efficiency in database consolidation. There’s a good reason why this one feature is cited across multiple scenarios in the 2008 release. It really is super handy. Now as soon as we get network and disk IO added to the resources we can govern, DBAs will rule the world (more than before). :-)

joe yong

2 Responses to “It’s all about control (ok, governance)”

  1. Great article. I have a couple of observations and a question:
    1. The code would not work without some GO commands.
    2. What would happen if I connect using the ’sa’ account from SSMS? Or the man BillG connects to the server using SSMS? What would be returned by the classifier function?
    A. ResGroupAdmin and ResGroupExecs
    B. ResGroupAdhoc and ResGroupAdhoc
    (Books Online has a similar example :) )
    3. Euan wasn’t in the tools group back then?
    Regards,
    Cristian

    December 21st, 2007 | 10:06 am
  2. Depends on how you write the function. In this example, I was pretty lax about how the IF conditions are processed and how the loop exits. If you run as is, the last rule processed that captures a positive condition will determine which group your login will be placed.

    So if you use this example and login with SA from Management Studio, even though SA is initially assigned to ResGroupAdmin, the condition that traps “Management Studio” will also resolve to true so ResGroupAdhoc is the last value set. That’s what your user will fall into.

    If you tweak the function a little:

    CREATE FUNCTION [dbo].[ClassifierFn_BasicPart2]() RETURNS SYSNAME WITH SCHEMABINDING AS
    BEGIN
    DECLARE @ResGroup AS SYSNAME
    IF (SUSER_NAME() = ’sa’) OR (SUSER_NAME() = ‘REDMOND\v-jyong’)
    BEGIN
    SET @ResGroup = ‘ResGroupAdmin’
    RETURN @ResGroup
    END
    IF ……………. (rest of the function…..)

    The function will exit and return the group ResGroupAdmin. This requires a few extra keystrokes but is more efficient, especially if you have a lot of conditions/groups (generally not a good idea anyway but I’m sure some users will do it). Of course, the lazy way would be to arrange the conditions in the order of importance. I.E. the most important last.

    Either way, it does bring up an important point: users need to think hard about the priority of the conditions. Most of the time, the login or role will be the primary condition for determining which group to assign. However, there are exceptions.

    For example: User Alice leads a sales team so she has managerial responsibilities but she is also a sales person. That means she’ll be running regular sales applications to log orders, review customer data, etc… As a team lead, she also runs reports and performance analysis for her team so she can present to senior management. This is not an uncommon scenario and you can probably imagine by now the possible challenges this presents.

    I recommend starting with as few and and simple conditions you can. Logins/roles and application names are good places to start. Once you have it going, monitor to ensure users are getting the performance they should. Starting with too many rules/condition to cater for every imaginable situation might be a fun challenge but is likely to bring more pain than gain.

    You can easily figure out who’s where by using the RG DMVs. Also, existing DMVs have been modded to provide RG relevant info. For example,

    SELECT session_id, program_name, login_name, group_id
    FROM sys.dm_exec_sessions
    WHERE login_name=’sa’
    AND group_id 1
    AND group_id 2

    Will list any sessions with login SA but is not a internal process. 1 and 2 are internal and default groups respectively. I just picked those for convenience. There are better ways to filter out system processes but you get the picture.

    Yeah, Euan used to run the tools group. He’s with the VS team right now but we all know deep down, he’s still a SQL guy. Check out his blog http://blogs.msdn.com/euanga/

    January 10th, 2008 | 4:23 pm

Leave a Reply

You must be logged in to post a comment.