1 August 2006
Reporting Services + indexed views = good
Posted by Barnabas under: Programming .
SQL Server Reporting Services is a very convenient way to create and share reports on the company intranet, but what good is a nice web GUI (even one that doesn’t work so well with Firefox, my development browser of choice) if the data that drives it is so slow it’s unusable? This is the problem that I recently faced while reporting on massive amounts of data generated by Advertizer. It’s the classic problem: everyone wants to see trends and totals of huge amounts of live data in real time. What good is old data? What good is a report that takes five minutes to run? For a while, the standard reporting procedure was to send me an e-mail and wait until I could whip up some SQL and fire off a ten-minute query in query analyzer SQL Server Management Studio. I had gone down the OLAP trail once before, and I learned enough to know that it wasn’t going to be a quick fix.
Then I learned about SQL Server 2005 indexed views as a performance solution. Indexed views are magical in the true sense of technology that defies comprehension and inspires awe. Reports that used to take five minutes to run are nearly instantaneous, so now reporting services are a possibility again. In short, indexed views are the technological means to both have and eat your business cake, which is now frosted with easy-to-use browser-driven reporting. The non-IT staff is overjoyed with their new visibility into vital and addicting reports. “Quick, how much money did we make in the last 45 seconds?” Peals of maniacal laughter echo off the office walls. They are drunk with data.
I don’t enjoy reading documentation (lack of plot and character development), but here were some tips that would have saved me the limited reading I did:
- Create the view with the SCHEMABINDING option.
- Define the view to pre-aggregate the data (this is where the performance enhancement comes from). For example, I have a view that splits up the year, month, day, and hour of a click, groups by client, and also returns a total. I can also reuse this view to get totals by hour, day, month, year, or all time.
- Don’t use COUNT, use COUNT_BIG.
- Don’t use AVG; use SUM and COUNT_BIG. Beware of integer overflows in your final query; SUM always wants to make an INT. Why isn’t there a SUM_BIG? You can get around this by using CAST or CONVERT in your final query.
- Make a clustered unique index on the view. This forces the database engine to persist the values to disk rather than calculating them on the fly, which is what leads to five-minute queries. I am surprised to learn that you don’t need to put all the values of the table in the index, just the ones necessary to make a clustered unique index.
- When querying the view, use the NOEXPAND hint.
Perhaps I’ll update this post later with a sample view definition and query, but I’m off to 24 Hour Fitness to hit the treadmill. I better hurry before it closes.
All right, here’s the sample. Samples always work better for me than bullet points. First, I have a table called WebBrowseLog, into which I load web browsing logs, hence the name. For the purposes of this example, I want to make an indexed view that lets me aggregate number of hits per hour by site. I have three fields in the table which I care about; DateCreated: a datetime field that is the date the page was viewed, SiteID: a varchar field that identifies the website, and TotalTime: an int field that contains the number of milliseconds it took to render the page. Here’s the view definition:
CREATE VIEW [dbo].[BrowseLog_BySite] WITH SCHEMABINDING AS SELECT DATEPART(year, DateCreated) AS BrowseDate_Year , DATEPART(month, DateCreated) AS BrowseDate_Month , DATEPART(day, DateCreated) AS BrowseDate_Day , DATEPART(hour, DateCreated) AS BrowseDate_Hour , SiteID , COUNT_BIG(*) AS BrowseRequests , SUM(ISNULL(TotalTime, 0)) AS SUM_TotalTime FROM dbo.WebBrowseLog GROUP BY DATEPART(year, DateCreated) , DATEPART(month, DateCreated) , DATEPART(day, DateCreated) , DATEPART(hour, DateCreated) , SiteID
After the view is created, then we need to create the unique clustered index on it (notice that we don’t need the last two fields in the index):
CREATE UNIQUE CLUSTERED INDEX [IX_BrowseLog_BySite] ON [dbo].[BrowseLog_BySite] ( [BrowseDate_Year] ASC , [BrowseDate_Month] ASC , [BrowseDate_Day] ASC , [BrowseDate_Hour] ASC , [SiteID] ASC )WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]
Finally, here’s a sample query to get all data for the month of July, 2006:
SELECT SiteID , SUM(BrowseRequests) AS BrowseRequests , SUM(CAST(SUM_TotalTime AS float)) / SUM(BrowseRequests) AS AverageTime FROM BrowseLog_BySite (NOEXPAND) WHERE BrowseDate_Year = 2006 AND BrowseDate_Month = 6 GROUP BY SiteID