Automated Speed Monitoring with MSSQL

What? Why?


I'm in the web data extraction business. That means I'm often writing hundreds of thousands of rows to a table over the course of days. Well when you're dependent on a third party website for your data, a lot can go wrong over the course of days. So I started formulating my own table monitoring solutions.

At first I would just periodically do a SELECT * here and a SELECT count(*) there to get by. But when a weekend goes by and on Monday morning you're 10K rows short, you want to know if things just slowed down a little or if everything broke sometime Sunday night. My solution is to automate my little SELECTs to happen on a regular schedule and then store themselves in a log table. By comparing the latest row as I insert I can do an on the fly rate calculation as well. Sound useful? The details follow ...

How.


First I needed a log table. I made something like:

CREATE TABLE [dbo].[SpeedLog](
[Count] [int] NULL,
[Timestamp] [datetime] NULL,
[PerHour] [int] NULL,
[PerDay] [int] NULL
) ON [PRIMARY]

Next I made a view that produces a new row to insert into this table. I find it handy to have this logic isolated in its own view rather than having an enormous insert statement,

CREATE VIEW [dbo].[vw_Status]
as
SELECT
count(*) as [Count]
, CURRENT_TIMESTAMP as [Timestamp]
, (count(*)-MAX(lRow.UrlSnapShotsCount))/DATEDIFF(hour, MAX(lRow.[Timestamp]), CURRENT_TIMESTAMP) [PerHour]
, (count(*)-MAX(lRow.UrlSnapShotsCount))/DATEDIFF(hour, MAX(lRow.[Timestamp]), CURRENT_TIMESTAMP)*24 [PerDay]
FROM MyGrowingTable u
LEFT JOIN
(SELECT a.[Count], b.[Timestamp], a.PerHour, a.PerDay FROM SpeedLog a
INNER JOIN (SELECT MAX([Timestamp]) [Timestamp] FROM SpeedLog) b ON a.[Timestamp] = b.[Timestamp]) lRow
ON 1=1

That is a bit much to throw at anyone all at once. To abstract it a little in pseudo code I'm saying:

SELECT CurrentCount, (y2-y1)/(x2-x1) as PerHour, PerHour*24 as PerDay
FROM MyGrowingTable
LEFT JOIN (the most recently inserted row of my log table) lRow
ON 1=1

Since my lRow sub query only returns one row ever I can get away with a 1=1 ON clause and still have one row as a result. It is important to do a LEFT join to lRow so that a first row with no rate info can be inserted.

Now I just need to insert on some regular schedule. I do this via a SQL Server Agent Job. If you've never used one of those before don't worry. It is a point and click affair within Management Studio. You just have to make sure you have SQL Server Agent running as its own service. Then you can add a new job with one task within it that runs:

INSERT INTO SpeedLog
SELECT * FROM vw_Status

Since the logic is in the view, you can tweak things without having to dive into the Agent Job every time.

Where next?

count(*) is a pretty simple measure of a table. For your own purposes maybe you want to track the average number of rows where some column is null. That would be no problem. You could track as many metrics as you wanted by just adding more and more columns to the log table and your view. More complex rate analysis could be done on the entire log table as well.

As with every solution I create. It seems like a problem other people must have run into and solved already. I think I'm just bad at finding solutions. If you have something else to tackle this problem please share!

0 comments: (+add yours?)

Post a Comment