Simply checking my web

I’ve been doing some web development lately, and the old question of testing has of course come up (and some might say, of course I had to roll my own solution).

I wanted a simple way to sanity-check a site, to ensure that my article changes didn’t suddenly break comments on images (legacy apps are strange beasts). So I ended up writing SWEC, the Simple Web Error Checker. It’s a basic app that follows all links in a site (or “webapp”), as long as those links are present in the HTML (i.e. it doesn’t run any JS, so its usefulness for JS/AJAX/AJAJ-heavy webapps can be somewhat limited). It parses every page it downloads, looks for known errors and then reports them. For instance, if you run it on a site based on Catalyst (Perl) and Catalyst crashes with its standard backtrace, SWEC will report which page it happened on, which page linked to it, and a short line about what happened. For example, if it’s an exception it’ll say “Exception in Catalyst controller”.
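The crawling side can be pictured as a breadth-first walk over `<a href>` links. Below is a minimal Python sketch, not SWEC’s own code: `crawl`, `fetch` and the in-memory `site` dictionary are hypothetical stand-ins for real HTTP requests, and the check is a plain regex. Like SWEC, it only sees links present in the HTML and runs no JavaScript, and it records the referring page alongside each error.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags; no JavaScript is executed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, checks):
    """Breadth-first crawl of every page reachable from start_url.

    fetch(url) returns the page's HTML, or None for pages outside the site.
    checks is a list of (compiled_regex, error_message) pairs.
    Returns (page, referrer, error_message) tuples.
    """
    seen = {start_url}
    queue = deque([(start_url, None)])
    errors = []
    while queue:
        url, referrer = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        for regex, message in checks:
            if regex.search(html):
                errors.append((url, referrer, message))
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is visited once.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append((link, url))
    return errors


# Hypothetical three-page site served from memory instead of over HTTP.
site = {
    "http://example.test/": '<a href="/article">Article</a>',
    "http://example.test/article": '<a href="/image?id=1#comments">Comments</a>',
    "http://example.test/image?id=1": "Caught exception in MyApp::Controller::Image",
}
checks = [(re.compile(r"Caught exception in.*Controller"), "Exception in Catalyst controller")]
errors = crawl("http://example.test/", site.get, checks)
print(errors)
```

Each reported entry pairs the broken page with the page that linked to it, which is what makes a failure deep inside a site easy to trace back.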

It uses a very simple file format for writing tests (which is well documented in SWEC’s manpage). There are several different types of tests, but the most common one looks something like this:

    [SWEC_CATALYST_CONTROLLER_EXCEPTION]
    type = regexs
    check = Caught exception in.*Controller.*Request.*Catalyst
    error = Exception in Catalyst controller
    sortindex = 11

What’s between the brackets [ ] is the name of the test. All tests shipped with SWEC are prefixed with SWEC_. The type defines which “type” of test it is. This one is “regexs”, a ‘smart’ regex: a standard Perl regex that SWEC modifies at runtime so that it matches HTML more easily. The check is, in this case, a normal Perl regex that is applied to the entire HTML document. As the type is regexs, SWEC will modify the regex to this at runtime, so that each space in the check can match any mix of whitespace, &nbsp; entities and HTML tags:

    Caught(\s+|&nbsp;|<[^>]+>)+exception(\s+|&nbsp;|<[^>]+>)+in.*Controller.*Request.*Catalyst
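To make the `regexs` transformation concrete, here is a Python sketch (SWEC itself works with Perl regexes, and this is not its code). It assumes the test file is INI-like enough for Python’s `configparser`, assumes the check reads `Caught exception in.*Controller.*Request.*Catalyst`, and turns every run of spaces in the check into a gap that may be any mix of whitespace, `&nbsp;` entities and HTML tags. `load_tests` and `smart_regex` are hypothetical names, and treating non-`regexs` types as plain regexes is a simplification.

```python
import configparser
import re

# The example test, assuming .* between the keywords of the check.
TEST_FILE = """
[SWEC_CATALYST_CONTROLLER_EXCEPTION]
type = regexs
check = Caught exception in.*Controller.*Request.*Catalyst
error = Exception in Catalyst controller
sortindex = 11
"""

# One "gap" in HTML: any mix of whitespace, &nbsp; entities and tags.
HTML_GAP = r"(\s+|&nbsp;|<[^>]+>)+"


def smart_regex(pattern):
    """Replace each run of spaces in a check with HTML_GAP."""
    # A function replacement stops re.sub from treating "\s" as an escape.
    return re.sub(r" +", lambda _match: HTML_GAP, pattern)


def load_tests(text):
    """Parse [NAME] sections into compiled regexes, error strings and sortindexes."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    tests = {}
    for name in parser.sections():
        section = parser[name]
        check = section["check"]
        if section.get("type") == "regexs":
            check = smart_regex(check)  # other types: used as a plain regex here
        tests[name] = {
            "regex": re.compile(check),
            "error": section["error"],
            "sortindex": section.getint("sortindex", fallback=0),
        }
    return tests


tests = load_tests(TEST_FILE)
catalyst_check = tests["SWEC_CATALYST_CONTROLLER_EXCEPTION"]
page = "Caught <b>exception</b> in MyApp::Controller::Root->index<br>Request: /foo<br>Catalyst"
print(catalyst_check["regex"].pattern)
print(bool(catalyst_check["regex"].search(page)))
```

The sample page wraps “exception” in `<b>` tags, so the literal text `Caught exception` never appears in it; only the transformed regex matches.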

The error is the string that will be reported, and the sortindex is used to prioritize tests: the lower the index, the earlier the test runs. Bundled tests always have a positive index, so you only need to give your own tests a negative index to ensure they run before the bundled ones.
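The ordering rule amounts to a sort on sortindex. A small Python sketch, assuming ties are broken by test name (the text does not say how ties are handled); `CheckDef`, `run_order`, `SWEC_HYPOTHETICAL_OTHER_TEST` and `MY_APP_IMAGE_COMMENTS` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class CheckDef:
    name: str
    sortindex: int


def run_order(checks):
    """Lower sortindex runs first; ties are broken by name (an assumption)."""
    return sorted(checks, key=lambda c: (c.sortindex, c.name))


bundled = [
    CheckDef("SWEC_CATALYST_CONTROLLER_EXCEPTION", 11),
    CheckDef("SWEC_HYPOTHETICAL_OTHER_TEST", 5),
]
mine = [CheckDef("MY_APP_IMAGE_COMMENTS", -1)]  # negative: runs before every bundled test
order = [c.name for c in run_order(bundled + mine)]
print(order)
```

Because every bundled index is positive, any negative index is enough to jump the queue, regardless of what the bundled values are.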

By default, the bundled tests (default.sdf) and the user-specific rc file ~/.swecrc are loaded. The user-specific file can easily disable bundled tests, and you can also disable them individually on the command line.
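The layering can be sketched in Python as “load bundled tests, overlay the rc file, then apply command-line disables”. This is an illustration only: the `disabled = true` key is a hypothetical stand-in for whatever syntax ~/.swecrc really uses (the manpage is the authority), `configparser` is assumed to be a fair approximation of the file format, and `active_tests`, `cli_disabled` and the second bundled test are made-up names.

```python
import configparser

# Stand-in for the bundled default.sdf (the second test is hypothetical).
BUNDLED = """
[SWEC_CATALYST_CONTROLLER_EXCEPTION]
error = Exception in Catalyst controller

[SWEC_HYPOTHETICAL_OTHER_TEST]
error = Some other bundled error
"""

# Stand-in for ~/.swecrc; the "disabled" key is a hypothetical syntax.
USER_RC = """
[SWEC_HYPOTHETICAL_OTHER_TEST]
disabled = true

[MY_APP_IMAGE_COMMENTS]
error = Image comments broken
"""


def active_tests(bundled_text, user_text, cli_disabled=()):
    """Load bundled tests, overlay the user's rc file, then apply CLI disables."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(bundled_text)
    # A later read merges into same-named sections, so the rc file can
    # override or extend a bundled test.
    parser.read_string(user_text)
    return [
        name
        for name in parser.sections()
        if not parser[name].getboolean("disabled", fallback=False)
        and name not in cli_disabled
    ]


print(active_tests(BUNDLED, USER_RC))
print(active_tests(BUNDLED, USER_RC, cli_disabled={"SWEC_CATALYST_CONTROLLER_EXCEPTION"}))
```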

SWEC supports sessions: it remembers previously checked URLs and previous errors, and can then either check the pages that had errors before the others, or report only ‘new’ errors that did not exist before. A session also remembers all the settings you set, so you don’t have to type them every time (although you can still do so). It has cookie support, so it runs just fine as a logged-in user. You probably don’t want to run it against a live database, though, but rather a test one, as it will follow any link it sees. There are a few exceptions: it tries to avoid ‘logout’ and ‘delete’ links (additions to the exceptions list are welcome).
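A Python sketch of the session bookkeeping, under stated assumptions: SWEC’s real session storage format isn’t described here, so JSON is used as a stand-in, and `Session`, `prioritise`, `new_errors` and `should_follow` are hypothetical names. The link filter is a plain substring match on the URL, which is one simple way to avoid ‘logout’ and ‘delete’ links, not necessarily how SWEC does it.

```python
import json

# Keywords for links to skip; a simple stand-in for SWEC's exceptions list.
SKIP_WORDS = ("logout", "delete")


def should_follow(url):
    """Skip links whose URL looks destructive (substring heuristic)."""
    lowered = url.lower()
    return not any(word in lowered for word in SKIP_WORDS)


class Session:
    """Remembers the previous run's errors and settings between runs."""

    def __init__(self, previous_errors=(), settings=None):
        # previous_errors: (url, error_message) pairs from the last run
        self.previous_errors = set(previous_errors)
        self.settings = dict(settings or {})

    def prioritise(self, urls):
        """Put URLs that failed last time first; sorted() keeps the rest in order."""
        failed_before = {url for url, _message in self.previous_errors}
        return sorted(urls, key=lambda url: url not in failed_before)

    def new_errors(self, errors):
        """Only the errors that were not seen in the previous run."""
        return [error for error in errors if error not in self.previous_errors]

    def save(self, errors):
        """Serialise this run's errors and the settings for the next run."""
        return json.dumps({"errors": [list(e) for e in errors], "settings": self.settings})

    @classmethod
    def load(cls, text):
        data = json.loads(text)
        return cls((tuple(e) for e in data["errors"]), data["settings"])


session = Session(
    {("/image?id=1", "Exception in Catalyst controller")},
    {"start_url": "http://example.test/"},
)
print(session.prioritise(["/", "/article", "/image?id=1"]))
errors = [
    ("/image?id=1", "Exception in Catalyst controller"),
    ("/article", "Exception in Catalyst controller"),
]
print(session.new_errors(errors))
print([url for url in ["/", "/logout", "/image/delete?id=1"] if should_follow(url)])
restored = Session.load(session.save(errors))
```

Saving the current run’s errors as the next run’s `previous_errors` is what lets the following run put known-bad pages first or hide errors it has already reported.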

It’s GPLv3, so feel free to hack your own things into it. I’ll accept patches for the app itself, as well as new tests to be bundled. As long as they are specific to a language, web server or framework, I’ll happily add more bundled checks (or fixes to existing ones); however, I will try to avoid app-specific checks, as that might just get a bit too much.

Happy hacking