QA/ValidatorStatistics

From W3C Wiki

Validator Statistics

It is often interesting to evaluate the quality of your Web site against certain criteria. One of the easiest criteria to check is the validity of your Web pages, which can itself be subdivided into a few categories.

  • NO DOCTYPE: The Web page has no doctype, so it is immediately not valid.
  • DOCTYPE: The Web page has a doctype. Further statistics can then be computed:
    • Is it valid?
    • Which doctype is used?
    • Which combination doctype/mime-type is used?
    • Which encoding has been used for the file?
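The categories above can be tallied with very little code. The following is a minimal sketch, not a validator: it only sorts pages into the DOCTYPE / NO DOCTYPE buckets and records the doctype, media type and declared encoding. The function names `classify_page` and `tally` are hypothetical, the encoding is assumed to come from the HTTP Content-Type header, and answering "Is it valid?" would still require running a real validator on each page.

```python
import re
from collections import Counter

# Finds a doctype declaration. Simplification (assumption): declarations with
# an internal subset containing ">" are not handled.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)


def classify_page(html, content_type=""):
    """Return the doctype, media type and charset for one page.

    `html` is the page source as text; `content_type` is the raw value of
    the HTTP Content-Type header, e.g. "text/html; charset=UTF-8".
    """
    # Only look near the top of the document, where a doctype must appear.
    match = DOCTYPE_RE.search(html[:2048])
    doctype = " ".join(match.group(1).split()) if match else None

    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower() or None

    charset = None
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"').lower() or None

    return {"doctype": doctype, "media_type": media_type, "charset": charset}


def tally(pages):
    """Count pages per bucket; `pages` is an iterable of (html, content_type)."""
    counts = Counter()
    for html, content_type in pages:
        info = classify_page(html, content_type)
        counts["DOCTYPE" if info["doctype"] else "NO DOCTYPE"] += 1
        if info["doctype"]:
            # Combination doctype/mime-type, as suggested in the list above.
            counts[(info["doctype"], info["media_type"])] += 1
    return counts
```

For example, `classify_page("<!DOCTYPE html><html></html>", "text/html; charset=UTF-8")` returns `{"doctype": "html", "media_type": "text/html", "charset": "utf-8"}`.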

Validity of a Web site

The validity of a Web site is not the same as the validity of its home page. Checking the home page is nevertheless often a first, cheap and fast test of whether the Web site maintainer has considered the validity of the site, or whether they may have difficulty maintaining it with the available tools.

Different kinds of statistics can be computed.

  • (Number of invalid pages) / (Total number of pages) = File validity
  • (Traffic to invalid pages) / (Total traffic) = Traffic validity. See LogValidator
  • Validity of Web pages over time = Time validity. A page may go through a period of invalidity because of a small glitch in the tools or in human updates (errare humanum est).
  • Validity of most popular paths through Web site.
  • Validity of most popular pages.
  • Validity of random pages - "I'm feeling unlucky" ;-)
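As an illustration, the first two ratios and the "most popular" and "random" selections can be sketched as follows. This assumes that per-page validation results and hit counts are already available (for instance from a validator run and from server logs); the `PageResult` record and all function names are hypothetical. Note that, as written in the list, both ratios measure the invalid share, so lower is better.

```python
import random
from dataclasses import dataclass


@dataclass
class PageResult:
    url: str
    valid: bool  # outcome reported by whichever validator was used
    hits: int    # number of requests for this page, e.g. from server logs


def file_ratio(results):
    """(Number of invalid pages) / (Total number of pages)."""
    if not results:
        return 0.0
    return sum(1 for r in results if not r.valid) / len(results)


def traffic_ratio(results):
    """(Traffic to invalid pages) / (Total traffic)."""
    total = sum(r.hits for r in results)
    if total == 0:
        return 0.0
    return sum(r.hits for r in results if not r.valid) / total


def most_popular(results, n):
    """The n pages with the most hits, most popular first."""
    return sorted(results, key=lambda r: r.hits, reverse=True)[:n]


def random_sample(results, n, seed=None):
    """Up to n pages picked at random ("I'm feeling unlucky")."""
    rng = random.Random(seed)
    return rng.sample(results, min(n, len(results)))


results = [
    PageResult("/", True, 900),
    PageResult("/news", False, 90),
    PageResult("/old", False, 10),
    PageResult("/about", True, 0),
]
print(file_ratio(results))     # 0.5: two of the four pages are invalid
print(traffic_ratio(results))  # 0.1: those two pages get 100 of 1000 hits
```

The gap between the two figures in this made-up data set shows why both are worth reporting: half of the files are invalid, but they account for only a tenth of the traffic.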

Tools

We need to define the types of approaches for reporting on HTML compliance:

  • Self-contained HTML validation tools (desktop, Web-based, ...)
  • Bookmarklets
  • URI interfaces (as implemented on the W3C Web site and documented in A URI Interface To Web Testing Tools)
  • Authoring tools with HTML repair capabilities
  • CMS tools, Apache modules, ...?
  • Enhancements to Web analysis tools (e.g. Web statistics packages)

LogValidator is a tool that helps you evaluate the quality of your Web site. It is modular, and you can extend it with Perl plug-ins to run whatever tests your needs require.

There may also be a need for a tool that polls the Web and produces general statistics.

It would be useful to define the limitations of such validation tools, for example the difficulties that arise when Web sites use some form of negotiation (based on user agent, referer fields, time of day, etc.) or dynamic content (including Wikis!).

It would be useful to produce a functional specification for an HTML auditing tool (some work in this area was published in a paper on Automated Benchmarking Of Local Government Web Sites).

As well as open source tools likely to be of interest to, or developed by, users of this Web site, it would be useful to engage the commercial sector in this area (e.g. Web analysis packages that do this). If this happens, it is important that the standards touched on above be defined so that findings are reproducible across different applications.

Interchange Format For Auditing Results

It would be useful to have an interchange format for the results of HTML validation surveys, possibly along the lines of EARL. This could include details of the survey tools and approaches (viewing and processing source, using robots, user-agent strings used by the tool, other environment settings, ...).
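To make the idea concrete, here is a rough sketch of what one survey record might carry. It is not EARL and does not use the EARL vocabulary; the JSON layout, the class names and every field name are assumptions chosen for illustration only, mirroring the details listed above (tool, approach, user-agent string, per-page outcome).

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SurveyEnvironment:
    tool: str          # name of the survey tool (hypothetical field)
    tool_version: str
    approach: str      # e.g. "robot" or "view-source"
    user_agent: str    # user-agent string sent by the tool


@dataclass
class PageAssertion:
    url: str
    outcome: str                   # "passed" or "failed"
    doctype: Optional[str] = None  # None when the page has no doctype


def survey_to_json(environment, assertions):
    """Serialise one survey so that another tool can read it back."""
    document = {
        "environment": asdict(environment),
        "results": [asdict(a) for a in assertions],
    }
    return json.dumps(document, indent=2, sort_keys=True)


def survey_from_json(text):
    """Parse a survey document back into plain dictionaries and lists."""
    return json.loads(text)
```

Recording the environment alongside the results is what would let two tools compare findings: a page served differently to different user agents can legitimately pass in one survey and fail in another.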

User Education

There is a need to address the education of HTML authors, software developers, etc. This will include raising awareness of the need for validation, the pros and cons of different models for creating content (HTML authoring tools, text editors, CMSs, Wikis, etc.), tools for auditing compliance, and strategies for addressing problems (e.g. along the lines of the LogValidator).
