If you've ever tried to teach someone HTML, you know how hard it is to get the syntax right. It's a perfect storm of awfulness.
Newbies have to learn all of the syntax, in addition to the names of HTML elements. They don't have the pattern matching skills (yet) to notice when their markup is not right, or the domain knowledge to know it's spelled "href" and not "herf".
The browser doesn't provide feedback when you make mistakes - it will render your mistakes in unexpected and creative ways. Miss a closing tag and watch your whole page suddenly acquire italics, or get pasted inside a textarea. Miss a quotation mark and half the content disappears. Add in layouts with CSS and the problem doubles in complexity.
Problems tend to compound. If you make a mistake in one place and don't fix it immediately, you can't determine whether future additions are correct.
This leads to a pretty miserable experience getting started - people should be focused on learning how to make an amazingly cool thing in their browser, but instead they get frustrated trying to figure out why the page doesn't look right.
Let's Make Things A Little Less Awful
What can we do to help? The existing tools to help people catch HTML mistakes aren't great. Syntax highlighting helps a little, but sometimes the errors look as pretty as the actual text. XML validators are okay, but tools like HTML Validator spew out red herrings as often as they do real answers. Plus, you have to do work - open the link, copy your HTML in, read the output - to use it.
We can do better. Most of the failures of the current tools are due to the complexity of HTML - which, if you are using all of the features, is Turing complete. But new users are rarely exercising the full complexity of HTML5 - they are trying to learn the principles. Furthermore the mistakes they are making follow a Pareto distribution - a few problems cause the majority of the mistakes.
Catching Mistakes Right Away
To help with these problems I've written a validator which checks for the most common error types and displays feedback to the user immediately when they refresh the page - so they can instantly find and correct mistakes. It works in the browser, on the page you're working with, so you don't have to do any extra work to validate your file.
Best of all, you can drop it into your HTML file in one line:
<script type="text/javascript" src="https://raw.github.com/kevinburke/tecate/master/tecate.js"></script>
Then if there's a problem with your HTML, you'll start getting nice error messages, like this:
Read more about it here, and use it in your next tutorial. I hope you like it, and I hope it helps you with debugging HTML!
It's not perfect - there are a lot of improvements to be made, both in the errors we can catch and on removing false positives. But I hope it's a start.
PS: Because the browser will edit the DOM tree to wipe out the mistakes users make, I have to use raw regular expressions to check for errors. I have a feeling I will come to regret this. After all, when parsing HTML with regex, it's clear that the <center> cannot hold. I accept that this tool will give wrong answers on some HTML documents; I am hoping that the documents turned out by beginning HTML users are simple enough that the center can hold.
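To give a feel for what checking raw source with regular expressions can look like, here is a minimal sketch. It is not tecate's actual code: the names `findTagErrors`, `TAG_PATTERN` and `VOID_ELEMENTS` are made up for illustration, and it assumes you already have the page source as a string. It treats every non-void element as needing a closing tag, which is stricter than the HTML spec but probably what a beginner wants.

```javascript
// Hypothetical sketch (not tecate's implementation): report unbalanced
// tags in a string of raw HTML source using a regex and a stack.

// Elements that never take a closing tag.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "source", "track", "wbr",
]);

// Captures the optional slash and the tag name. Deliberately naive:
// comments, <script> bodies, and ">" inside attribute values will fool it.
const TAG_PATTERN = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;

function findTagErrors(source) {
  const errors = [];
  const open = []; // stack of { name, line } for tags still waiting to be closed

  for (const match of source.matchAll(TAG_PATTERN)) {
    const isClosing = match[1] === "/";
    const name = match[2].toLowerCase();
    // 1-based line number of the tag within the source.
    const line = source.slice(0, match.index).split("\n").length;

    if (VOID_ELEMENTS.has(name)) continue;

    if (!isClosing) {
      open.push({ name, line });
      continue;
    }

    // Find the most recent open tag with the same name.
    const idx = open.map((tag) => tag.name).lastIndexOf(name);
    if (idx === -1) {
      errors.push(`line ${line}: </${name}> has no matching opening tag`);
      continue;
    }

    // Anything opened after the matching tag was never closed.
    for (const unclosed of open.splice(idx).slice(1)) {
      errors.push(`line ${unclosed.line}: <${unclosed.name}> is never closed`);
    }
  }

  // Whatever is left on the stack at the end was never closed either.
  for (const tag of open) {
    errors.push(`line ${tag.line}: <${tag.name}> is never closed`);
  }
  return errors;
}

console.log(findTagErrors("<p>This is <em>important</p>"));
// prints [ 'line 1: <em> is never closed' ]
```

The stack is what keeps one mistake from producing a cascade of reports: when a closing tag arrives, only the tags opened after its partner are flagged.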
Thank you for this awesome validator! Q: How does this validator differ from the W3C HTML validator?
The HTML validator cares too much about adhering to the HTML spec – it reports *too* many errors, for things like an X-UA-Compatible header for example, which is not fully compliant with the HTML5 spec. This makes it hard for newbs to pick out the actual errors that are causing their doc to show up in all italics, for example.
Great idea! I'm going to recommend this to my students.
Kevin, this is awesome! Thanks for sharing this, I’m passing it along to my coworkers for sure.
I agree. Having recently been teaching my girlfriend HTML, I was surprised by how many minutiae there are in it. After a while you don’t notice the little things.
Also, I had no idea HTML is now Turing complete. :-)
Well, it looks like there is still some work to do, but I applaud the project. Attribute class=”red square” causes an error, for example. You picked a tough problem!
Yep – Going to fix it tonight: https://github.com/kevinburke/tecate/issues/9
Amazing idea. I’m going to start using this in some of my projects. Thanks for making it!
Could you get your program to find the error here?
Only one of these works:
alert(“XML file loaded!”);
alert(“XML file loaded!”);
The one that fails was copied from a pdf book.
Ouch! Now neither works. It’s all to do with the quote marks.
Fixed now!
Hey, this is cool.. thanks!
Why not use schema validation?
That's why we use XML, no?
Hi,
In my experience HTML schema validators tend to throw up warnings for a lot of things that aren’t really problems. For example, html validators warn about the Google+ meta tag, about an invalid setting for X-UA-Compatible, also that your images don’t have alt=”” declarations, and none of these are really problems compared to missing a closing div tag.