Please note that this article may be unfinished.
Over the years, HTML has gradually been pushed towards a place where machine-readable, meta-data vocabularies are given increased value and importance. This article takes a quick look at what vocabularies are available and which ones are worth using in a web application right now.
In the past, the limited vocabulary of HTML itself meant that authors pushed meta-data into HTML attributes intended for other purposes, such as the “class” attribute used by CSS (ref. “Semantics in HTML5” at A List Apart). There are no standards for this markup, however, so it is not universally recognisable. Other design patterns have also been used to a limited degree to address these issues, such as the abbr design pattern, which uses the HTML abbr tag to add meta-data. Unfortunately, these have negative side effects, such as making life difficult for visually impaired users, so their use is tailing off.
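To make the two workarounds concrete, here is a minimal sketch of each. The class names follow the conventions of the hCalendar microformat, and the event details are invented for illustration. The abbr example also hints at the accessibility problem: a screen reader that expands abbreviations may read out the raw timestamp in the title attribute rather than the human-readable date.

```html
<div class="vevent">
  <!-- Meta-data carried in the class attribute: the class names mean
       nothing to a browser, only to tools that know the convention. -->
  <span class="summary">Web standards meetup</span>

  <!-- The abbr design pattern: the machine-readable value is tucked
       into the title attribute of an abbr tag. -->
  <abbr class="dtstart" title="2008-01-15T19:00:00">15 January at 7pm</abbr>
</div>
```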
In more recent times, “micro-format” markup has appeared to try to resolve these issues and, with the advent of HTML 5, we are just starting to see these formats become mainstream.
Prior to HTML 5, there was only a very limited set of semantic tags, falling into three categories. The main ones are:
- Structural HTML tags
div, span, h1-h6 (headings), ol & ul (lists), p (paragraphs)
- Content HTML tags
abbr, address, code
- Rhetorical HTML tags
HTML 5 semantic markup
HTML version 5 introduces a new set of markup designed specifically to give HTML documents greater semantic meaning.
The main new structural tags are:
In addition, there are new text-level semantic tags as well such as the
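As an illustration of how such tags fit together, the following is a minimal sketch of a page built from structural elements defined in HTML 5 (header, nav, article, section, aside and footer) together with two of its text-level elements (time and mark). The page content, URLs and date are made up for the example, and this is only one reasonable way to arrange the elements.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example page</title>
</head>
<body>
  <!-- Site-wide header with the main navigation -->
  <header>
    <h1>Example site</h1>
    <nav>
      <a href="/">Home</a>
      <a href="/archive">Archive</a>
    </nav>
  </header>

  <!-- A self-contained piece of content -->
  <article>
    <header>
      <h2>An example post</h2>
      <p>Published <time datetime="2010-06-01">1 June 2010</time></p>
    </header>
    <section>
      <h3>First section</h3>
      <p>Search results might <mark>highlight</mark> a matched term.</p>
    </section>
  </article>

  <!-- Content only loosely related to the main content -->
  <aside>
    <p>Related links and other tangential content.</p>
  </aside>

  <footer>
    <p>Site-wide footer information.</p>
  </footer>
</body>
</html>
```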
REL attribute based semantic markup
The rel attribute is valid on both link and a tags, and there are a number of standard values that add meaning to a link:
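The following sketch shows a few rel values in use. The values stylesheet, alternate, next, prev, license and nofollow are all link types defined in HTML 5; the URLs and link text are placeholders.

```html
<!-- In the document head: relationships between this page and other resources -->
<link rel="stylesheet" href="/css/site.css">
<link rel="alternate" type="application/rss+xml" href="/feed.xml" title="RSS feed">
<link rel="next" href="/articles/page/3">
<link rel="prev" href="/articles/page/1">

<!-- In the body: relationships expressed on individual hyperlinks -->
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Licensed under CC BY 3.0</a>
<a rel="nofollow" href="http://example.com/">A link the author does not endorse</a>
```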
Other microformat semantic markup schemes
These are some of the more commonly used and useful micro-formats.
- RDFa (xHTML extension)
Note that hCalendar is not included because it uses the abbr design pattern which causes usability issues as noted above.
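As an illustration of the RDFa entry, here is a minimal sketch in RDFa 1.0 syntax. It assumes the Dublin Core vocabulary, which is only one of many vocabularies RDFa can carry, and the URL, title, author and date are invented for the example.

```html
<!-- "about" names the resource being described; each "property"
     attaches one Dublin Core statement to it. -->
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.com/articles/semantic-html">
  <h2 property="dc:title">Semantic markup in practice</h2>
  <p>
    Written by <span property="dc:creator">Jane Example</span>
    on <span property="dc:date" content="2010-06-01">1 June 2010</span>.
  </p>
</div>
```

The content attribute supplies the machine-readable date while the visible text stays human-friendly, so the markup avoids the abbr design pattern.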
When should I use <b>, <i>, <u>, <strong> and <em>?
One of the good aspects of the nascent HTML 5 standard is a clearer definition of when and why to use these deceptively simple markup tags. Historically, <b> and <i> were used somewhat randomly to mark up emphasised text, and only a few people used <em> (emphasis).
HTML 5 says that:
- <b> should be used to highlight text to draw attention without implying any change of “voice” or any increased importance (e.g. a product name or the lead-in to an article).
- <i> should be used to imply a change of “voice” from the main text (e.g. a technical term or idiomatic phrase).
- <u> should be used to imply a non-textual annotation (e.g. mis-spelled text).
- <strong> should now only be used to indicate increased importance compared to the surrounding text.
- <em> should be used to indicate emphasised text.
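A short sketch of the HTML 5 definitions of b, i, u, strong and em in practice is shown below. The product name and sentences are invented for the example.

```html
<!-- b: draws attention without adding importance or changing voice -->
<p>The <b>Acme Widget</b> is the product under review this week.</p>

<!-- i: a change of voice, here an idiomatic phrase -->
<p>The markup grew in a rather <i>ad hoc</i> fashion.</p>

<!-- u: a non-textual annotation, here a mis-spelled word -->
<p>The form asked users to <u>recieve</u> their password by email.</p>

<!-- strong: greater importance than the surrounding text -->
<p><strong>Warning:</strong> back up your data before upgrading.</p>

<!-- em: stress emphasis that changes how the sentence is read -->
<p>I <em>really</em> mean it.</p>
```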
has been deprecated in HTML 5 which is unfortunate as it would provide a useful semantic meaning in a number of contexts such as legal documentation.
Note: If you look at the HTML source for this document, you will find that it doesn’t actually follow these recommendations! That is because I used the visual editor in WordPress, which uses TinyMCE, which still insists on the form that was correct prior to HTML 5 (using <em> for italics and <strong> for bold text).