Most search engines were created in the days when the web was in its infancy, when pages were little more than plain text and hyperlinks.
Search engine developers built spiders that read through a page's HTML to determine how relevant it was to any particular search term. In 1994 this was somewhat easier than it is today. A typical HTML page of the time looked something like this:
<title>New Car Net</title>
<h1>Welcome to New Car Net, the home of new cars</h1>
<h2>Read about the Ford Granada</h2>
<p>The new ford granada was launched yesterday to a …</p>
<a href="ford_granada.html">read more</a>
A search engine interpreting this page could fairly easily determine that it is mainly about new cars, and that the page it links to is most likely about the new Ford Granada.
In 1994 most pages did not run off external data (they were static pages), so web developers had to give each page a memorable name, such as ford_granada.html. Pages were also coded so that their information was hierarchical: the most important information went in <h1> tags, the next most important in <h2> tags, and so on down to <h6>.
Each of the first search engines used its own set of rules and its own method of weighting this information (an algorithm), built around this very simple HTML, which allowed users to search for information on the web from anywhere in the world.
It was this method of searching that made the web far more valuable than it would otherwise have been.
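To make the idea of weighting by heading level concrete, here is a minimal Python sketch, using only the standard library's html.parser, that scores every word on the sample page according to the most important tag it appears in. The tag weights, the tiny stopword list and the names WeightedTermCounter and score_page are all hypothetical choices made for illustration; this is not a reconstruction of any real search engine's algorithm.

```python
from collections import Counter
from html.parser import HTMLParser

# Illustrative weights only (an assumption, not any real engine's values):
# the title and higher-level headings count for more than body text.
TAG_WEIGHTS = {"title": 10, "h1": 6, "h2": 5, "h3": 4, "h4": 3, "h5": 2, "h6": 1}
DEFAULT_WEIGHT = 1  # paragraphs, links and anything else
STOPWORDS = {"a", "about", "of", "the", "to", "was"}  # tiny hypothetical list


class WeightedTermCounter(HTMLParser):
    """Adds up a weighted score for each word, based on the tags enclosing it."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # stack of currently open tags
        self.scores = Counter()  # word -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed inner tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Text takes the weight of its most important enclosing tag.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for raw in data.lower().split():
            word = raw.strip(".,;:!?\"'…")
            if word and word not in STOPWORDS:
                self.scores[word] += weight


def score_page(html: str) -> Counter:
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.scores


SAMPLE_PAGE = """<title>New Car Net</title>
<h1>Welcome to New Car Net, the home of new cars</h1>
<h2>Read about the Ford Granada</h2>
<p>The new ford granada was launched yesterday to a …</p>
<a href="ford_granada.html">read more</a>"""

if __name__ == "__main__":
    # Words from the <title> and <h1> dominate the ranking.
    print(score_page(SAMPLE_PAGE).most_common(3))
```

On the sample page, "new", "car" and "net" come out on top because they sit in the <title> and <h1>, while "ford" and "granada" score lower, having appeared only in an <h2> and a paragraph. Strip the heading tags out of the same page and every word falls back to the default weight, which is exactly the problem the next paragraph describes.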
Although the building blocks of the web have changed dramatically since then (graphics, video, etc.), most search engines haven’t significantly changed how they read pages.
From the mid-1990s onwards, a slew of websites were built in a more graphical format, throwing away the rulebook on how a website should look and behave. Heading tags were disregarded, layouts were built entirely with tables and divs, and search engines found it increasingly hard to work out what was relevant. The solution they came up with? “Sod it”. They decided it was all but impossible to identify the most important parts of a page without heading tags.
Now the building blocks of the web are changing in the most significant way in almost 20 years. Rather than looking backwards when building the underbelly of a website, developers will need to be much more forward-looking. Search engines are keeping an eye on how the semantic web is progressing, and as soon as it reaches a tipping point the algorithms will all change, and the Google dance will become less of a waltz and more of a merengue.
In the coming weeks I will set aside some time to write a full post on this topic.