Summary: Search engines perform a number of functions to enable them to build up an accurate and relevant index of web pages which can be served to users looking for information.
So, you know that the computer you’re using right now is connected to the internet. Maybe it’s part of a network, one of a bunch of computers at work or university. The internet is a bunch of computer networks all joined together – some private, some public, some business, some government, and so on. A load of those computers hold resources and information, like websites and pages that are meant for other people to look at. But how do you find them? It’s a bit like a billion people having a phone but nobody having a phone directory!
But wait! Aha! That’s where search engines come in. Search engines crawl the web looking for those pages and build an index, so that people who want to find pages on a particular subject can do so. How does that work then?
The search engines have automated programs called search engine ‘bots’ or ‘spiders’. These use the structure of the web (everything linked up to everything else) to crawl through all the pages and documents that computer owners have made public. There are something like 20 billion pages, and of these, search engines have crawled through about 8-10 billion to date.
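To make the idea of following links a bit more concrete, here is a minimal sketch of a crawl in Python. It is not how any real search engine bot is written: the "web" is a made-up dictionary called `TOY_WEB`, the page names are invented, and nothing is fetched over a network. It only shows the basic loop of starting at one page and following links to pages not yet seen.

```python
from collections import deque

# Hypothetical toy "web": each page name maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": ["post-1", "home"],
    "orphan": ["home"],  # no page links here, so the crawl never reaches it
}


def crawl(start, links):
    """Breadth-first crawl: visit `start`, then follow links to unseen pages."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


if __name__ == "__main__":
    print(crawl("home", TOY_WEB))
    # ['home', 'about', 'blog', 'post-1', 'post-2']
```

Notice that `orphan` never shows up in the result. In this toy model, a page that nothing links to can't be discovered by following links, which is one reason links matter so much.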
Once crawled, the contents of a particular page can be ‘indexed’. This just means the contents are stored in a big database made of loads of documents. Search engines are incredibly impressive because when a user searches for a particular term, the contents of billions of indexed documents are compared, and relevant documents can be retrieved at lightning speed – less than a second.
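One common way to get that kind of speed is an ‘inverted index’: instead of scanning every document for a word, you keep a lookup from each word to the pages that contain it. The sketch below is an assumption about the general technique, not a description of any particular search engine's database. The page names, the sample text and the helper functions `tokenize`, `build_index` and `search` are all made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the page ids that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample pages.
PAGES = {
    "page-a": "Cheap flights to Paris",
    "page-b": "Paris travel guide and hotels",
    "page-c": "Cheap hotels in Rome",
}

if __name__ == "__main__":
    index = build_index(PAGES)
    print(sorted(search(index, "paris")))         # ['page-a', 'page-b']
    print(sorted(search(index, "cheap hotels")))  # ['page-c']
```

The lookup only touches the entries for the words in the query, so it stays quick however many other pages are in the index.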
When a request is received by the search engines – and there are hundreds of millions of requests made every single day – the search engine pulls from its index all documents that match that query. Matches, as you might expect, are pages that seem relevant to the phrase the user has specified. You might think that this just means the phrase appears on the page – but in reality search engines look at hundreds of factors to work out which pages are the best to serve up to the user. They have to, because with billions of pages, as you can imagine, it’s very hard to figure out which have the highest quality – just counting the number of times a word appears on a page isn’t enough (if it were, you could easily fool the search engine bots by just repeating your target word or phrase, over and over again).
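Here is a small sketch of why raw word counts are so easy to fool. Everything in it is invented for illustration: the two pages, the `inbound_links` numbers and the `blended_score` formula are assumptions, not real ranking factors or weights. The only point is that once a second signal is mixed in and repeated mentions stop adding much, stuffing a page with the target word no longer wins.

```python
import math
import re


def term_count(text, term):
    """Count how many times `term` appears as a whole word in `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words.count(term.lower())


def naive_score(page, term):
    """Score a page purely by how often the term appears on it."""
    return term_count(page["text"], term)


def blended_score(page, term):
    """Combine a saturating term signal with a made-up link signal."""
    count = term_count(page["text"], term)
    # The first mention counts most; extra repeats add less and less.
    term_signal = count / (count + 1.0)
    # Hypothetical off-page signal: log-damped number of inbound links.
    link_signal = math.log1p(page["inbound_links"])
    return term_signal * link_signal


def best(pages, term, score):
    """Return the name of the page with the highest score for `term`."""
    return max(pages, key=lambda page: score(page, term))["name"]


# Hypothetical pages: one stuffed with the keyword, one genuinely useful.
PAGES = [
    {"name": "stuffed",
     "text": "shoes shoes shoes shoes shoes shoes buy shoes",
     "inbound_links": 1},
    {"name": "useful",
     "text": "A guide to choosing running shoes for beginners",
     "inbound_links": 40},
]

if __name__ == "__main__":
    print(best(PAGES, "shoes", naive_score))    # stuffed
    print(best(PAGES, "shoes", blended_score))  # useful
```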
That brings us on to a huge function of search engines – ranking pages. Search engines use mathematical algorithms to decide which pages are the best. Unfortunately for us, these aren’t fully explained publicly. If they were, people would just figure out ways to cheat and get to the top, without producing really useful content. SEO experts think that the top search engines (like Google) look at anything up to 200 on-page and off-page factors when figuring out whether a page is worth serving up to one of their users or not. ‘Off-page factors?’ I hear you cry. These are things that are going to affect your search engine ranking but which you don’t have direct control over. They include, for example:
- The age of your domain (how long have you owned it for?)
- DMOZ listing – a well-respected directory which is human-edited and so considered highly authoritative.
- Yahoo listing – another human-reviewed directory considered authoritative.
- Cache age – Google likes regularly updated sites and puts emphasis on when it last crawled yours.
- PageRank – a subject in itself, but in short Google’s way of measuring how important your site is.
- Back links – the more you have from high quality authoritative sites, the better.
These off-page factors are not under your direct control but you can influence some of them. For example, if you produce frequent, good-quality content, you’re more likely to get a DMOZ or Yahoo listing and you’re more likely to obtain back links, leading to a higher PageRank and likely a faster crawl rate from Google.
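To give a feel for how back links can turn into an importance score, here is a minimal sketch of the simplified PageRank calculation as it was originally published. It is not what Google runs today, which isn't public. The four-page link graph is made up, the `pagerank` function and its parameters are illustrative, and the damping value of 0.85 is the figure usually quoted for the original algorithm. The sketch assumes every linked-to page also appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to. Every page
    that is linked to must also be a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly between the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "c" has the most back links.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

if __name__ == "__main__":
    ranks = pagerank(LINKS)
    for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
        print(page, round(score, 3))
```

In this toy graph `c` comes out on top because three pages link to it, and `a` comes second because its one back link is from the highly ranked `c`. That is the ‘high quality authoritative sites’ effect in miniature: a link is worth more when the page giving it is itself important.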