One of the criteria Google uses when ranking
search results is PageRank (PR). The higher the
PR, the higher a page is shown in the search
results. In this article we explain how Google
calculates the PR and how you can optimize your
website to achieve a high PR. Google also uses
other factors, such as relevance to the search
query; these are not discussed in this article.
Google
assigns a PR between 0 and 10 to every page
it indexes. It is possible to install a
Google toolbar which shows the PR when you visit
a page. The toolbar can be downloaded from http://toolbar.google.com/.
In
reality, the PR is a value between 0 and 'a very
large number'. A logarithmic scale is most likely
used to translate that value into the PR we know.
On a logarithmic scale, values might be translated
as follows:
Values 0.1 to 1 result in PR 0
Values 1 to 10 result in PR 1
Values 10 to 100 result in PR 2
Values 100 to 1000 result in PR 3, etc.
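To make the idea of a logarithmic scale concrete, here is a minimal Python sketch of such a translation. It assumes base 10 and the ranges listed above, neither of which is confirmed by Google, and it treats the lower bound of each range as inclusive. The function name toolbar_pr is our own and purely illustrative.

```python
import math

def toolbar_pr(value):
    """Translate a raw value into a displayed PR of 0 to 10.

    Assumption: a base-10 logarithmic scale in which values
    from 0.1 up to 1 give PR 0, 1 up to 10 give PR 1, and so on.
    The real scale is not public.
    """
    if value <= 0:
        raise ValueError("value must be positive")
    # floor(log10(value)) is -1 for 0.1..1, 0 for 1..10, 1 for 10..100, ...
    pr = math.floor(math.log10(value)) + 1
    # Clamp to the 0..10 range shown by the toolbar.
    return max(0, min(10, pr))

for raw in (0.5, 5, 50, 500):
    print(raw, "->", toolbar_pr(raw))
```

Note how each step up in PR requires a tenfold increase in the underlying value under this assumed scale.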
What
we'll look at in this article is how the value
is determined. The actual scale used is not
important for understanding the concepts, and it
is not known to the public. It is likely that the
scale is changed regularly.
Before
looking at the actual formula - which may seem
a bit scary - we'll discuss the basic idea by
which the founders of Google wished to assign
importance - expressed as PR - to web pages.
The
basic idea is that a link from page A to page
B is a 'vote' cast by page A for page B. The link
indicates that page B has something important on
it, so page A endorses page B. The more votes a
page has, the higher the PR it should get. The
votes do not all carry the same weight, however.
A vote from a page with a low PR has less weight
than a vote from a page with a high PR. If page
A links to many pages, its vote is divided into
equal parts, and one part is assigned to each of
the pages that page A links to. As you may have
deduced from this explanation, the PR expresses
- to some extent - the likelihood that someone
surfing randomly on the web will end up on the
given page. The PR therefore expresses a
probability.
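The way a page's vote is divided can be sketched in a few lines of Python. This is only an illustration of the idea described above: the function name split_vote, the page names and the value 6.0 are made up for the example.

```python
def split_vote(page_value, outgoing_links):
    """Divide one page's vote equally among the pages it links to."""
    if not outgoing_links:
        # A page without outgoing links casts no votes.
        return {}
    share = page_value / len(outgoing_links)
    return {target: share for target in outgoing_links}

# Hypothetical page A with a value of 6.0 linking to three pages:
votes = split_vote(6.0, ["B", "C", "D"])
print(votes)  # each of B, C and D receives a vote worth 2.0
```

The more links page A contains, the smaller the part of its vote that each linked page receives.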
Before
looking at how this works out, we'll look at the
exact formula used. You need not understand the
formula as long as you understand the basic idea,
but we add it here for completeness. We'll
look at what the basic idea and the formula
mean in terms of getting a high PR a bit further
down.
The
basic formula is the following, where A is the
page we wish to find the PR for and pages T1 to
Tn are the pages linking to page A:
PR(A) = (1-d) + d (PR(T1)/C(T1) + PR(T2)/C(T2)
+ ... + PR(Tn)/C(Tn))
Where:
- PR(A) means the
PR of page A. Likewise, PR(T1) means the
PR of page T1.
- C(T1) is the number
(count) of links going out of page T1.
- PR(T1)/C(T1) - it
follows from the two definitions above that this
is the PR of page T1 divided by the number of
outgoing links from page T1. In other words,
PR(T1)/C(T1) expresses the part of the vote
of page T1 that is awarded to page A.
- d is a damping factor
which is probably set to around 0.80 to 0.85.
We will not go into the details of why this
factor is needed, but just state here that it
has to do with the probability distribution.
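Because the PR of each page depends on the PR of the pages linking to it, the formula cannot be evaluated in a single step. One common approach, shown here as a rough Python sketch rather than as Google's actual implementation, is to start every page at the same value and apply the formula repeatedly until the values settle. The function name pagerank, the starting value of 1.0, the number of iterations, d = 0.85 and the three-page example web are all assumptions made for illustration.

```python
def pagerank(links, d=0.85, iterations=100):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)).

    links maps each page to the list of pages it links to.
    Assumption: each target appears at most once per list.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    pr = {page: 1.0 for page in pages}  # assumed starting value
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over all pages T that link to this page.
            incoming = sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical web of three pages: A links to B and C,
# B links to C, and C links back to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this small example page C ends up with the highest value because it receives votes from both A and B, while page B, which only receives half of A's vote, ends up with the lowest.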