How to Find Out Who Is Visiting Your Web Site
How successful is the average Web site? There's no way to know
for sure, but Web site statistics can at least give you a basic
means of measurement. Web measuring devices are designed to give
the growing number of Web site creators some basic tools for
honing their messages. In effect, these tools can help you
determine the who, what, when, where (and implied why) of your
visitors.
Who is hitting your site? What are they viewing? When are they
visiting? And where are they coming from? What's most important
about many of these tools is that these four questions can be
answered quietly, behind the scenes, without any of your visitors
even guessing that they're providing you with increasingly valuable
information.
What Information Is Available?
Before you can decide what type of analysis you want to do, you
need to know what information is available. Unfortunately, there's
not much tracking data you can collect, and what you can
get is unreliable. But don't despair - you can still gain useful
knowledge from what does exist.
Your Web servers can record information about every request they
get. The information available to you for each request includes:
- Date and time of the hit
- Name of the host
- Request
- Visitor's login name (if the user is authenticated)
- Web server's response code
- Referrer
- Visitor's user agent
- Visitor's IP address
- Visitor's host (if the visitor's IP address can be translated)
- Bytes transferred
- Path of the file served
- Cookies sent by the visitor
- Cookies sent by the Web server
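
As a rough illustration of pulling these fields out of a log, the sketch below assumes your server writes the common NCSA "combined" log format. Your server's format may differ, and the sample line and the `parse_hit` name are made up for this example.

```python
import re
from datetime import datetime

# Assumption: NCSA "combined" log format. Adjust the pattern if your
# server is configured to log different fields or a different order.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_hit(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    hit["status"] = int(hit["status"])
    # A "-" in the bytes field means no body was sent.
    hit["bytes"] = 0 if hit["bytes"] == "-" else int(hit["bytes"])
    return hit

# Hypothetical log line, for illustration only.
sample = ('192.0.2.10 - alice [10/Oct/1999:13:55:36 -0700] '
          '"GET /news/index.html HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I)"')
hit = parse_hit(sample)
print(hit["host"], hit["user"], hit["status"], hit["request"])
```

Cookies are not part of the combined format; logging them usually requires extra server configuration.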
The information you have available is inaccurate but not completely
unreliable. Although this data is inexact, you can still use it to
gain a better understanding of how people use your site. To start
things off, let's talk about hits and pageviews. (A hit is any
request for a file that your server receives. That includes images,
sound files and anything else that may appear on a page. A pageview
is a little more accurate because it counts a page as a whole - not
all its parts.)
As you probably already know, it's quite easy to find out how
many hits you're getting with a simple hit counter. For more
precise analysis, though, you'll have to store the information
about the hits you get. An easy way to do this is to save the
information in your Web server log files and periodically load
database tables with that data. Alternatively, you can write the
information directly to database tables.
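
The periodic-load approach might look something like the following sketch. It uses Python's built-in `sqlite3` module as a stand-in for whatever database you run; the table layout and the sample rows are assumptions made for this example.

```python
import sqlite3

# Hypothetical, already-parsed hits: (timestamp, host, path, status, bytes).
hits = [
    ("1999-10-10 13:55:36", "192.0.2.10", "/index.html", 200, 2326),
    ("1999-10-10 13:55:37", "192.0.2.10", "/logo.gif", 200, 512),
    ("1999-10-10 13:56:02", "192.0.2.44", "/missing.html", 404, 0),
]

# ":memory:" keeps the example self-contained; pass a file path instead
# to keep the data between runs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits "
    "(ts TEXT, host TEXT, path TEXT, status INTEGER, bytes INTEGER)"
)
conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?)", hits)
conn.commit()

total_hits = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
print("hits loaded:", total_hits)
```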
If you write your data directly to a database, you will need either
a Web server with that capability already implemented (such as
Microsoft's IIS) or the source code for the server. Another option
is to use a third-party API.
Once your hits are stored, you can gather information about how many
failed hits you're getting - just count the number of hits with a
status code in the 400s. And if you're curious, you can drill down
further by grouping the failed hits by status code.
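
A minimal sketch of that count, assuming the status codes have already been pulled out of the log (the list below is invented sample data):

```python
from collections import Counter

# Hypothetical status codes, one per hit.
statuses = [200, 200, 404, 304, 404, 403, 500, 200]

# Failed hits as described above: status codes in the 400s.
failed = [code for code in statuses if 400 <= code <= 499]
failed_by_code = Counter(failed)

print("failed hits:", len(failed))
for code, count in sorted(failed_by_code.items()):
    print(code, count)
```

Codes in the 500s indicate server-side errors; whether to count those as failed hits too is a separate decision.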
Pageviews
On the whole, though, counting hits isn't as informative as counting
pageviews. Hit counts also aren't comparable from one site to
another (see the Internet Advertising Bureau's industry-standard
metrics).
To count pageviews, you need to devise some method of differentiating
hits that are pageviews from those that are not. Here are some
of the factors to take into account:
- Name of the file served
- Type of the file served (HTML, GIF, WAV, and so on)
- Web server's response code
- Visitor's host
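
One way these factors could be combined is sketched below. The specific rules are assumptions for illustration, not a standard: only successful requests for HTML files (or for a directory's default page) count, and hits from a list of hosts you choose to exclude, such as your own machines, are ignored.

```python
import posixpath

# Assumptions for this sketch; tune both sets for your own site.
PAGE_EXTENSIONS = {".html", ".htm"}
EXCLUDED_HOSTS = {"192.0.2.1"}  # hypothetical internal or monitoring host

def is_pageview(path, status, host):
    """Decide whether a single hit should be counted as a pageview."""
    if status != 200:
        return False
    if host in EXCLUDED_HOSTS:
        return False
    name = posixpath.basename(path)
    if name == "":
        # A request such as "/" or "/news/" serves the directory's default page.
        return True
    extension = posixpath.splitext(name)[1].lower()
    return extension in PAGE_EXTENSIONS

print(is_pageview("/index.html", 200, "198.51.100.7"))
print(is_pageview("/logo.gif", 200, "198.51.100.7"))
```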
Once you've determined which hits are pageviews and which are not,
you can count the number of pageviews your site gets. Eventually,
though, you'll probably want to drill down in your data to determine
how many pageviews each of your pages gets individually. Furthermore,
if you split your site into channels or sections, you may want to
determine how many pageviews each area gets. This is where standards
for site design can help. If such a standard is in place at all
levels of your site, you can summarize and drill down through your
pageviews at will. Of course, there are some problems with this
method. You may want to count a pageview in one section part of the
time and in another section at other times.
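
The article does not say which design standard it has in mind. As one possible reading, the sketch below assumes each section of the site lives in its own top-level directory, so the first path component names the section. The paths and the `section_of` helper are hypothetical.

```python
from collections import Counter

def section_of(path):
    """Return the top-level directory of a URL path, or "(root)" if none."""
    directory = path.rsplit("/", 1)[0]          # drop the file name
    first = directory.strip("/").split("/")[0]  # keep the first component
    return first or "(root)"

# Hypothetical pageview paths.
pageviews = [
    "/news/index.html",
    "/news/1999/story.html",
    "/sports/scores.html",
    "/index.html",
]

by_section = Counter(section_of(path) for path in pageviews)
for section, count in sorted(by_section.items()):
    print(section, count)
```

A page that belongs to more than one section, the problem noted above, would need an explicit mapping rather than a rule based on its path.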
More About Pageviews
Once you're comfortable with programs that retrieve the types of
information above, you should be able to write programs that give
you the following:
- Pageviews by time bucket: You can look at how pageviews change
  in five-minute intervals over a day. This will tell you when
  people are accessing your site. If you also group pageviews by
  your visitors' root domains, you can determine whether people
  visit your site before work hours, during work or after work.
- Pageviews by logged-in visitors vs. pageviews by visitors who
  haven't logged in: What percentage of your pageviews come from
  logged-in visitors? This information can help you determine
  whether allowing people to log in is worthwhile. You can also
  get some indication of how your site might perform if you
  required visitors to log in.
- Pageviews by referrer: When your visitors come to one of your
  pages via a link or banner, where do they come from? This
  information can help you determine your visitors' interests
  (you'll know what other sites they visit). And if you advertise,
  this information can help you decide where to put your
  advertising dollars. It can also help you decide more
  intelligently which sites you want to partner with, if you're
  considering such an endeavor.
- Pageviews by visitor hardware platform, operating system,
  browser, and/or browser version: What percentage of your
  pageviews come from visitors using Macs? Using PCs? From
  visitors using Netscape? Internet Explorer? It will take a bit
  of work to cull this information out of the user agent string,
  but it can be done. Because browsers are continually being
  created and updated, the number of possible values in the user
  agent string keeps growing, so you'll have to keep whatever
  method you use to parse this information up to date.
- Pageviews by visitors' host: How many of your pageviews come
  from visitors using AOL? Earthlink?
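
The time-bucket breakdown, for example, could be sketched as follows. The timestamps are invented, and rounding down to the start of each five-minute interval is just one way to define a bucket.

```python
from collections import Counter
from datetime import datetime

def five_minute_bucket(ts):
    """Round a timestamp down to the start of its five-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)

# Hypothetical pageview timestamps.
times = [
    datetime(1999, 10, 10, 8, 1, 15),
    datetime(1999, 10, 10, 8, 4, 59),
    datetime(1999, 10, 10, 8, 5, 0),
    datetime(1999, 10, 10, 17, 32, 10),
]

per_bucket = Counter(five_minute_bucket(t) for t in times)
for bucket, count in sorted(per_bucket.items()):
    print(bucket.strftime("%H:%M"), count)
```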
Note that you may want to mix and match these
various dimensions. For example, how do your referrals change
over time? Does the relative percentage of Netscape users vs.
Internet Explorer users change over the course of the day? Does
one area of your site seem to interest Unix users more than other
areas?
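
Mixing two dimensions can be as simple as counting on a pair of values. The sketch below crosses hour of day with browser family. The `browser_family` heuristic is deliberately crude and is an assumption of this example: it only distinguishes user agent strings of the kind sent by Internet Explorer and Netscape, and the sample strings are illustrative.

```python
from collections import Counter

def browser_family(user_agent):
    """Very rough classification of a user agent string."""
    # Order matters: Internet Explorer's string also begins with "Mozilla/".
    if "MSIE" in user_agent:
        return "Internet Explorer"
    if user_agent.startswith("Mozilla/"):
        return "Netscape"
    return "Other"

# Hypothetical pageviews as (hour of day, user agent) pairs.
pageviews = [
    (9, "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"),
    (9, "Mozilla/4.08 [en] (Win98; I)"),
    (9, "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT)"),
    (21, "Mozilla/4.5 [en] (X11; U; Linux 2.2.5 i686)"),
    (21, "Lynx/2.8.1rel.2 libwww-FM/2.14"),
]

by_hour_and_browser = Counter(
    (hour, browser_family(agent)) for hour, agent in pageviews
)
for (hour, family), count in sorted(by_hour_and_browser.items()):
    print(hour, family, count)
```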
How To Count Unique Visitors
Now let's talk about visitor information. Look at the bulleted
paragraphs above and replace the word "pageviews" with the word
"visitors." Unfortunately, counting visitors is more difficult
than counting pageviews. There is absolutely no way to count visitors
reliably.
Basically, there are three types of information you can utilize
to track visitors: their IP addresses, their member names (if
your site uses membership), and their cookies.
The most readily available piece of information is the visitor's
IP address. To count visitors, you simply count the number of
unique IP addresses in your logs. But easiest isn't always best:
this method is the least accurate one available to you. Most people
connecting to the Net get a different IP address every time they
connect.
That's because ISPs and organizations like AOL assign addresses
dynamically in order to use the limited block of IP addresses
given to them more efficiently. When an AOL customer connects,
AOL assigns them an IP address. And when they disconnect, AOL
makes that IP address available to another customer. This method
becomes increasingly inaccurate if you're examining data over
longer time periods.
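
Counting unique IP addresses takes only a few lines. The sketch below groups them by day, with invented sample data; because of dynamic address assignment, the results are at best a rough estimate.

```python
from collections import defaultdict

# Hypothetical (day, IP address) pairs taken from parsed log lines.
hits = [
    ("1999-10-10", "192.0.2.10"),
    ("1999-10-10", "192.0.2.10"),
    ("1999-10-10", "192.0.2.44"),
    ("1999-10-11", "192.0.2.10"),
    ("1999-10-11", "192.0.2.77"),
]

ips_by_day = defaultdict(set)
for day, ip in hits:
    ips_by_day[day].add(ip)  # a set keeps each address once per day

for day in sorted(ips_by_day):
    print(day, len(ips_by_day[day]))
```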
If you allow people to log in to your site through membership,
you have another piece of information available to you. If you
require people to log in, visitor tracking becomes much easier.
And if you require people to enter their passwords each time they
log in, you're in tracking heaven. As we all know, though, there's
a downside to making people log in - namely that a lot of people
don't like the process and won't come to your site if you require
it.
If you do force people to log in, however, you can count the number
of unique member names and easily determine how many people visit
your site. If you don't force people to log in, but do give them
the option to do so, you can count the number of unique member
names; then, for those hits without member names attached, you
can count the number of unique IP addresses instead.
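
The optional-login case might be sketched like this, with made-up hits in which the member name is `None` when the visitor was not logged in:

```python
# Hypothetical hits as (member name or None, IP address) pairs.
hits = [
    ("alice", "192.0.2.10"),
    (None, "192.0.2.10"),
    ("bob", "192.0.2.11"),
    (None, "192.0.2.44"),
    ("alice", "192.0.2.99"),
    (None, "192.0.2.44"),
]

# Unique member names for the logged-in hits.
members = {member for member, _ in hits if member}
# Unique IP addresses for the hits with no member name attached.
anonymous_ips = {ip for member, ip in hits if not member}

estimated_visitors = len(members) + len(anonymous_ips)
print("estimated visitors:", estimated_visitors)
```

This estimate can count one person twice: a member who browses before logging in shows up once by IP address and once by name.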
Lastly, you can add cookies to your arsenal. Define a cookie that
will have a unique value for every visitor. Let's call it a machine
ID. If a person visits you without providing you with a machine
ID (either because she hasn't visited your site before or because
she's set her browser not to accept cookies), calculate a new
value and send a cookie along with the page she requested.
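
A minimal sketch of the machine ID logic, using Python's standard `http.cookies` and `uuid` modules. The cookie name, the one-year lifetime, and the `machine_id_for` helper are assumptions for this example; how you read the request's Cookie header and attach the Set-Cookie header depends on your server.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "machine_id"            # hypothetical cookie name
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # assumed lifetime

def machine_id_for(cookie_header):
    """Return (machine_id, set_cookie_value).

    set_cookie_value is None when the visitor already sent a machine ID;
    otherwise it is the value to send back in a Set-Cookie header.
    """
    incoming = SimpleCookie()
    incoming.load(cookie_header or "")
    if COOKIE_NAME in incoming:
        return incoming[COOKIE_NAME].value, None

    # No machine ID yet: generate a random one and prepare the cookie.
    new_id = uuid.uuid4().hex
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = new_id
    outgoing[COOKIE_NAME]["path"] = "/"
    outgoing[COOKIE_NAME]["max-age"] = ONE_YEAR_SECONDS
    return new_id, outgoing[COOKIE_NAME].OutputString()

first_id, set_cookie = machine_id_for("")
print("Set-Cookie:", set_cookie)
```

On the visitor's next request, passing the Cookie header back into `machine_id_for` returns the same ID and no new cookie.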
So now you can count the number of unique machine IDs in your
log. But there are still several issues that we need to discuss.
- Many people turn off cookies, so you can't rely on cookies
  alone to count your visitors.
- The cookie specification allows browsers to delete old cookies.
  Even if it didn't, a user's hard disk can always fill up. Either
  way, the cookies you send to a visitor may be removed at some
  point. So it's possible that a person who visits your site at
  8 a.m. will no longer have your cookie when they return at
  9 a.m.
- When your Web server sends a cookie to a visitor, it's stored
  on the visitor's machine. If a person visits your site from home
  in the morning using her desktop machine and visits again from
  work using another PC, you'll log two different cookies, because
  a cookie is tied to the machine, not the visitor.
- Multiple people may use the same machine, in which case you'll
  see only one cookie for all of them.
- Various proxy servers may handle cookies differently. It's
  possible that a given proxy server won't deliver cookies to the
  user's machine. Or it might not deliver the correct cookie to
  the user's machine (it might even deliver some other cookie from
  its cache). Or it might not send the user's cookie back to your
  Web server. Unfortunately, proxy servers are still young. There
  is no formal and complete standard for how they're supposed to
  work, and there's no certification service to ensure that
  they'll do what they're supposed to do.
The last issue is whether you want to track the information you
have over multiple days, or whether one day's worth is enough. If
one day's data will suffice, you can get away with simple programs
that process your log files. If you prefer to process multiple
days' information, however, you'll want to store it all in a
database.
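
To illustrate why multi-day analysis benefits from a database, the sketch below stores pageviews from two days in a `sqlite3` table (a stand-in for your own database, with an assumed layout and invented rows) and counts unique machine IDs both per day and across the whole period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (day TEXT, machine_id TEXT)")
conn.executemany(
    "INSERT INTO pageviews VALUES (?, ?)",
    [
        # Hypothetical rows: one per pageview.
        ("1999-10-10", "a1"),
        ("1999-10-10", "a1"),
        ("1999-10-10", "b2"),
        ("1999-10-11", "a1"),
        ("1999-10-11", "c3"),
    ],
)

# Pageviews and unique machine IDs for each day.
per_day = conn.execute(
    "SELECT day, COUNT(*), COUNT(DISTINCT machine_id) "
    "FROM pageviews GROUP BY day ORDER BY day"
).fetchall()

# Unique machine IDs over the whole period.
across_days = conn.execute(
    "SELECT COUNT(DISTINCT machine_id) FROM pageviews"
).fetchone()[0]

for day, views, visitors in per_day:
    print(day, views, visitors)
print("unique over both days:", across_days)
```

Daily unique counts cannot simply be added together, because a machine that returns on a second day would be counted twice. Answering the multi-day question requires the underlying rows.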