As you can see, I had this idea of hosting my own blog. In case you wondered, you’re here reading it. Now I thought that would be a simple thing to do: just put a site onto the internet and eventually the magic that is the Google-bot or the Bing-bot (do we call it that?) would one day swoop down and make me part of the internet (I firmly believe, albeit slightly misguidedly, that if you’re on a public site and not in the index then you’re not actually on the internet). Now I suppose before I go into my failings as a web developer I need to justify myself.
How did I get here?
With my host it was fairly easy to get my blog site up and running. It’s a classic ASP multi-tenant provider with a shared server offering, with all the benefits (mostly cost) and shortcomings (mostly performance). It has an automatic deployment of WordPress at a certain patch level, 2.7.1, and then it’s simplicity itself to use the internal upgrade feature of WP to make sure the software is up to date. The impressive nature of WordPress for deployment and integration of new themes and components should be a model for all sorts of web based applications, but I digress. Once up and running, I thought that was it: all I needed to do was add some useful posts and my site would become one with the collective, and all I would do would be to use Google Analytics to work out how many people were coming to the site and from where.
Now this is where it became interesting. Google Analytics is an excellent tool for finding out who’s accessing your site and from where. It even shows a map of the locations, around the world or within an individual country, that people are coming from. The catch is that unless you have a lot of friends and colleagues who might want to read (and be interested in) what you’re writing, you need to get it out to a wider audience. Google has another set of tools for this, which expedite the crawling of your site by Google, sort of telling the search engine that you’re ready for your close-up.
Where’s your Sitemap?
Now before you think that registering your site with Google, or Bing for that matter as they have a basic set of equivalent tools, will open the floodgates of people to be exposed to your pearls of wisdom, you should think again. Getting into the Google index isn’t that hard. Getting listed high up for a particular search, or even on the magical first page, needs a lot of people to link to your site and add trackbacks and comments, to show Google that your site has value to other people rather than just being a repository of drivel; I leave you to decide which this is. It is at this point that the science of getting into search turns into the art of Search Engine Optimisation or SEO. Wikipedia defines SEO as the following:
SEO is the process of improving your site’s standing within the search index so that, for the right keywords, your site ranks in the top two pages at least. It relies on a number of factors but starts basically with two files, sitemap.xml and robots.txt. The first file tells the search engine which pages to index specifically and the second file tells search engines what not to crawl through. The difficulty is that as time moves on there are no hard and fast rules which determine what a particular bot finds important. As ‘blackhat’ techniques have been used to play the system to improve page rankings, so the algorithms have changed to sniff out people not playing fair. This means that whilst the sitemap is important for telling crawlers what to index, the robots file is equally important for telling them what not to index, so as to avoid possibly being blacklisted by the crawler for trying to play the system.
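To make the two files a little more concrete, here is a minimal sketch in Python that writes out a bare-bones sitemap.xml and robots.txt. The URLs, the disallowed path and the helper names (build_sitemap, build_robots) are made up for illustration; in practice a WordPress plugin or your host would more likely generate these for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap.xml document (as a string) listing the URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_robots(disallowed_paths, sitemap_url):
    """Return a robots.txt that applies to all crawlers and points at the sitemap."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical blog URLs, purely for illustration.
    print(build_sitemap(["http://example.com/", "http://example.com/about/"]))
    print(build_robots(["/wp-admin/"], "http://example.com/sitemap.xml"))
```

The sitemap lists what you would like indexed, while the Disallow lines list what well-behaved crawlers should leave alone.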
Thus you have a never-ending merry-go-round of SEO optimisations becoming redundant and new techniques being developed to try and keep people’s sites near the top. This doesn’t even take into consideration your actual location, how the search engine knows this, and how it will send you to your ‘local sites’ based upon where you’re coming from. This sort of geo-targeting isn’t new but is increasingly being used to target people on all sorts of sites, from search to Twitter (see Trendsmap for an excellent example).
Now in most of my web development life SEO isn’t a term that needs to rank highly in a solution to maintain pipes or manage gazetteer data. But as more and more sites are exposed to the web, and more companies see value in exposing their data to be used by everyone, mechanisms such as SEO and GIS will often need to be used together. This is especially important in the sharing of spatial metadata in a form that can be indexed easily by search engines and therefore more widely disseminated.
SEO and ArcGIS Server
Now I was wondering how we can use SEO both for promoting and sharing information from an ArcGIS Server implementation and for protecting services from being indexed when you don’t want them to be. The REST API has been around for a while now, and you can see how Google indexes ArcGIS sites ‘on the web’ by doing a search on “ArcGIS/rest/services”. You could block this from an index by using a standard pattern within your robots exclusion file such as:
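As a rough sketch of what such a rule might look like, and how to check it does what you expect, the snippet below uses Python’s standard urllib.robotparser module. The /ArcGIS/rest/ path is an assumption based on the search string above (your own server’s path and capitalisation may differ, and robots.txt paths are case sensitive), and example.com, the Roads service and ExampleBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule: keep all crawlers out of the ArcGIS REST endpoint.
# The path is illustrative; adjust it to match your own server.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ArcGIS/rest/
"""


def is_allowed(robots_txt, url, user_agent="ExampleBot"):
    """Return True if a well-behaved crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: one under the REST endpoint, one ordinary page.
    for url in (
        "http://example.com/ArcGIS/rest/services/Roads/MapServer",
        "http://example.com/blog/",
    ):
        print(url, is_allowed(ROBOTS_TXT, url))
```

Checking the rule this way before deploying it is a cheap way to avoid accidentally blocking the pages you do want indexed.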
This might be supplemented by more complicated patterns that use wildcards, although it should be understood that your mileage might vary according to the bot doing the crawl, as wildcards deviate from the original standard.
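To illustrate how a wildcard pattern might be interpreted, here is a small sketch that mimics the ‘*’ (any run of characters) and trailing ‘$’ (end of URL) extensions that some of the major crawlers document. The function name and the example patterns are my own, and this is only an approximation of what any given bot actually does, not a description of a particular crawler’s implementation.

```python
import re


def wildcard_rule_matches(pattern, path):
    """Check a URL path against a robots.txt-style pattern.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the match to the end of the path, and everything else is a
    literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' in place of each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


if __name__ == "__main__":
    # Hypothetical patterns and paths, purely for illustration.
    print(wildcard_rule_matches("/*/rest/services/", "/ArcGIS/rest/services/Roads/MapServer"))
    print(wildcard_rule_matches("/*.pdf$", "/docs/report.pdf"))
    print(wildcard_rule_matches("/*.pdf$", "/docs/report.pdf?page=2"))
```

A crawler that only follows the original standard would treat the ‘*’ as a literal character, which is exactly why the simpler prefix pattern is the safer default.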
Of course it is important to understand that only ‘good’ crawlers obey the robots.txt file; ‘bad’ robots will crawl anything. If you put data onto the internet, you have to assume it’s going to be used. It’s therefore important that if applications and data need to be secure from unauthorized usage, you use the appropriate security measures for your application. More details about this can be obtained from the ESRI documentation here [Working with secure ArcGIS services].
Services are only ever one part of any application; it’s also important to make a user interface with a mapping component as SEO friendly as possible. This post (from SEOmoz.org) gives a good overview of how you can present spatial information in a format that allows indexing. A lot of it is similar to providing accessible information, as a bot often ‘sees’ a web page like a screen reader does, ignoring the image based map information and concentrating on the links, the URLs and the text of the application. Creating an accessible version of the site often creates an SEO and indexing friendly version of the site too.
As GIS and spatial systems increasingly find themselves used for both commercial and public services, getting them indexed is only going to become more important. How that is done is still very much an art.