A few Google Geo Presentations

It's been a busy year so far and therefore my posting has been woefully slow. Much of my time has been spent working with Google Earth Builder. This provides a new way of distributing geospatial information from Google into all sorts of clients, providing a true geospatial platform as a service without the need to worry about how many servers or virtual servers you need to support your clients, allowing GIS experts to worry about geospatial information science rather than geospatial information systems. There will be more information coming out about this platform over the next few months, and I hope I can talk to as many people as possible about how they might use it to share their data and collaborate with their clients, removing the complexity, headaches and expense that come with current geospatial systems. If you want to know more, watch the Where 2.0 video below.

More Google Geo Goodness

Where 2.0 wasn't the only conference recently; last week at Google IO 2011 there was a veritable plethora of geospatial talks on all aspects of Google Geospatial. Two talks stood out for people who might be using and sharing geospatial data: the first about Google Fusion Tables and the second on the surprises of the Maps API. The latter presentation gives a good overview of some of the new geospatial functions across the whole Maps API that you might have missed over the last year. Geospatial at Google is an ever-increasing area which touches lots and lots of products that are easy to use and implement for non-geo experts, which extends the reach of geospatial data to more people; this talk is a great intro to all of these.

Fusion Tables, GIS for normal people.

Fusion Tables continues to add more and more functionality that allows the geo-prosumer to create and share spatial data in record time. One of the main additions has been extra styling functions, allowing people to create more engaging maps which in turn help convey the message more easily to non-geo people; you can get some more details here. If you want to know more about Fusion Tables then this talk is a must watch, showing how you can host and map geospatial information from the Google cloud without a single line of code, something I would never have imagined possible when I started doing this GIS malarkey in the early 90s.

Speed, Speed, Speed

A final geo talk at Google IO that caught my attention was one about how to improve the performance of any mapping application using the Google Maps API. I've spent a good portion of my career trying to improve the speed of web applications, and especially geospatial web applications. It's good to be armed with knowledge before you even start any development, and this presentation will hopefully help you avoid the pitfalls that people run into when starting to develop in this area.

If you combine this with articles from the Google Maps website, such as Too Many Markers, then you can hopefully create speedy maps that are a joy and not a curse to use.

You can find more information and sessions from Google IO 2011 at the website here. If you set the filter to Geo you can see all of the presentations that had a geo flavour. Hopefully I'll be presenting at the Google Enterprise Geospatial day at the end of August. If you're into geospatial and Google it will be like a mini Geo-IO. Hope to see you there.

The Local JavaScript API for Local People

I was giving a presentation at an ESRI gathering hosted at Microsoft in Reading this week, talking about all things Silverlight and MapIt, and, unsurprisingly, very little about the Flex API. During a presentation about the Web APIs I mentioned JavaScript and Silverlight as two offerings that, whilst they can do similar things, need to be considered carefully in relation to the audience in question and the tasks they wish to perform. Now I like the Silverlight API and, as I mentioned in the talk, it can really help you avoid browser support pain by abstracting away the layer between your application and the browser hosting it.

Often though you don't want a plug-in to preclude people accessing your site. This might be down to a number of reasons, but unless you're going to start hand-cranking simple code for map access (people who have to disable JavaScript within their browser should stop reading now) then you have the JavaScript API. Usually this is delivered to you from somewhere in the cloud (http://serverapi.arcgisonline.com specifically), but there are a number of scenarios that might mean you have to install it locally.

Why Local?

If you are an ArcGIS Server subscriber you are able to request the JavaScript API from ESRI or your local distributor, as explained here on the support site. There are a number of reasons why you might wish to do this:

You are super secure and think that access to the internet is a potential security risk, and therefore getting access to the online API is not possible. Many organisations have yet to embrace the concept of Content Delivery Networks (CDNs), such as the one used for hosting the ESRI JavaScript API. In the consumer space this is a common architectural practice, as it places the download on the user's connection to the CDN and not on the host's (saving bandwidth that can then be used for serving images and functionality). Within the firewall of an organisation, however, it is often not possible to access external files as the network is locked down. This is also the case where you have a slow external connection and you wish to limit the amount of information that is downloaded from the 'cloud'.

You want to create a custom build of the API to reduce the number of files downloaded, improving browser response. The classic number one rule from High Performance Web Sites, make fewer HTTP requests, asks front-end developers to minimise the number of files they download from a site. When building on top of the JavaScript API it is common to split your application into many separate files, to separate functionality or Dijits and to keep the usual spaghetti of JavaScript development under control.

A concurrent dilemma

The problem with this is that the more JavaScript libraries you add to the application, the longer it will take to download, due to the limited number of files a browser can download concurrently. The same applies if you include new CSS files and images as well. This is down to the way HTTP 1.1 was implemented, which suggested a limit of no more than two concurrent connections per hostname. BrowserScope explains it nicely as:

When HTTP/1.1 was introduced with persistent connections enabled by default, the suggestion was that browsers open only two connections per hostname. Pages that had 10 or 20 resources served from a single hostname loaded slowly because the resources were downloaded two-at-a-time. Browsers have been increasing the number of connections opened per hostname, for example, IE went from 2 in IE7 to 6 in IE8.

This often leads to developers hosting parts of their application on different hostnames, one starting with a URL such as images.site.com and another such as scripts.site.com, to get around such limitations. With large JavaScript libraries the time it takes to download these on slower connections can leave an application unresponsive and give a very bad user experience.

You can see from their tests (see image left) that modern browsers have upped this limit to 8 connections. This can have a dramatic effect on perceived speed, as more of the application can be downloaded at once; when partitioned correctly (preferably using a lazy loading pattern) users can see something happening in the browser much earlier than they could with a plug-in such as Silverlight or Flash (beware the loading circle of doom).

Of course with external (public facing) applications you will always have some users on IE6 and its limit of 2 connections per hostname. This is where the ability to package your JavaScript files into as small a package as possible, and as few packages as possible, becomes more important, and the need for a local copy of the JavaScript API imperative for use in a custom Dojo build.

One build for all?

Unfortunately each build is different for each project and therefore there is no magic bullet for this. Fortunately there are a number of key pieces of information and discussions about how to perform this task: there is a thread on the ESRI support forums going through the process, an excellent Geocortex post here on why they did a custom build, and finally a link to the Dojo documentation outlining the process in full. Whilst it might be a complex procedure, for people who are interested in getting their application to load in the fastest time within the browser, it is a procedure worth looking at.

Do you Cache?

Before you read on, this isn't a post devoted to image caching. This is a post about data caching in general, with image caching being an extreme form of data caching. It comes from a bit of work I did recently caching data from a tracking feed. It's based around why you might want to cache, what data you might need to cache and how you might cache (I used .NET but you can do it in all major web development languages). Caching has often been the preserve of web sites that want to be, and I'm using a technical term here, 'screamingly fast' and not 'snail slow'.

Caching before caching was famous.

For many people developing with ArcGIS Server, caching means one thing: map caching. This has become the panacea for many scalability issues with applications. As if the maps are going to be the only slow part of your application. Whilst removing the bottleneck of pretty maps from your site is very important, it often forms only one part of your data caching strategy. There are other areas you should look at when deciding what to cache and when.

In web applications caching can occur at a number of levels, specifically at the data level (caching data to avoid making costly requests) or the page output level (caching the output of web pages so they don't have to be built in real time every time). Combining these areas can help an application scale, but there are a number of factors you should take into consideration before implementing any caching in any system.

Often with spatial systems, especially in the enterprise (within the firewall), the need for data to be current outweighs the extra benefits that might be gained through getting a few more users onto the system. The difficulty with any application is understanding how much it will scale and whether building this stuff into the project is worth it; the problem is that usually by the time you realise it might be useful it's too late to refactor your application and you're stuck in a world of application fail. Now it's easy to crow at the big internet applications when you see they are struggling with the number of users on the system, but most of us will never have the same architectural issues that plague Twitter, Flickr or Facebook; if you do, I suggest having a read of The Art of Capacity Planning to understand how to monitor the load you're going to be under and to mitigate appropriately!

For the rest of us that doesn't mean caching won't help our applications, even at modest scale, and if you're a .NET developer there are easy ways to implement it within the system, either at the code level (for data caching) or at the server level (for output caching).
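As a rough illustration of the server-level option, here is a minimal sketch of output caching, assuming an ASP.NET MVC controller (the post doesn't say whether the site used Web Forms or MVC; in Web Forms the equivalent is the OutputCache page directive). The controller and action names are made up for the example.

```csharp
using System.Web.Mvc;

public class MapPageController : Controller
{
    // Output caching: the rendered result of this action is kept by ASP.NET
    // for 60 seconds, so repeat requests are served without rebuilding the page.
    [OutputCache(Duration = 60, VaryByParam = "none")]
    public ActionResult Index()
    {
        return View();
    }
}
```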

So what does 'to cache' mean?

The process of caching is basically the storage of the output of a request or a set of processing, so that once the work has been done it may be reused for a number of people. The basic premise is that the output of the work is valid for a number of users and any changes to the underlying data will not affect the usage of the information within the time period that the cache is valid. This means that you need to understand the nature of the data and how it is used within any workflow. Failure to do so can mean your users seeing data that is temporally incorrect, something that wouldn't be out of place in an episode of Doctor Who. So the two main questions with caching are what to cache and how long the data can validly be stored within the cache.

What to cache?

The question of what to cache is always going to be a tricky one; it comes down to the application being developed and the need for currency of the data being used. If you think that data can be delayed 15, 30 or 60 minutes, or even longer, without affecting users' operation of the site, then it's probably ready for caching. This could save endless requests for the same data, which in turn reduces the processing cycles on your servers, reducing the number of machines you need to use and thereby reducing the cost of the implementation. Even in this world of unlimited cloud machines there is still a cost associated with every processing cycle, and every saved resource is saved money, again good in these times of thrift.

In terms of exactly what to cache, the three criteria given by Microsoft in their enterprise caching block documentation are a good start:

  • You must repeatedly access static data or data that rarely changes.
  • Data access is expensive in terms of creation, access, or transportation.
  • Data must always be available, even when the source, such as a server, is not available.
In order to make your site as robust as possible you need to take a pessimistic view of software and servers (no comments please) and admit that at some point they will fail. If you want to protect yourself from such failure, using a cache to store data, and making sure it never expires if you don't get new information, will allow your site to give the impression that it's bullet-proof even though things are blowing up at the back end!
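As a minimal sketch of that pessimistic approach, assuming an ASP.NET application using HttpRuntime.Cache (the class and method names below are illustrative, not from the original post): the live source is tried first, and if it fails the last good copy, which is cached with no expiration, is served instead.

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class ResilientFeed
{
    // Fetch fresh data if we can; otherwise fall back to the last good copy,
    // which is stored with no expiration so it is always available.
    public static T GetWithFallback<T>(string key, Func<T> fetchLive) where T : class
    {
        try
        {
            var fresh = fetchLive();
            if (fresh != null)
            {
                // Overwrite the fallback copy; it never expires on its own.
                HttpRuntime.Cache.Insert(key, fresh, null,
                    Cache.NoAbsoluteExpiration, Cache.NoSlidingExpiration);
                return fresh;
            }
        }
        catch
        {
            // The source is down or timing out; fall through to the cached copy.
        }

        return HttpRuntime.Cache[key] as T;
    }
}
```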

How to cache?

Microsoft make it amazingly straightforward to implement data caching, although as I said deciding what to cache and for how long is another matter. On the web you can add to the data cache of an application using the HttpRuntime.Cache property, which gives access to the application's cache. The report manager example given on this MSDN page gives a starting point for implementing a class that controls access to a piece of cached information.
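In that spirit, here is a minimal sketch of such a class, assuming ASP.NET's HttpRuntime.Cache (the class name, keys and the delegate-based loading are my own illustration rather than the MSDN example itself): if the item is in the cache it is returned, otherwise the expensive call is made once and the result stored for everyone else.

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class DataCache
{
    // Return the cached copy of an item, or load and cache it if it has expired.
    public static T GetOrLoad<T>(string key, TimeSpan lifetime, Func<T> load) where T : class
    {
        var cached = HttpRuntime.Cache[key] as T;
        if (cached != null)
            return cached;

        // Do the expensive work once and cache it with an absolute expiration.
        var value = load();
        if (value != null)
        {
            HttpRuntime.Cache.Insert(key, value,
                null,                           // no cache dependency
                DateTime.UtcNow.Add(lifetime),  // absolute expiration
                Cache.NoSlidingExpiration);
        }
        return value;
    }
}
```

A page needing, say, a projected route would then call something like DataCache.GetOrLoad("route", TimeSpan.FromHours(1), LoadRoute), where LoadRoute is whatever expensive call produces the data.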

There are a number of choices that need to be made whilst using the cache object, especially around the timing of objects: an absolute or a sliding expiration can be used to control how long an item stays in the cache. Long-running static objects that don't change can be kept alive using the sliding expiration; as long as they are accessed within the time period they stay alive. Objects that need to be updated periodically can be made to expire at a certain time, so as to make sure they are current for users while not requiring the server to process too much or request data from remote services too often. Further information about caching and ASP.NET can be found here.
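Roughly, those two choices look like this (again a sketch assuming HttpRuntime.Cache; the keys and objects are placeholders):

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class CachePolicyExamples
{
    // Absolute expiration: data that must be re-read on a fixed schedule,
    // however often it is requested in between.
    public static void CacheFeedSnapshot(object snapshot)
    {
        HttpRuntime.Cache.Insert("feed-snapshot", snapshot, null,
            DateTime.UtcNow.AddMinutes(15),   // expire 15 minutes from now
            Cache.NoSlidingExpiration);
    }

    // Sliding expiration: a long-lived static object that stays cached for as
    // long as it keeps being accessed, and drops out after 60 idle minutes.
    public static void CacheLookupTable(object lookup)
    {
        HttpRuntime.Cache.Insert("lookup-table", lookup, null,
            Cache.NoAbsoluteExpiration,
            TimeSpan.FromMinutes(60));
    }
}
```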

Timing the Updates

Have a look at this picture. You can see the red line, which is the route the person is riding. You can see the person. WTF? I hear you say… well, maybe not. But it illustrates a good point about getting the timing right with any bit of cached data. Here we have two cached pieces of information: the position and the route. We get both in WGS 84 and want to project them into Web Mercator to lie over our map tiles.

The point request can be performed very quickly but might not get updated very often; the route gets updated even less often but can take longer to project. If we update the position every 15 minutes, which is quite timely enough when tracking someone on a bike, and the route every hour, say, which again should be fine, then the application should be able to scale quite well as the number of requests we are actually making is quite low. So why the problem?

Well, obviously there may be a window where one cached item has been refreshed but the other has not, because the route and the position are updated at different times. This might also be because the feed you are using (if it's not your own) is in turn updated at different intervals. In the picture (at the start) we have the rider's position slightly to the left of the end of the route, as the caches are slightly out of sync. Five minutes later the rider is back where they should be (and the corresponding rift in the space-time continuum is closed, phew!), as shown in the picture to the right.
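Put into code, a sketch of the two cache entries behind the tracking map might look like this (the names and the use of HttpRuntime.Cache are illustrative; the projected geometries are whatever comes back from the projection service):

```csharp
using System;
using System.Web;
using System.Web.Caching;

// The two caches behind the tracking map, refreshed at different rates.
// Because the expirations are independent, there is a short window where one
// entry has been refreshed and the other has not (the rider appearing just off
// the end of the route), which closes again on the next update.
public static class TrackingCache
{
    public static void StorePosition(object projectedPosition)
    {
        // Cheap to project, refreshed every 15 minutes.
        HttpRuntime.Cache.Insert("position", projectedPosition, null,
            DateTime.UtcNow.AddMinutes(15), Cache.NoSlidingExpiration);
    }

    public static void StoreRoute(object projectedRoute)
    {
        // Expensive to project, so only refreshed every hour.
        HttpRuntime.Cache.Insert("route", projectedRoute, null,
            DateTime.UtcNow.AddHours(1), Cache.NoSlidingExpiration);
    }
}
```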

This raises an interesting point about how to time your cache. Do you have one cache for everything, or do you cache various pieces of data at different rates depending upon how often the information gets updated?

Do I need to care for my application?

As with any architectural decision within your application, the need to use caching or not should come down to the actual number of users you are going to get and the complexity of the service calls you're going to make. Often though it's the unknown unknowns affecting the application over its lifecycle that will make the difference. Will someone decide to roll your nice heavyweight intranet application out over the internet (it's happened, I've seen it, it wasn't pretty)? Will your obscure application suddenly be tweeted about by Stephen Fry (he loves to watch websites die!)?

There are many situations where having an understanding of the use of caching might be important, and in fact site-saving! With the ease of implementation with today's tools there is little reason, apart from ignorance (usually mine…), why a certain level of caching can't be built into even the most innocuous application.

Cycling to the Ashes

For me, getting back into caching was all about developing a simple website for a charity bike ride for the Ashes. Whilst it might seem a world away from what we usually do at ESRI(UK), the scalability challenges of a consumer application that might be mentioned on a national radio or television show (or even by Mr Fry!) are the same as in a large enterprise application, and if designed correctly the site should be able to scale appropriately.

If you're interested, the application can be seen here at CyclingToTheAshes or as a bigger version here at MapsPerSecond. Whilst it's a simple application (for those who care, it was built using ASP.NET and the ESRI JavaScript API with OpenStreetMap data), the challenges of scalability and robustness needed to be met due to the nature of how the site was being marketed. There were also long, expensive calls to the REST services providing the location (from Sanoodi) and projecting the data (ArcGIS Server), both of which needed to be reduced where possible through caching!

Anyway, if you have read this far and are still awake (once again this post wasn't meant to be so long!), please hop over to Oli's site and go sponsor him, it's a worthy cause.


A rule of thumb.

There has been a long-standing rule of thumb for deciding how many instances to give a map service for optimal performance. Finding this information has sometimes been hard; when asked for it the other day, and failing to find it, I decided to see if it was on the new resource centre. Fortunately there is a page on services performance.

http://resources.esri.com/enterprisegis/index.cfm?fa=performance.app.services

Here it not only gives the 'rule of thumb' for the number of instances for a map service (2.5 * #CPUs, so a four-CPU server would be given around ten instances) but also a whole series of information about the relative performance of each service type and the factors that will specifically affect the performance of any map service.

With 9.3.1 it becomes a bit easier to automatically determine why a service might be slow, either through using the new MSD service type and the Map Service Publishing toolbar, or using the old-school mxdperfstat script.

The Perils of Synthetic Tools

Of course any synthetic tool will only give you a level of guidance; real proof has to come from actually performance testing the solution during development, preferably as early as possible. Such tests and examples are given in two recent ESRI whitepapers, High-Capacity Map Services: A Use Case with CORINE Land-Cover Data and Best Practices for Creating an ArcGIS Server Web Mapping Application for Municipal/Local Government.

Both documents cover the optimum use of data and its effect on how an application performs. The former deals with a high-scalability site but contains information that can be applied to all sites, especially the recommendations about using file geodatabases for large performance gains. The latter document is important as it shows how a workflow can be mapped to implementation choices for an ArcGIS Server architecture, map and geoprocessing services for a medium-sized authority.

A Good Guide

Guidance like that available in these two documents, and on the Enterprise Resource Centre in general, whilst not indicative of how every site will perform, gives a good grounding in the pitfalls to avoid when translating user requirements into any specific solution architecture. With any performance and architecture work, though, it's important that you think not only of the performance now but also of the performance implications of the site growing over time. Without any analysis of the capacity requirements of your site, you really don't know how long your current performance will be applicable. It should be remembered, though, as is said so eloquently on Ted Dziuba's site, 'unless you know what you need to scale to, you can't even begin to talk about scalability.'

Understanding your current performance requirements and your short to medium term load requirements, and potential spike points, will mean that you can concentrate on worrying about the right parts of your application in terms of performance and stop worrying about those areas that might never become a problem. The book The Art of Capacity Planning gives a good overview of how to tackle monitoring your site's performance over time, what to worry about, and when.

A starter for ten.

It seems in this increasingly Twitter-fuelled world that anyone starting a blog must be certified. Surely the world can't read more than 140 characters anymore, so why bother writing them? Well, always one to buck the trend, I thought it might be a good time to start a blog.

Hmm, a blog, what should it be on? ArcGIS? Nope, plenty of those. Programming? Nope, loads of those also. GeoNerdRage? Nope, I can name a few of those also 🙂

So in order to start this blog I decided on the theme of performance and scalability, the architecture of such, and the technologies that can help people design and develop solutions for performance and analyse the problems when things go wrong.

The need for performance

To me this is an increasingly important topic as a lot of spatial systems move beyond simple mapping sites to solutions that integrate with key business applications within an organisation. Microsoft, SAP and Oracle all provide the big systems that power enterprises, and GI is often stored within them or integrated alongside them to provide information the organisations use to make business decisions. In the past it has often been a challenge just to get the systems to work together at all, but with the move towards SOA and more recently REST-based services, it is increasingly easy to call systems and tie services together to improve the decision-making process. In order for this process to be seamless the systems must also be quick, perform under load, and continue to perform even as they grow to greater capacity.

The quandary is that as systems become more complex and solutions integrate together, the design choices become increasingly important. The solutions we could get away with in the past with simple mapping applications become more challenging when integrating SOAP services with SAP, or scaling an editing application up to hundreds of concurrent web users. It can be done, and that is where a consistent approach comes in: working performance into any design and monitoring the performance of the system at a variety of levels throughout the development process and into deployment.

How you do that and how you monitor it I'll leave to future posts, but much of the process can be found at a high level in the following documents:

Performance-Driven Software Development – Carey Schwaber (Forrester report can be found elsewhere on the internet for free if you register)

And the excellent

Performance Testing Guidance for Web Applications – Microsoft Patterns and Practices

Enjoy the ride…