A few Google Geo Presentations

It’s been a busy year so far and therefore my posting has been woefully slow. Much of my time has been spent working with Google Earth Builder. This provides a new way of distributing geospatial information from Google into all sorts of clients, offering a true geospatial platform as a service without the need to worry about how many servers or virtual servers you need to support your clients. It lets GIS experts worry about geospatial information science rather than geospatial information systems. There will be more information coming out about this platform over the next few months, and I hope I can talk to as many people as possible about how they might use it to share their data and collaborate with their clients, removing the complexity, headaches and expense that this causes with current geospatial systems. If you want to know more, watch the Where 2.0 video below.

More Google Geo Goodness

Where 2.0 wasn’t the only conference recently; last week at Google IO 2011 there was a veritable plethora of geospatial talks covering all aspects of Google Geospatial. Two talks stood out for people who might be using and sharing geospatial data: the first about Google Fusion Tables and the second on the surprises of the Maps API. The latter presentation gives a good overview of some of the new geospatial functions across the whole Maps API that you might have missed over the last year. Geospatial at Google is an ever increasing area that touches lots and lots of products which are easy to use and implement for non-geo experts, extending the reach of geospatial data to more people, and this talk is a great intro to all of these.

Fusion Tables, GIS for normal people.

Fusion Tables continues to add more and more functionality that will allow the geo-prosumer to create and share spatial data in record time. One of the main additions has been extra styling functions, allowing people to create more engaging maps which in turn convey their message more easily to non-geo people; you can get some more details here. If you want to know more about Fusion Tables then this talk is a must watch, showing how you can host and map geospatial information from the Google cloud without a single line of code, something I would never have imagined possible when I started doing this GIS malarkey in the early 90s.
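To give a flavour of how little code is involved even when you do want to embed the result in your own page, here is a minimal sketch of putting a Fusion Table on a Google Map with the Maps JavaScript API (the table id, centre point and div styling are placeholders, not taken from any of the talks):

<div id="map" style="width: 500px; height: 400px;"></div>
<script src="http://maps.google.com/maps/api/js?sensor=false"></script>
<script>
  // A basic map; the centre and zoom are illustrative values only.
  var map = new google.maps.Map(document.getElementById("map"), {
    center: new google.maps.LatLng(51.5, -0.1),
    zoom: 9,
    mapTypeId: google.maps.MapTypeId.ROADMAP
  });

  // Overlay the geometry column of a (hypothetical) Fusion Table.
  var layer = new google.maps.FusionTablesLayer({
    query: { select: "geometry", from: "YOUR_TABLE_ID" }
  });
  layer.setMap(map);
</script>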

Speed, Speed, Speed

A final geo talk at Google IO that caught my attention was one about how to improve the performance of any mapping application using the Google Maps API. I’ve spent a good portion of my career trying to improve the speed of web applications, and especially geospatial web applications. It’s good to be armed with knowledge before you even start any development, and this presentation will hopefully help you avoid the pitfalls that people fall into when starting to develop in this area.

If you combine this with articles from the Google Maps website, such as Too Many Markers, then you can hopefully create speedy maps that are a joy and not a curse to use.
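One of the techniques Too Many Markers covers is clustering, and the open source MarkerClusterer utility makes that easy to bolt onto an existing map. A rough sketch, assuming the utility script is already on the page, map is an existing google.maps.Map and data is a hypothetical array of points:

// Build an array of markers from your point data.
var markers = [];
for (var i = 0; i < data.length; i++) {
  markers.push(new google.maps.Marker({
    position: new google.maps.LatLng(data[i].lat, data[i].lng)
  }));
}

// Hand the markers to MarkerClusterer rather than adding each one to the
// map individually; it draws one icon per cluster and expands as you zoom.
var clusterer = new MarkerClusterer(map, markers);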

You can find more information and sessions from Google IO 2011 at the website here. If you set the filter to Geo you can see all of the presentations that had a geo flavour. Hopefully I’ll be presenting at the Google Enterprise Geospatial day at the end of August. If you’re into geospatial and Google it will be like a mini Geo-IO. Hope to see you there.

The Local JavaScript API for Local People

I was giving a presentation at an ESRI gathering hosted at Microsoft in Reading this week, talking about all things Silverlight and MapIt, and, unsurprisingly, very little about the Flex API. During a presentation about the Web APIs I mentioned JavaScript and Silverlight as two offerings that, whilst they can do similar things, need to be considered carefully in relation to the audience in question and the tasks they wish to perform. Now I like the Silverlight API and, as I mentioned in the talk, it can really help you avoid browser support pain by abstracting away the layer between your application and the browser hosting it.

Often though you don’t want a plug-in to preclude people from accessing your site. This might be down to a number of reasons, but unless you’re going to start hand cranking simple code for map access (people who have to disable JavaScript within their browser should stop reading now) then you have the JavaScript API. Usually this is delivered to you from somewhere in the cloud (http://serverapi.arcgisonline.com specifically), but there are a number of scenarios that will mean you have to install it locally.
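For reference, the difference at the page level is just where the script tag points; the application code on top stays the same. A minimal sketch, with an illustrative version number and local install path (use one script tag or the other, not both):

<!-- Hosted: pulled from ArcGIS Online -->
<script src="http://serverapi.arcgisonline.com/jsapi/arcgis/?v=2.3"></script>

<!-- Local: the same API installed on your own web server -->
<script src="/arcgis_js_api/library/2.3/arcgis/?v=2.3"></script>

<div id="mapDiv" style="width: 600px; height: 400px;"></div>
<script>
  dojo.require("esri.map");
  function init() {
    var map = new esri.Map("mapDiv");
    map.addLayer(new esri.layers.ArcGISTiledMapServiceLayer(
      "http://server.arcgisonline.com/ArcGIS/rest/services/World_Street_Map/MapServer"));
  }
  dojo.addOnLoad(init);
</script>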

Why Local?

If you are an ArcGIS Server subscriber you are able to request the JavaScript API from ESRI or your local distributor, as explained here on the support site. There are a number of reasons why you might wish to do this:

You are super secure and think that access to the internet is a potential security risk, and therefore getting access to the online API is not possible. Many organisations have yet to embrace the concept of Content Delivery Networks (CDNs), such as the one used for hosting the ESRI JavaScript API. In the consumer space this is a common architectural practice, as it places the download on the user’s connection to the CDN rather than on the host’s (saving bandwidth that can instead be used for serving images and functionality). Within the firewall of many organisations, however, it is not possible to access external files because the network is locked down. The same applies where you have a slow external connection and wish to limit the amount of information that is downloaded from the ‘cloud’.

You want to create a custom build of the API to reduce the number of files downloaded, improving browser response. The classic number one rule from High Performance Web Sites, make fewer HTTP requests, asks front end developers to minimise the number of files they download from a site. When building on top of the JavaScript API it is common to split your application into many separate files, to separate functionality or Dijits and to keep the usual spaghetti of JavaScript development under control.

A concurrent dilemma

The problem with this is that the more JavaScript files you add to the application, the longer it will take to download, due to the limited number of files a browser can download concurrently. The same applies if you include new CSS files and images. This is down to the way HTTP 1.1 was implemented, which suggested a limit of no more than two concurrent connections per hostname. BrowserScope explains it nicely as:

When HTTP/1.1 was introduced with persistent connections enabled by default, the suggestion was that browsers open only two connections per hostname. Pages that had 10 or 20 resources served from a single hostname loaded slowly because the resources were downloaded two-at-a-time. Browsers have been increasing the number of connections opened per hostname, for example, IE went from 2 in IE7 to 6 in IE8.

This often leads to developers hosting parts of their application on different hostnames, one starting with a URL such as images.site.com and another such as scripts.site.com, to get around such limitations. With large JavaScript libraries the time it takes to download these on slower connections can leave an application unresponsive and give a very bad user experience.
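At its simplest the workaround is nothing more than serving static resources from more than one hostname so the browser opens extra parallel connections; a sketch along these lines, with made up hostnames:

<!-- Scripts served from one hostname... -->
<script src="http://scripts.example.com/app/map.js"></script>
<script src="http://scripts.example.com/app/widgets.js"></script>

<!-- ...images from another, so they don't queue behind the scripts on
     browsers limited to two connections per hostname. -->
<img src="http://images.example.com/icons/marker.png" alt="marker icon" />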

You can see from the BrowserScope tests that modern browsers have upped this limit to 8 connections. This can have a dramatic effect on perceived speed, as more of the application can be downloaded at once; when partitioned correctly (preferably using a lazy loading pattern) users can see something happening in the browser much earlier than they could with a plug-in such as Silverlight or Flash (beware the loading circle of doom).

Of course with external (public facing) applications you will always have some users on IE6 and its limit of 2 connections per hostname. This is where the ability to package your JavaScript files into as small a package as possible, and as few packages as possible, becomes more important, and a local copy of the JavaScript API becomes imperative for use in a custom Dojo build.

One build for all?

Unfortunately each build is different for each project and therefore there is no magic bullet for this. Fortunately there are a number of key pieces of information and discussions about how to perform this task: there is a thread on the ESRI support forums going through the process, an excellent Geocortex post on why they did a custom build here, and finally a link to the Dojo documentation outlining the process in full. Whilst it might be a complex procedure, for people who are interested in getting their application to load in the fastest time within the browser it is a procedure worth looking at.
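To give a flavour of what those articles walk you through, a custom build boils down to a Dojo build profile that rolls the modules you use, yours and ESRI’s, into a single layer file. A stripped down sketch, with placeholder module names and paths for an imaginary application:

// myapp.profile.js - fed to the Dojo build system, e.g. something like
//   build.bat profileFile=myapp.profile.js action=release
dependencies = {
  layers: [
    {
      // Everything listed here gets concatenated and minified into one
      // file, so the browser makes one request instead of dozens.
      name: "../myapp/layer.js",
      dependencies: [
        "esri.map",
        "esri.tasks.query",
        "myapp.MainDijit"
      ]
    }
  ],
  prefixes: [
    [ "esri", "../esri" ],
    [ "myapp", "../myapp" ]
  ]
};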

A rule of thumb.

There has been a long standing rule of thumb for deciding how many instances to give a map service for optimal performance. Finding this information has sometimes been hard, so when I was asked for it the other day, and failed to find it, I decided to see if it was on the new resource centre. Fortunately there is a page on services performance.

http://resources.esri.com/enterprisegis/index.cfm?fa=performance.app.services

Here it not only gives the ‘rule of thumb’ for the number of instances for a map service (2.5 * #CPUs, so around 10 instances on a four-CPU server) but also a whole series of information about the relative performance of each service type and the factors that will specifically affect the performance of any map service.

With 9.3.1 it becomes a bit easier to automatically determine why a service might be slow, either by using the new .MSD service type and the Map Service Publishing toolbar, or by using the old school mxdperfstat script.

The Perils of Synthetic Tools

Of course any synthetic tool will only give you a level of guidance; any proof has to come from actually performance testing your solution during development, preferably as early as possible. Such tests and examples are given in two recent ESRI whitepapers, High-Capacity Map Services: A Use Case with CORINE Land-Cover Data and Best Practices for Creating an ArcGIS Server Web Mapping Application for Municipal/Local Government.

Both documents cover the optimum use of data and its effect on how an application performs. The former deals with a high scalability site, but with information that can be applied to all sites, especially the recommendations about using file geodatabases for large performance gains. The latter document is important as it shows how a workflow can be mapped to implementation choices for an ArcGIS Server architecture, map and geoprocessing services for a medium sized authority.

A Good Guide

Guidance like that available in these two documents, and on the Enterprise Resource Centre in general, whilst not indicative of how every site will perform, gives a good grounding in the pitfalls to avoid when translating user requirements into any specific solution architecture. With any performance and architecture work, though, it’s important that you think not only of the performance now but also of the performance implications of the site growing over time. Without any analysis of the capacity requirements of your site, you really don’t know how long your current performance will hold. It should be remembered, as is said so eloquently on Ted Dziuba’s site, that ‘unless you know what you need to scale to, you can’t even begin to talk about scalability.’

Understanding your current performance requirements and your short to medium term load requirements, and potential spike points, will mean that you can concentrate on worrying about the right parts of your application in terms of performance and stop worrying about those areas that might never become a problem. The book ‘The Art of Capacity Planning’ gives a good overview of how to tackle monitoring your site’s performance over time, what to worry about, and when.

97 Things

I’ve been reading 97 Things Every Software Architect Should Know on and off for the last few months. Not only does the book come with its own website where you can comment on the pearls of wisdom within its pages, it also helped me understand that the problems I’ve faced in architecture over the last year are very common, and that architecture, whilst not exactly an art, will never be a science either. In terms of the content there are three main items that stand out for me within the book.

1) Don’t put your resume ahead of the requirements.

There is sometimes a relentless pressure towards the new with technology. The lure of ‘shiny shiny’ is sometimes too much for both developers and architects, and decisions about what to use are often made based upon what is cool rather than what is needed.

Now, often new technologies are there to plug known problems or gaps in existing technologies (ArcGIS caching is a good example of a new technology introduced for scalability reasons). It is often the case, though, that there is no good reason to choose a new method over an old method beyond the need to try something new.

When evaluating new technologies, especially cutting edge ones, you should always be checking whether they really are the correct solution for your project; battling with new, poorly understood and poorly documented technologies might cause more problems than it solves in the long run.

If, though, a project is going to run for a number of years, or has a lifespan which doesn’t include a technical refresh within 5 years or so, then it is always good to evaluate the new, as it is more likely to still be supported into the future and most issues can be assumed to have been quashed in the meantime.

2) It’s never too early to think about performance.

I love this one. When developing software, even in a continuous integration environment, it’s all about functional requirements. It’s all about does it do this, does it break when I do that. Often, though, when all is said and done, user acceptance comes down to how the application feels. How long does it take to load? Does the UI look right?

These non-functional requirements are often too poorly defined to be tested against. The result: major overruns in projects when the delivered application fails performance or scalability testing during user acceptance. The solution: test performance as near to the start of a project as possible, in order to determine whether the new geoprocessing task you have added to the application is quick enough under load or brings the whole server farm to a grinding halt.

There are a number of ways of doing this, but just as there is test driven development to catch bugs, there can be performance driven development to catch performance or architectural issues that might cause problems down the line when the system is delivered. This becomes increasingly important in a system that might be integrated into an enterprise workflow or service bus, where a performance issue in one place might degrade other services.
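As a trivial illustration of the idea (nowhere near a proper load test, and the service URL and budget are invented for the example), even a throwaway check that shouts when a service blows its response time budget is better than finding out during user acceptance:

// Time a single export from a (hypothetical) map service and complain if
// it exceeds the agreed budget. Run early and often; it complements, but
// does not replace, real load testing.
var BUDGET_MS = 2000;
var start = new Date().getTime();
dojo.xhrGet({
  url: "http://myserver/ArcGIS/rest/services/MyMap/MapServer/export",
  content: { bbox: "-2.0,50.0,0.0,52.0", size: "800,600", f: "json" },
  handleAs: "json",
  load: function (result) {
    var elapsed = new Date().getTime() - start;
    if (elapsed > BUDGET_MS) {
      console.error("Map export took " + elapsed + " ms, budget is " + BUDGET_MS + " ms");
    }
  },
  error: function (err) {
    console.error("Map export failed: " + err);
  }
});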

I can never stress enough the importance of performance testing in any project; I can also never stress enough the way that performance testing can become a fixation. Therein lies a future post, I’m afraid!

3) Chances are your biggest problem isn’t technical.

As we know there are many ways to skin a cat (sorry to those cat lovers out there). There are also many ways to deliver technical solutions; there are so many technologies and architectures out there to solve all sorts of problems that there is usually no excuse for not overcoming a technical hitch. What you can’t do is solve all of the non-technical problems as easily.

I mentioned non-functional requirements in the last section. These are usually the key points that can make or break a solution, and they are thought up by real people! Sometimes they might be seen as impossible; more likely they are not defined fully enough to deliver against. In the main, all sides of any solution want it to succeed, at the minimum cost in terms of both money and time. If this understanding is the start of every conversation then any of the non-technical or people problems ‘should’ be easier to solve.

Remember, if you see a requirement that says the system ‘should have pretty maps’, run for the hills.

The book has many more great points, many of which I’ve seen play out on projects or will now be especially aware of!

A starter for ten.

It seems in this increasingly Twitter-fuelled world that anyone starting a blog must be certified. Surely the world can’t read more than 140 characters any more, so why bother writing them? Well, always one to buck the trend, I thought it might be a good time to start a blog.

Hmm, a blog, what should it be on? ArcGIS? Nope, plenty of those. Programming? Nope, loads of those also. GeoNerdRage? Nope, I can name a few of those too 🙂

So in order to start this blog I decided on the theme of performance and scalability: the architecture of such, and the technologies that can help people design and develop solutions for performance and analyse the problems when things go wrong.

The need for performance

To me this is an increasingly important topic, as a lot of spatial systems have moved beyond simple mapping sites to solutions that integrate into key business applications within an organisation. Microsoft, SAP and Oracle all provide the big systems that power enterprises, and GI is often stored within them or integrated alongside them to provide information the organisations use to make business decisions. In the past it has often been a challenge just to get the systems to work together at all, but with the move towards SOA and more recently REST based services, it is increasingly easy to call systems and tie services together to improve the decision making process. For this process to be seamless, the system must also be quick, perform under load and continue to perform as it grows to greater capacity.

The quandary is that as systems become more complex and solutions integrate together, the design choices become increasingly important. The solutions we could get away with in the past with simple mapping applications become more challenging when integrating SOAP services with SAP, or scaling an editing application up to hundreds of concurrent web users. It can be done, and that is where a consistent approach comes in: working performance into any design and monitoring the performance of the system at a variety of levels throughout the development process and into deployment.

How you do that and how you monitor it I’ll leave to future posts, but much of the process can be found at a high level in the following documents:

Performance-Driven Software Development – Carey Schwaber (Forrester report can be found elsewhere on the internet for free if you register)

And the excellent

Performance Testing Guidance for Web Applications – Microsoft Patterns and Practices

Enjoy the ride…..