Watermarking, WMS and maybe other things beginning with W.

I should caveat this post with the often-used phrase, ‘don’t try this at home, kids’. When you tinker with the guts of any system and modify the information being sent to and from a service by ‘hacking’ into the request pipeline of a message, you’re opening up a whole can of performance and stability worms that needs a great deal of testing under load before you can understand the direct effect on the scalability of any site.

A Simple Question

This post is based upon a question I had from a customer at our recent DeveloperHub conference in Birmingham. He asked how it would be possible to watermark an image that had been served from a request to a WMS service. Ed has given an excellent overview about why a customer might want to watermark their images here and some methods to do it. For this query, though, there was a need to do the watermarking at the server level and not the client, as you don’t want to restrict access to only those systems that have been modified to support whatever watermark solution you have adopted.

With WMS you also need to make sure you don’t force a client to process a response that is not compliant with the OGC WMS specification. So you end up with the need to do some invisible modification of the request or response in order to handle the addition of a watermark without any client realising anything has happened.

The WMS Request / Response Cycle

It’s worth taking a brief aside here to look at the WMS request / response cycle. WMS services can be called simply from a URL, as stated in the OGC specification (here), using an HTTP GET. The parameters for the URL vary depending upon the type of operation you are trying to perform. In our case we are most interested in the operation that requests maps, unsurprisingly enough the GetMap request. This uses parameters to control the area to return a map of, the layers to be displayed and the format of the image to be returned.
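To make the GetMap parameters a little more concrete, here is a minimal Python sketch that assembles such a URL. The function name, the example.com endpoint and the layer names are all made up for illustration, and the parameter set follows WMS 1.1.1 (version 1.3.0 uses CRS rather than SRS), so treat it as a sketch rather than a definitive client.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height,
                     image_format="image/png", srs="EPSG:4326"):
    """Assemble a WMS 1.1.1 GetMap URL (hypothetical helper)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),               # the layers to be displayed
        "STYLES": "",                             # empty means default styles
        "SRS": srs,                               # coordinate system of BBOX
        "BBOX": ",".join(str(v) for v in bbox),   # minx,miny,maxx,maxy
        "WIDTH": width,                           # image size in pixels
        "HEIGHT": height,
        "FORMAT": image_format,                   # format of the returned image
    }
    return base_url + "?" + urlencode(params)


# The endpoint and layer names below are assumptions, not a real service.
url = build_getmap_url("http://example.com/wms", ["roads", "rivers"],
                       (-1.0, 50.0, 1.0, 52.0), 256, 256)
```

Note that urlencode percent-encodes the commas and slashes in the values, which a compliant server decodes before reading the parameters.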

Once the request has been processed, the response, an image in the requested format, is returned to the client as binary data. In a web browser that gets placed into an image element for display. It’s this binary image that we are able to edit on its way through in order to watermark it.

Why not burn it in?

One question that needs to be answered is why you don’t just create a tile cache with the watermark already burnt into the images. This would be the most ‘performant’ solution, as it moves the processing away from the actual request by the user. That keeps response times low, but leaves a cache that can possibly only be used for one task. Indeed, with more than one client requiring more than one type of copyright notice or image overlay, each would possibly need their own set of tiles, or their own service.

Alternatively, you could have a map service with a layer which contains the watermarking details. You can search the WMS request string for the inclusion of this layer and, if it’s not there, add it to the request before it is processed. This is fine, but it means you are actually messing with the request that’s being made by the client, which could possibly introduce bugs into any application making the requests.

A more flexible solution, albeit possibly a less performant one, would be to add any information over the top of the image at a stage where it can be applied after the actual creation of the map. In terms of ArcGIS this would be after the request has come in, been processed by the map service and the clean image has been returned as a binary image object.
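As an illustration of that layer-injection idea, the following Python sketch checks a GetMap query string for a watermark layer and appends it when it is missing. The function name and the layer name ‘watermark’ are hypothetical, and this is only a sketch of the approach, not something I have run against ArcGIS Server.

```python
from urllib.parse import parse_qsl, urlencode


def ensure_layer(query_string, layer_name):
    """Return the query string with layer_name present in LAYERS."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    # WMS parameter names are case-insensitive, so compare in upper case.
    current = next((v for k, v in pairs if k.upper() == "LAYERS"), None)
    if current is None or layer_name in current.split(","):
        return query_string  # no LAYERS parameter, or layer already requested

    updated = []
    for key, value in pairs:
        if key.upper() == "LAYERS":
            # Later layers are drawn over earlier ones, so append at the end.
            value = value + "," + layer_name if value else layer_name
        elif key.upper() == "STYLES" and value:
            # Keep STYLES the same length as LAYERS; empty means default style.
            value = value + ","
        updated.append((key, value))
    return urlencode(updated)


patched = ensure_layer("REQUEST=GetMap&LAYERS=roads,rivers&STYLES=", "watermark")
```

Even in this small form you can see the downside described above: the client’s request is no longer the request the server receives.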

A Pipeline Solution

The last method was the answer I actually gave at the event: intercept the response from the WMS service and stamp the returned image with the required watermark, either textual or image based. But how to achieve this? The serving of the image from ArcGIS is handled deep within the SOC process, which was untouchable. What wasn’t untouchable was the request / response pipeline in the web server, in my case IIS. In the past this might have required writing some sort of ISAPI filter to hook into the pipeline, but since .NET came along it has been possible to write an HTTPModule to do the same.

The HTTP Module allows you to hook into public events in the request / response pipeline. Specifically, the BeginRequest and EndRequest events allow you to check the content of a request before it’s forwarded on to ArcGIS Server, and to process the image that comes back from ArcGIS Server before it’s returned to the client. This pipeline can be simply shown in the following diagram.

[Diagram: the request / response pipeline]
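The same before-and-after shape can be sketched outside of IIS. The following is not an IHttpModule; it is a Python WSGI middleware used purely as an analogue of the BeginRequest / EndRequest hooks, and the stamp function is a hypothetical placeholder that appends a marker instead of drawing a real watermark.

```python
from urllib.parse import parse_qs


def stamp(image_bytes):
    """Hypothetical stand-in for the real watermarking step."""
    return image_bytes + b"<watermark>"


class WatermarkMiddleware:
    """Wraps a WSGI app, mirroring the BeginRequest / EndRequest hooks."""

    def __init__(self, app, stamp_func=stamp):
        self.app = app
        self.stamp_func = stamp_func

    def __call__(self, environ, start_response):
        # "BeginRequest": inspect the request before the map server sees it.
        query = parse_qs(environ.get("QUERY_STRING", ""))
        request = next((v[0] for k, v in query.items()
                        if k.upper() == "REQUEST"), "")
        if request.lower() != "getmap":
            return self.app(environ, start_response)  # pass through untouched

        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status and headers until the body has been processed.
            # (The legacy WSGI write() callable is not supported here.)
            captured["status"] = status
            captured["headers"] = list(headers)

        result = self.app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        # "EndRequest": only touch image responses, never XML error reports.
        headers = captured["headers"]
        content_type = next((v for k, v in headers
                             if k.lower() == "content-type"), "")
        if content_type.startswith("image/"):
            body = self.stamp_func(body)
            headers = [(k, v) for k, v in headers
                       if k.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]
```

Buffering the whole response like this is exactly the kind of thing that needs testing under load, which is the caveat this post opened with.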

Bringing it all together

In order to get the application to run, and to be able to debug it (especially in IIS6, which can only work with files processed by the ASP.NET worker process), you need to create a handler that maps to the arcgis\services directory within your ArcGIS install (see why I say don’t do this at home!). The easy way of doing this is to create a Visual Studio project within that directory (as you can see from the VS 2008 project screenshot).

Once the solution is in the right place you can update the existing web.config within that directory. It will already contain the ESRI Handler and Module details that are needed for the operation of the ArcGIS Server services. Placing an entry for a new module after the existing ones will allow you to hook into the pipeline before and after the ESRI modules (remember this could seriously damage your ArcGIS Server health, so use with caution on a test machine before letting it anywhere near production). The entry would be similar to that given below.

[Image: the web.config entry for the new module]

Once we have these elements in place we can add our class with the IHttpModule interface. You can see how to do this in the example at the MSDN site for the creation of a custom HTTP module.

Hooking into the ArcGIS Server Request

In order to perform the watermarking task, it’s necessary to perform a number of steps before and after the request. Where we get involved first is in the BeginRequest event handler, which gets fired once a request is made to ArcGIS. In any system it’s good to only process requests when needed, so we have to be able to test that a request to ArcGIS Server is for a WMS map. This can be done by converting the incoming request stream to a string and parsing that; the code can be found here.
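The linked code is not reproduced in this post, but the test itself is small. As a rough Python sketch (the function name is made up, and comparing the parameter values case-insensitively is a deliberately lenient choice of mine), it might look like this:

```python
from urllib.parse import parse_qs


def is_wms_getmap(query_string):
    """Return True when the query string looks like a WMS GetMap request."""
    # WMS parameter names are case-insensitive, so normalise them first.
    params = {k.upper(): v[0] for k, v in parse_qs(query_string).items()}
    # Assumption: a missing SERVICE parameter is treated as WMS, because
    # some clients leave it off GetMap requests.
    service = params.get("SERVICE", "WMS")
    request = params.get("REQUEST", "")
    return service.upper() == "WMS" and request.lower() == "getmap"
```

Everything that fails this test can be passed straight through without any further work, which keeps the overhead for non-map requests to a single string parse.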

At this point our watermarking service could perform no end of housekeeping, such as checking the type of watermark to apply to a specific service, or whether one is to be applied at all. It might also be a good time to read any image to be applied to the response and add it to a cache layer (if we do this for lots of images we don’t want disk access slowing us down more than once). We are now set to let the request filter down the stack to ArcGIS for processing, and we can wait for the EndRequest event handler to fire and for us to get down and dirty with the WMS response.
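The ‘read it from disk only once’ idea can be sketched in a few lines of Python using the standard library’s lru_cache. The function name and the cache size are assumptions for illustration only.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=32)
def load_watermark(path):
    """Read the watermark image bytes, hitting the disk once per path."""
    return Path(path).read_bytes()
```

The trade-off is that a watermark file changed on disk will not be picked up until load_watermark.cache_clear() is called or the process restarts.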

In next week’s episode – Hooking into the ArcGIS Server Response

It’s at this point I realise that I’ve written another 1000+ words on something that started as a simple question, and that having to read much more in one go might cause you to slowly lose the will to live. In order to save you, I’ll leave the next part, taking the response and applying the watermark, till another post. Probably after the ESRI Developers Summit, where I will no doubt be shown a better way of doing this.

PS: What no code?

So you might be wondering where my sample is and how you can get access to it. Well, whilst I’ve described all the pieces needed to write this application, it hasn’t been tested, especially for use at scale. Modifying the pipeline of ArcGIS Server is not to be taken lightly. The amount of work involved isn’t great, as almost all of it is covered by samples from MSDN. Like I did, I would start by reading the custom HTTPModule section on that site. Good luck!

Mar 12th, 2010 | Filed under ArcGIS, Architecture

The Lure of Easy

The other day I built a computer almost from scratch. I can admit it, I can nerd it with the best of them when pressed; OK, I don’t even need to be pressed. I had a bunch of components lying around: a not too old processor, a bunch of fast RAM and a laptop hard drive. All I needed was a case. That was easy to rectify, as I’ve always fancied building a little PC and Shuttle do some excellent barebones machines. Now the premise of this post is not the coolness of my new computer (although it is quite nice) but the ease with which it was built.

When I was Young.

When I was young and the ‘internet was all fields’ I remember building many a machine, both in and out of work. I remember saving my cash for the components, carefully making sure I didn’t bend anything when I slotted processors into motherboards and affixed strange looking fans to the top. I remember screaming when one of the components didn’t work and the whole machine failed to boot. I remember returning complete orders and vowing never to build another computer again. But the lure is too much for some things, and time can heal all wounds, even those inflicted by bad memory modules.

Now whilst I was away from the field of home brew machines a number of things have happened: component prices have reduced, hardware is much more modular and available, I have an electric screwdriver (my only power tool, I might add) and I can buy small ready-made machines with integrated motherboards at every online store. Now what does this add up to? An ability to assemble a machine in under 30 minutes, from start to end. I was shocked; surely it must be harder than this. After a brief moment of screeching from the machine, as I had forgotten to plug in the graphics card power supply, I was up and running, installing Ubuntu (it’s free damn you, and until I know it’s stable I’m not putting Windows on it!) and hooking it up to the ‘interwebs’.

Now the question arises, why if it’s so easy, would I not recommend building all the machines I own, or use at work? I’d be able to save money and tinker with hardware, what’s not to like?

So easy is good, right?

If pushed I could probably build a wall, but would I want it to support my house? Probably not until I’d had a lot of time building walls, maybe not even until the 10,000 hours it takes to become an expert had passed. It’s the same with my new PC. Would I use it to store my family’s photos? No, I use a RAID disk set for that and the cloud (hmm, I do trust them, right?), as I’m unsure that the machine I threw together would be able to stay working for a long time. I find this to be the same in designing and developing applications.

Components, development tools and platforms have come a long way since the internet fields were paved over, and with that have come rapid prototyping, rapid development and easy deployment. It’s now possible, with the use of wizards and samples, to throw a demo together in a very short period of time, like the construction of one of these modern barebones PCs. Lots of development is easy, but just because you can throw something together does not mean it will be robust and stable. Because I was able to build one machine quickly doesn’t mean I will have the same luck again, or that my machine, with its mismatched components, will not let me down when I need it most, like when watching Snog, Marry, Avoid on the iPlayer!

It’s the same with code developed quickly. Technical debt will often lead to decisions being made that could impact the delivery of a system down the line, be that due to difficulties in refactoring or a failure to run performance tests on the software during development. For demo purposes technical debt might not be important, as the code might never need to see the light of day beyond the demo, although the consequences of showing functionality that might be hard to implement reliably might come back to haunt a project in the future. Lobbing technology bombs between pre-sales and professional services is always something that should be avoided, for good profitability reasons.

The Cloud Lure.

The cloud is another case of easy. It sells itself as a way to remove yourself from the burden of machines: your application can scale so long as you have the money to pay for it. Again, like the 30 minute machine build or the quick copy and paste development job, nothing is as easy as it seems, and even though the lure is there, careful planning still needs to be done in architecting any system, especially on those cloud platforms that serve to emulate a real system. In a world where your application isn’t tied to a specific machine you need to be careful what you can trust. Are you getting data from a machine that knows about your updates, or from another machine that is just handling your request at that point in time? As your application scales to multiple worker or web processes in an environment like Azure or App Engine, how do you make sure everything is tied together?

Understanding how applications run in the cloud will still be needed, in order to utilise existing or still emerging patterns of development, such as those in the O’Reilly Cloud Application Architectures book or being developed by Microsoft on their patterns and practices site for Azure. There is no magic going on here: fundamentally, threads must be mapped to processors somewhere, and hardware has to do some work and then notify other machines about what has gone on. How you handle this in any deployment, and how efficiently, will impact the performance of any system and solution.

Deploying applications into the cloud will be as complex as deploying applications onto any set of machines. The complexity might be more software focussed, relying less on the understanding of processor specs and more on the understanding of best practices for writing scalable applications, such as those provided by Google for App Engine.

Easy come Easy Go.

When I heard David Chappell (the IT speaker and not the comedian) say the phrase ‘there is no lock in like cloud lock in’, I realised that whilst cloud computing holds much promise, it still needs to be treated like any other system. Badly written and architected solutions will not magically perform in the cloud, and will always cost you more in the end than those that are optimised for performance and tested for scalability.

The cloud allows us to abstract ourselves from some aspects of deployment, but at the cost of possibly making the software we deploy more complex. As tooling and patterns become established we will be able to benefit from the power offered to us by a service we can build and deploy within 30 minutes. Just don’t bet your mortgage that it will be up in the morning just because it’s in the cloud.

Feb 25th, 2010 | Filed under Architecture, Scalability