SOA and GWT/HTML Scraping

As promised, I'm continuing my previous posts on the Message Enrichment scenario. I met with the customer, and it seems that the only way to truly enrich their ESB message is to build an image processing engine right inside the ESB. Although it sounds quite cool, and I’d love to do it myself since it’s been years since I wrote any image processing code (the last time was in Turbo Pascal for Windows 1.5 😉), it seems like a waste to put that inside the ESB.

The solution? Use HTML scraping as an ESB connector, to rip the data out of the web pages and expose it as a service. The reason: the application implements some of its logic inside the engine, and some of it (like zoom-in/zoom-out) inside the web browser, using GWT. So a command will be received, it will be translated into an HTTP GET request, and the resulting HTML will be scraped to get the actual required image.
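
To make the flow concrete, here is a rough sketch in plain Java of the two ends of such a connector: turning a command into a GET URL, and pulling the image reference out of the returned HTML. Everything specific in it is my assumption for illustration: the `ImageScraper` class, the `action` and `level` query parameters and the example URL are hypothetical, and the regular expression is only a stand-in for a proper tidy-then-query step. The HTTP call itself is left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageScraper {

    // Captures the src value of the first <img> tag, with single or double quotes.
    // A regex is a shortcut here; real pages usually deserve a real HTML parser.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Translates a command into a GET URL.
    // The "action" and "level" parameter names are hypothetical.
    public static URI toRequest(String baseUrl, String action, int level) {
        String query = "action=" + URLEncoder.encode(action, StandardCharsets.UTF_8)
                + "&level=" + level;
        return URI.create(baseUrl + "?" + query);
    }

    // Scrapes the image reference out of the HTML returned for that request.
    public static Optional<String> firstImageSrc(String html) {
        Matcher m = IMG_SRC.matcher(html);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical viewer URL; no request is actually sent in this sketch.
        URI request = toRequest("http://example.invalid/viewer", "zoom in", 3);
        System.out.println("Would GET: " + request);

        // Stand-in for the HTML the application would send back.
        String html = "<html><body><div><img class=\"map\" src=\"/render/zoomed.png\"></div></body></html>";
        System.out.println("Image: " + firstImageSrc(html).orElse("(none found)"));
    }
}
```

Note that this only works if the image reference is present in the HTML the server returns, which is exactly the open question with GWT.
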
I like this idea, and not (only) because it’s mine. I think it makes good reuse of existing code, which is what service exposure is all about, at least for me. The shocking part was that no one I talked to even considers HTML scraping a legitimate SOA concept. Ain’t that odd? I can’t count the times I had to pull out my HTML scraping toolkit (XQuery and JTidy are fine by me, thanks) and rip information from existing HTML pages.
However, there is a challenge here. Is it even possible to scrape GWT-based pages?
