
YOUR MISSION, SHOULD YOU CHOOSE TO ACCEPT IT…
In January of 2009, Organic’s largest client came to us with an ambitious challenge: the creation of the largest repository of customer education content in their portfolio. This project, code-named “FWB,” would mark the first time that the Bank had created a single application to handle all relevant, consumer-facing information.
THE CATCH
This repository would be highly visible and accessed by many high-traffic online properties, so it would need to guarantee high performance under heavy load. Our clients wanted to place the editing and publishing of content in the hands of marketing and business stakeholders, with the technical requirement that each piece of content live as a modular, reusable unit that could be cross-linked and associated with other content units. In addition, the technology stack wasn’t finalized: we were going to be working with an as-yet-unknown CMS, on an as-yet-undefined network topology, with no DBA, all of it slated for hosting with an entirely new solutions provider.
BUT OTHER THAN THAT, MRS. LINCOLN, HOW WAS THE PLAY?
Traditionally, this sort of project would be tackled with a massive server farm, running a SQL-driven RDBMS to handle data and a standard servlet engine consuming JOINed data, which would be munged together using loops and string concatenation in the view tier. But this wouldn’t work for FWB — our requirements for n+1 scalability killed the farm, document-centric content killed RDBMS, and our need to surface content in multiple formats (JSON, XHTML, XML, RSS, and potentially others) killed the traditional servlet approach. Six weeks before site launch, with only two developers on board, we were left with a lot of loose strands and dead ends.
AND THE PORTIONS WERE SO SMALL
Surveying our rejected options, we noticed a characteristic shared by all of them: rigidity. Traditional SQL locks developers into a static set of schemata, and all content (even structured articles) must be sliced into rows and normalized among fixed tables. Sequence information must be stored explicitly in order to preserve the order of elements, and the simple process of reassembling an article from these atoms (especially a versioned one) necessarily involves complex queries. All of that slicing and rejoining is also inefficient at runtime. As a result, the role of a DBA becomes crucial: creating indices, optimizing queries, and writing stored procedures are the only path to high performance.
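To make that cost concrete, here is a minimal sketch of what "slicing an article into rows" tends to look like. It is not code from FWB: the table names, the `seq` and `version` columns, and the sample text are all hypothetical, and SQLite stands in for whatever RDBMS might have been used.

```python
import sqlite3

# Hypothetical schema: one article sliced into ordered, versioned rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE paragraph (
        article_id INTEGER REFERENCES article(id),
        version    INTEGER,
        seq        INTEGER,   -- order must be stored explicitly
        body       TEXT
    );
""")
conn.execute("INSERT INTO article VALUES (1, 'Saving for College')")
conn.executemany(
    "INSERT INTO paragraph VALUES (?, ?, ?, ?)",
    [
        (1, 1, 2, "Then pick an account."),
        (1, 1, 1, "Start early."),
        (1, 2, 1, "Start as early as you can."),
        (1, 2, 2, "Then pick an account."),
    ],
)


def latest_article(conn, article_id):
    """Reassemble the newest version of an article from its rows."""
    query = """
        SELECT a.title, p.body
        FROM article a
        JOIN paragraph p ON p.article_id = a.id
        WHERE a.id = ?
          AND p.version = (SELECT MAX(version) FROM paragraph
                           WHERE article_id = a.id)
        ORDER BY p.seq
    """
    rows = conn.execute(query, (article_id,)).fetchall()
    title = rows[0][0]
    return title, [body for _, body in rows]


title, paragraphs = latest_article(conn, 1)
```

Even in this toy case, reading one article back takes a JOIN, a subquery for the current version, and an explicit sort. Real content with nested sections and cross-links would need considerably more.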
SAY YES TO NO
The real problem with the rigidity of a SQL RDBMS is the tyranny of the schema. What if there were a way to store content without relying on a rigid schema? Fortunately, the “NoSQL” movement emerged just in time to rock the world of SQL and RDBMS. NoSQL has become a bit of a catchall phrase, but the concept behind it is simply the removal of traditional SQL-centric strictures from data that does not require them.
In our case, we were dealing with structured textual data coming from and edited by the client, so removing the complexity of disassembling the content into traditional database tables significantly simplified the architecture. After a rapid evaluation of current technology options, Berkeley DB XML was the tool we settled upon; it sported the pedigree of Oracle but ran in a tight footprint with minimal upkeep, and worked with pure XML.
THE GIFT THAT KEEPS ON GIVING
We found ourselves reaping dividends right away. As the client’s CMS specifications materialized around Autonomy Interwoven TeamSite, we were able to set up a VM environment for content entry and output XML immediately. There was no need to wait for a production-ready installation to be in place far enough ahead of launch to leave lead time for “copy engineering.”
MANY HANDS MAKE LIGHT WORK
Our XML database approach meant that XPath/XQuery could remain the lingua franca of our web tier, but more importantly, it meant that XSLT was now added to our development toolkit. Each faction of our development team was able to work concurrently around a common XML schema. Since that schema was the integration point itself, each team was able to work either toward it or from a mocked version of it, and there was no need for team members to keep local development databases. When the time finally came, integration was seamless. Of course, DBXML isn’t just a flat store of XML files, and we were never forced to relinquish the power of relational access between articles and passages of structured text.
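As a rough illustration of that last point, the sketch below follows a cross-link from one article to another using path expressions. It is an assumption-laden stand-in rather than our production code: the element names (`library`, `article`, `passage`, `related`) and the `ref` attribute are invented for the example, and it uses the limited XPath subset in Python's standard `xml.etree.ElementTree` rather than DBXML's XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical content: two articles, one linking to the other by id.
CONTENT = """
<library>
  <article id="a1">
    <title>Saving for College</title>
    <passage>Start early.</passage>
    <related ref="a2"/>
  </article>
  <article id="a2">
    <title>Choosing an Account</title>
    <passage>Compare fees.</passage>
  </article>
</library>
"""

root = ET.fromstring(CONTENT)


def find_article(root, article_id):
    """Look up an article element by its id attribute."""
    return root.find(f".//article[@id='{article_id}']")


def related_titles(root, article_id):
    """Follow <related ref="..."/> links and return the linked titles."""
    article = find_article(root, article_id)
    refs = [link.get("ref") for link in article.findall("related")]
    return [find_article(root, ref).findtext("title") for ref in refs]


linked = related_titles(root, "a1")
```

The article stays whole as a document, in its original order, and the relationship between articles is just another path expression away.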
THAT’S NO MOON
One hidden benefit to many NoSQL technologies (not just DBXML but others, including Couch and MongoDB) is a vast potential for scalability. DBXML, for instance, is an embedded library that runs within the footprint of the application itself. So, as you scale your application, you inherently scale your XML database. As the success of this project has led to rapid expansion, and our clients have already begun formulating far more ambitious plans for the future, we’ve begun to explore other NoSQL technologies, such as key-value and microdocument databases.
Our NoSQL architecture allows us to scale horizontally across web servers without the complication of scaling database servers, DBAs, or processes. The XML transforms allow less Java-savvy developers to find and access the content in an intuitive way, and the reliance on XML for data access allows reuse across front-end clients (web browsers, mobile, tablets, RSS, …). Ultimately, the lightweight nature of this content delivery architecture means we can be much more responsive to client and creative changes, right up to the last minute.
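The reuse across clients can be sketched in a few lines. This example is only a hedged illustration: plain Python functions stand in for the XSLT transforms described above, and the article markup, function names, and output shapes are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical article document; element names are illustrative only.
ARTICLE_XML = (
    "<article id='a1'>"
    "<title>Fees &amp; Rates</title>"
    "<passage>Compare fees.</passage>"
    "<passage>Read the fine print.</passage>"
    "</article>"
)


def to_dict(article):
    """Pull the fields every output format needs from one XML article."""
    return {
        "id": article.get("id"),
        "title": article.findtext("title"),
        "passages": [p.text for p in article.findall("passage")],
    }


def to_json(article):
    """JSON for script-driven clients."""
    return json.dumps(to_dict(article))


def to_xhtml(article):
    """An XHTML fragment for browsers, with text re-escaped for markup."""
    data = to_dict(article)
    article_id = data["id"]
    heading = escape(data["title"])
    paragraphs = "".join(f"<p>{escape(text)}</p>" for text in data["passages"])
    return f'<div id="{article_id}"><h1>{heading}</h1>{paragraphs}</div>'


article = ET.fromstring(ARTICLE_XML)
as_json = to_json(article)
as_xhtml = to_xhtml(article)
```

Both outputs come from the same stored document, so adding another format means adding another small transform rather than touching the data layer.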
Authors: Bill McDermott, Eric Mittelhammer, & Peter Balogh
