Databases vs. Search Engines: The Spatial Locality Bottleneck

So far, the only solution database vendors have proposed for delivering acceptable performance on large volumes of information is to make the underlying hardware faster. In-memory databases such as Oracle TimesTen or DB2 SolidDB require huge amounts of physical memory. Data warehouse appliances such as Teradata or Netezza rely on specialised hardware coprocessors. And most recently, as Steve Arnold points out on his blog, Oracle itself acknowledges that the acquisition of Sun will let it build more powerful “systems” by combining Sun’s high-end hardware with Oracle’s database platform.

At Exalead, we believe that Search-Based Applications, or SBAs, offer another (I would say more “sustainable”) solution to this problem. The key to handling large amounts of data efficiently is to make sure that data access has strong “spatial locality”. To quote Wikipedia, spatial locality means that “if a particular memory location is referenced at a particular time, then it is likely that nearby memory locations will be referenced in the near future.” The main problem with relational databases is their very poor spatial locality: the objects they store are spread across a large number of different tables. High-end CRM or ERP solutions typically store their data in as many as 65,000 different tables, each table stored at a different location on disk. Imagine how many different disk locations the system must touch just to display information about a customer or a product on a call center agent’s screen, or to produce a complex BI report. Poor spatial locality translates into a huge number of scattered disk accesses, and disk access is the main performance bottleneck for databases today.
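To make the disk-access argument concrete, here is a minimal Python sketch that counts how many distinct disk pages must be read to assemble one customer's data under two layouts. Everything in it is an assumption for illustration: the 8 KiB page size, the 20 tables (far fewer than the 65,000 quoted above), the 200-byte rows and the helper names `relational_layout` and `business_item_layout` are all hypothetical, and the model deliberately ignores indexes, caches and buffer pools, which real databases use to soften the problem.

```python
from dataclasses import dataclass

PAGE_SIZE = 8192  # hypothetical disk page size, in bytes


@dataclass(frozen=True)
class Extent:
    """A contiguous byte range on a simulated disk."""
    offset: int
    length: int

    def pages(self) -> set[int]:
        """Indices of every page this byte range overlaps."""
        first = self.offset // PAGE_SIZE
        last = (self.offset + self.length - 1) // PAGE_SIZE
        return set(range(first, last + 1))


def pages_touched(extents: list[Extent]) -> int:
    """Number of distinct pages that must be read to fetch all extents."""
    touched: set[int] = set()
    for extent in extents:
        touched |= extent.pages()
    return len(touched)


def relational_layout(num_tables: int, row_size: int,
                      table_region: int, customer_id: int) -> list[Extent]:
    """Normalized storage (hypothetical): each table occupies its own region
    of the disk, and the customer's row in table t sits inside that region."""
    return [Extent(t * table_region + customer_id * row_size, row_size)
            for t in range(num_tables)]


def business_item_layout(num_tables: int, row_size: int,
                         customer_id: int) -> list[Extent]:
    """Business item storage (hypothetical): the same fields, stored
    contiguously as one self-contained record per customer."""
    item_size = num_tables * row_size
    return [Extent(customer_id * item_size, item_size)]


# Toy parameters: 20 tables of 200-byte rows, each table in its own ~80 MB region.
TABLES, ROW, REGION, CUSTOMER = 20, 200, 10_000 * PAGE_SIZE, 12_345

print(pages_touched(relational_layout(TABLES, ROW, REGION, CUSTOMER)))  # 20
print(pages_touched(business_item_layout(TABLES, ROW, CUSTOMER)))       # 2
```

With these toy numbers, the normalized layout reads 20 pages, one per table, each in a different region of the disk, while the business item is read from 2 adjacent pages. On a spinning disk, every scattered page can cost a separate seek, which is where the gap comes from.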

SBAs are built on a very different data model, centered on the notion of a “business item”. A business item is a self-contained object corresponding to a real-life entity that the application manipulates and that end users understand. In a CRM application, for example, the business items would be the Contacts, Opportunities and Leads that business users view. Unlike a relational data model, a business item-centric storage strategy provides strong spatial locality: the pieces of information needed to answer complex, multi-criteria search queries all belong to a single business item, and are therefore stored close to each other on disk. The performance gap between this local approach and the spread-out relational data model grows wider and wider as the amount of information applications need to store increases.
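The business item idea can be sketched as a denormalized record. In the hypothetical Python model below, a `Contact` embeds the `Opportunity` objects a user sees alongside it, so a multi-criteria query (country plus total amount of open opportunities) is evaluated on one object without any join, and `to_record` serializes the whole item as a single contiguous byte string that a store could write in one place. The class names, fields and functions are assumptions for illustration only, not Exalead's data model or API; a real SBA would query a search engine index rather than scan a Python list.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Opportunity:
    name: str
    amount: float
    stage: str  # "open" or "closed" (hypothetical values)


@dataclass
class Contact:
    """Hypothetical CRM business item: one self-contained record holding
    everything a user sees about a contact, including its opportunities."""
    contact_id: int
    name: str
    company: str
    country: str
    opportunities: list[Opportunity] = field(default_factory=list)


def to_record(contact: Contact) -> bytes:
    """Serialize the whole item, nested opportunities included, as one
    contiguous byte string."""
    return json.dumps(asdict(contact)).encode("utf-8")


def matches(contact: Contact, country: str, min_open_amount: float) -> bool:
    """Evaluate every criterion on a single item: no joins needed."""
    open_total = sum(o.amount for o in contact.opportunities if o.stage == "open")
    return contact.country == country and open_total >= min_open_amount


def search(items: list[Contact], country: str, min_open_amount: float) -> list[str]:
    """Return the names of the items that match every criterion."""
    return [c.name for c in items if matches(c, country, min_open_amount)]


items = [
    Contact(1, "Alice Martin", "Acme", "FR",
            [Opportunity("Renewal", 50_000, "open"),
             Opportunity("Upsell", 20_000, "closed")]),
    Contact(2, "Bob Smith", "Globex", "FR",
            [Opportunity("Pilot", 5_000, "open")]),
    Contact(3, "Carla Rossi", "Initech", "IT",
            [Opportunity("Rollout", 80_000, "open")]),
]

print(search(items, "FR", 10_000))  # ['Alice Martin']
```

Denormalizing this way duplicates some data across items, trading extra storage and update cost for read locality.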