Big Data & Search-Based Applications

Sometimes we find that things that seem so obvious they belong in the “goes-without-saying” camp turn out to be obvious only in our own heads, or a handful of like-minded heads. Such is the case for me with search-based applications and Big Data.

In 2010, as I was writing the book Search-Based Applications: At the Confluence of Search & Database Technologies (2011, Morgan & Claypool) with my colleague Gregory Grefenstette, the mainstreaming of ‘Big Data’ was already well underway. It had broken out of rarefied scientific and academic circles to become a subject of vivid interest to mainstream journalists, industry analysts, venture capitalists and M&A teams.

There was no need (so I thought) to edit the manuscript to incorporate an explicit discussion of Big Data. I expected search-based applications (SBAs) to surface naturally in Big Data discussions as a prototypical strategy for tapping into the potential of Big Data. After all, Exalead clients usually turned to SBAs when 1) they hit a performance, scalability or usability roadblock with their existing systems (usually, classic relational database management systems), or 2) they realized an SBA would let them exploit untapped data sources (for example, machine data or Web content) that were previously simply too voluminous, variable and/or fast-moving to be handled technically or economically in any other way.

What’s more, search engine strategies and technologies have always been at the heart of almost all of the Big Data-enabling technologies dominating the headlines. In 2008, when the seminal Big Data Computing Study Group set out to answer the question: “How can the capability for computing over large data sets be provided in a way that is cost effective, reliable, and generally usable?” they naturally turned to the world of search: “Toward this end, we draw inspiration from the computing systems that have been developed at search engine companies.”

However, in spite of this common DNA and the native capacity of SBAs to make Big Data accessible and meaningful, SBAs have remained largely invisible in Big Data discussions. Search-related technologies are mentioned, but usually in an oddly discrete and disconnected fashion, as though text mining, entity extraction, clustering, inverted indexes, machine learning, etc., etc., sprouted overnight in response to Big Data.

To help shed light on the role of search and search-based applications in the domain of Big Data, I’ve produced a white paper that covers Search, NoSQL and SQL solutions: A Practical Guide to Big Data: Opportunities, Challenges & Tools, and I’ll be taking on various topics related to search/SBAs & Big Data in this blog series. I also recommend keeping tabs on the work of IDC Analyst Sue Feldman, who’ll be covering a number of topics at the intersection of search and discovery technologies and Big Data this year. Also on the watch list: 451 Group’s Matt Aslett, who covers NoSQL and NewSQL among other topics, and will be adding the search domain to his focus areas as well. Stay tuned, it should be an interesting year for Big Data & Search!

Next up on the agenda: the crossroads of Web Search and NoSQL, and a conversation with the Web Search & Analytics team in Exalead’s R&D division.