Imagining a New Generation of Big Data Applications

In my last post on the Web Search & Analytics division in Exalead R&D, I shared some highlights of my conversation with Jim Ferenczi, who leads the division’s Web Data Mining team. Today, I’ll replay some notes from my talk with Rémi Landais, head of the division’s Innovative User Experience team. His mission, as daunting as it is enticing, is literally to “invent and discover next-generation applications of Big Data.”

It’s a hefty charge, but one to which Rémi is well accustomed. He has spent the last several years collaborating with Exalead’s technical and academic partners as well as Exalead colleagues and customers to imagine, develop and test a whole host of multimedia access and analytic applications. These include Chromatik, which lets users search and navigate images based on color; Voxalead, which leverages speech-to-text transcription technology to enable search inside videos; and MuMa, a music search engine that, among other capabilities, automatically classifies songs by mood (sad, romantic, happy, etc.) using acoustic analysis.

His new role in the Web Search & Analytics division is therefore the same in spirit, though the data universe he’s taking on is greatly expanded, with multimedia content being joined by unstructured, semi-structured and structured Big Data of all flavors, both inside the enterprise and out on the Internet.

Like Jim, he has to maintain a wide field of vision, pushing the frontiers of technology and advancing the core CloudView platform while staying closely attuned to the changing ways in which users consume, generate and interact with information.

So what’s on his list of current projects? “One of the things I’m doing right now is working with scientific organizations, government agencies and start-ups to explore innovative, high-value uses of Open Data.” (For those not familiar with Open Data, it’s a movement dedicated to making the widest possible range of data – to date mainly scientific and governmental data – freely available to the public through the Internet.)

For Rémi, the search-based application (SBA) model is tailor-made for tapping into the potential of Open Data: “As nice as it’d be if every data set were published using standard formats like RDF/XML, the reality is organizations are pushing out data any way they can, with some collections relatively scrubbed and organized and others completely raw. Search-based applications are a great fit because they are natively designed to process and aggregate large volumes of heterogeneous, less-than-perfect data. And for me, the ‘aggregation’ part is going to drive the most interesting use cases, whether it’s integrating multiple Open Data sources or cross-referencing Open Data with Web and enterprise resources. Of course, in addition, there’s no other technology as adept as SBAs at making even complex information accessible and meaningful to non-specialist users, which is an essential part of the Open Data mission.”

PS: To learn more about Open Data, the Wikipedia Open Data page offers a good starting point. See also Exalead’s Dataconnexions post for info about Exalead’s involvement in the French Government’s Open Data initiative.