Jean Marc brought you a delightful post about Chromatik last week, with a lot of beautiful images. I will now describe in more detail how it was built. As with the DVD you perhaps watched last night, I am afraid there will be fewer big special effects in this blog post than in Jean Marc’s post, but I hope to give you an insightful view of what happened behind the scenes.
Chromatik was an elaborate demo, the result of a long effort on both the back-end and the front-end. It indexes one million images. For each image, a unique color signature was built and indexed. Our current intuitive user interface exploits this index to help you filter and select images by choosing a combination of colors, luminosity or text.
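To make the idea of a color signature concrete, here is a minimal sketch in Python. It is not the Chromatik implementation, whose details are not described in this post. It assumes an image is just a list of (r, g, b) pixels, and the names `color_signature` and `matches`, the number of levels and the 20% threshold are all hypothetical choices made for illustration.

```python
from collections import Counter

def color_signature(pixels, levels=4, top=3):
    """Return the `top` most common quantized colors with their share of the image.

    Assumption: each channel is cut into `levels` equal buckets (4 here),
    so similar shades fall into the same bucket.
    """
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return [(bucket, n / total) for bucket, n in counts.most_common(top)]

def matches(signature, wanted_bucket, min_share=0.2):
    """True if the wanted color bucket covers at least `min_share` of the image."""
    return any(bucket == wanted_bucket and share >= min_share
               for bucket, share in signature)

# Toy image: 6 red-ish pixels, 3 blue-ish pixels, 1 white pixel.
pixels = [(250, 10, 10)] * 6 + [(5, 5, 200)] * 3 + [(255, 255, 255)]
sig = color_signature(pixels)
print(sig)
print(matches(sig, (3, 0, 0)))  # is the image mostly "red"?
```

A color filter in the interface could then be answered by looking up which images have the chosen bucket in their signature, which is the kind of question an index is good at.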
A large number of people tried the Chromatik demo and liked it so much that we received several requests to integrate it into the official Exalead search site. And because the demo ran smoothly and relatively bug-free, our friends thought it would be a piece of cake. Of course, it was a bit more work than we initially expected. So where were the challenges?
1) The front-end side
A lot of questions needed to be answered:
- How will I adapt the GUI of my application to integrate the new features?
- Are all these new features necessary?
- What is the feedback we’ve received on the different features?
- What is the added value of these features?
The answers to these questions determine how much space in the GUI we devote to surfacing these features.
2) The back-end side
Let’s begin with a little theory:
Theorem of the factor 10 effect:
No matter how good a developer you are, if non-trivial code has been designed and tested with only N elements, it won’t work without modifications when applied to 10 * N elements.
Demonstration: Rather simple: if you don’t believe it, try it yourself…
In this case we wanted a factor on the order of 1000, so we knew it would need some adjustments. The advantage of knowing this theorem is that you can anticipate potential problems. And the experience we have accumulated from similar situations at Exalead helps us predict most of the bottlenecks.
Example 1: Chromatik needed 300MB of RAM, which is quite good for 1M images. But if you multiply this number by 2000, you get 600GB of RAM, which is quite large, even if the final index is distributed over multiple machines.
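The arithmetic behind this estimate is easy to check. The short Python sketch below only restates the figures from the paragraph above; the one assumption is the use of decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes).

```python
MB = 10**6  # assumption: decimal units
GB = 10**9

demo_images = 1_000_000   # images indexed by the demo
demo_ram = 300 * MB       # RAM used by the demo
scale = 2000              # scaling factor quoted above

bytes_per_image = demo_ram / demo_images
projected_images = demo_images * scale
projected_ram_gb = demo_ram * scale / GB

print(bytes_per_image)    # bytes of RAM per indexed image in the demo
print(projected_images)   # number of images at full scale
print(projected_ram_gb)   # RAM in GB if nothing else changes
```

In other words, the demo spent roughly 300 bytes of RAM per image, which is comfortable for one million images and far too much for two billion.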
We therefore decided to reduce the richness of the colors while maintaining usability, to migrate from version 4.6 to version 5.0 of Exalead CloudView, and to use a more compressed encoding. In the end, it now costs only 9GB.
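As an illustration of how reducing the richness of the colors makes an encoding more compact, here is a small Python sketch. It is only an assumption about the general technique (dropping the low bits of each channel and packing the rest into one integer), not a description of the encoding CloudView actually uses. The function names `pack_color` and `unpack_color` are hypothetical.

```python
def pack_color(r, g, b, bits=2):
    """Quantize an 8-bit RGB color to `bits` bits per channel and pack it into one int."""
    shift = 8 - bits
    return ((r >> shift) << (2 * bits)) | ((g >> shift) << bits) | (b >> shift)

def unpack_color(code, bits=2):
    """Return the RGB color at the middle of the bucket that `code` stands for."""
    mask = (1 << bits) - 1
    shift = 8 - bits
    half = (1 << shift) >> 1  # offset to the middle of the bucket
    r = (((code >> (2 * bits)) & mask) << shift) | half
    g = (((code >> bits) & mask) << shift) | half
    b = ((code & mask) << shift) | half
    return (r, g, b)

code = pack_color(250, 10, 10)   # a strong red
print(code, unpack_color(code))  # packed value and its approximate color
print(2 ** (3 * 2), 2 ** 24)     # distinct colors: 2 bits per channel vs. full 24-bit
```

With 2 bits per channel, a color fits in 6 bits instead of 24, at the price of merging similar shades. Whether that loss is acceptable is exactly the usability question mentioned above.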
Example 2: When you want to analyze two billion images, you need robust code, meaning code that can handle all sorts of images, even those that do not comply with a valid RFC. That is not easy when even the most widely used library in the world for basic image manipulation can crash on some images, as we reported.
The result was that this run exposed some bugs in our code that we had not seen before and therefore had to fix.
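One common way to keep a long batch alive in the face of bad input is to isolate each item and record failures instead of stopping. The Python sketch below shows that pattern under stated assumptions: `analyze_all` and `toy_analyze` are hypothetical names, and the toy analyzer merely checks the standard 8-byte PNG signature instead of decoding anything.

```python
def analyze_all(items, analyze):
    """Run `analyze` on every (name, data) pair; collect failures instead of stopping."""
    results, failures = {}, {}
    for name, data in items:
        try:
            results[name] = analyze(data)
        except Exception as exc:  # one malformed image must not abort the batch
            failures[name] = repr(exc)
    return results, failures

def toy_analyze(data):
    """Hypothetical stand-in for a real decoder: accept only data with a PNG signature."""
    if not data.startswith(b"\x89PNG\r\n\x1a\n"):
        raise ValueError("not a PNG signature")
    return len(data)

batch = [("good.png", b"\x89PNG\r\n\x1a\n" + b"xx"), ("broken.png", b"junk")]
results, failures = analyze_all(batch, toy_analyze)
print(results)
print(failures)
```

Note that a `try`/`except` only catches errors raised inside Python. If a native library crashes the whole process on a bad image, the work has to be isolated in a separate process that can be restarted.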
Example 3: The demo was initially a single-machine application. We needed to use the distributed system framework included in the CloudView technology to be able to run the whole process of extracting, crawling, and indexing in only a few weeks. This framework really helped us transform the single-machine demo into a fully load-balanced and monitored application. This use case is a little different from our standard www.exalead.com chain, so we discovered and tweaked a few cumbersome points in the code.
The purpose of this integration was to offer a new service to the users of the exalead.com search engine and improve the robustness of the Chromatik technology. We now better understand the impact of different tweaks on color indexing.
Transforming a demo into a real product is not as easy as it seems. I hope this post helps you understand why a lot of companies only show you demos but never real live applications.
At Exalead, we don’t sell demos to our customers; we sell tested and robust solutions. We work hard to test and uncover all the issues so that our customers’ implementations go smoothly.