NewspaperARCHIVE evaluates Hadoop/Solr – decides to go with Exalead

Exalead recently published a press release, “NewspaperARCHIVE.com Scales with Exalead,” about one of our newest clients, NewspaperARCHIVE.com.

We mentioned in the press release that NewspaperARCHIVE.com set out to replace Autonomy search, and that it looked at a number of alternatives, including the open source search software Solr.

What we didn’t mention in the press release is that NewspaperARCHIVE also evaluated a combination of Solr and Hadoop before purchasing an Exalead license.

A few facts about that Solr/Hadoop evaluation:

– The NewspaperARCHIVE database contains just over 100 million newspaper pages, each averaging about 6,000 words, for a total of roughly 600 billion terms, or 2 TB of text.

– NewspaperARCHIVE already owned a number of midrange servers (HP ProLiant DL300 servers).

– NewspaperARCHIVE concluded that the only way to make Solr/Hadoop work better was to purchase a large number of new commodity servers to run it.

– With Exalead, NewspaperARCHIVE achieved more efficient results on its existing server farm.
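The corpus figures in the first bullet can be sanity-checked with a few lines of arithmetic. This minimal sketch multiplies the stated page count by the average words per page, then divides the stated 2 TB by that total to get the implied average storage per term. It assumes decimal terabytes (10^12 bytes); the variable names are illustrative only.

```python
# Figures taken from the bullet list; decimal terabytes are an assumption.
pages = 100_000_000          # "just over 100 million newspaper pages"
words_per_page = 6_000       # average words per page
text_bytes = 2 * 10**12      # "2 TB of text", assuming 1 TB = 10**12 bytes

# Total terms = pages x average words per page.
total_terms = pages * words_per_page

# Implied average number of bytes of text per term.
bytes_per_term = text_bytes / total_terms

print(f"{total_terms:,} terms")            # prints "600,000,000,000 terms"
print(f"{bytes_per_term:.2f} bytes/term")  # prints "3.33 bytes/term"
```

The product matches the "roughly 600 billion terms" in the post, and the 2 TB figure works out to about 3.3 bytes of text per term on average.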

How did Exalead manage that on the existing hardware, you may ask? Exalead built a distributed computation layer, equivalent to Hadoop MapReduce, for our own web search engine. We call it dSort. For NewspaperARCHIVE, dSort did everything Solr/Hadoop offered in terms of distributed computing, but did it more efficiently. More importantly, Exalead’s built-in semantic processing capabilities ensured that NewspaperARCHIVE’s customers would get significantly more relevant results. In my conversations with the NewspaperARCHIVE team, they’ve said they’re very happy. And so are we.
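dSort’s internals aren’t described in this post, so the following is only a generic, single-process sketch of the map/shuffle/reduce pattern that Hadoop MapReduce implements, applied to counting terms across newspaper pages. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the lowercase whitespace tokenization are hypothetical illustrations, not Exalead’s or Hadoop’s API.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical illustration of the generic MapReduce pattern; not dSort's API.

def map_phase(page_id: str, text: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (term, 1) for each lowercase, whitespace-separated term on a page.
    # page_id is unused here; a real mapper would receive it as its input key.
    for term in text.lower().split():
        yield term, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, List[int]]]:
    # Shuffle/sort: order intermediate pairs by key, then group equal keys,
    # so each reduce call sees one term together with all of its counts.
    ordered = sorted(pairs, key=itemgetter(0))
    for term, group in groupby(ordered, key=itemgetter(0)):
        yield term, [count for _, count in group]

def reduce_phase(term: str, counts: List[int]) -> Tuple[str, int]:
    # Reduce: collapse all counts for a term into a single total.
    return term, sum(counts)

def run_job(pages: Dict[str, str]) -> Dict[str, int]:
    # Driver: run every mapper, shuffle the combined output, then run every reducer.
    intermediate = [pair for page_id, text in pages.items()
                    for pair in map_phase(page_id, text)]
    return dict(reduce_phase(term, counts) for term, counts in shuffle(intermediate))

if __name__ == "__main__":
    sample = {"page-1": "the quick fox", "page-2": "The lazy dog"}
    print(run_job(sample))
    # prints {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

In a real cluster, roughly speaking, the map calls run in parallel on the machines holding each shard of pages, and the sort inside `shuffle` becomes a distributed sort that routes each key to the reducer responsible for it.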


  • I’ll bite:
    I’d love to see the research, numbers, etc. behind these conclusions. Without that, this is potentially FUD.