Lucene is quite fantastic and Elasticsearch makes it a joy to use. Still, I wond...

vdfs · on March 13, 2015

There is a C++ port of Lucene, CLucene [1], which is compatible with version 2.3 of Java Lucene. The project stopped a long time ago, but it is very stable and works perfectly. Another port, LucenePlusPlus [2], is compatible with version 3 of Java Lucene, but it uses a lot of Boost smart pointers, and the port looks like it was automated. This port is why CLucene development stopped: the maintainers wanted to make the new port faster by avoiding smart pointers wherever possible, but that didn't happen.

1: http://sourceforge.net/projects/clucene

2: https://github.com/luceneplusplus/LucenePlusPlus

MichaelGG · on March 13, 2015

Oddly enough, I don't see anyone talking about benchmarks for those projects. I found one offhand comment saying it was 2-3x faster than Java for indexing, but only 10% better for search. No real benchmarks or such. I suppose that's not the only reason to want a non-JVM version, but it seems like a pretty major reason and something that'd warrant headline treatment in the readme...

boomzilla · on March 13, 2015

Java overhead is not a huge issue unless you are embedding Lucene into low-spec devices. A consumer search system usually relies heavily on cache (just like any database), so even a 30-50% latency hit on cold queries is not that big a deal if more than 90% of your queries are served from cache.

GC is a big problem when you don't know the expected query distribution, which is the case for Elasticsearch's analytics. There is a lot more to a search engine than packing, decoding and merging posting lists. I've never seen anything that compares with Lucene's text analysis and scoring API support.

syllogism · on March 13, 2015

That fuzzy search story is more worrying than anything else, really. What they did seems just crazy to me.

First, the paper really doesn't seem so difficult. Second, they don't even think about reaching out to the author/s?

I suppose I shouldn't talk until I've tried their task. But I've implemented a lot of algorithms from papers, and their story had me shaking my head.

burntsushi · on March 13, 2015

I've also implemented a lot of algorithms from papers, and their story has me nodding my head in agreement. Some algorithms papers are just downright impenetrable.

darklajid · on March 14, 2015

I tried implementing the same paper's algorithm in the past and somewhat succeeded, but gave up in the end. Precomputing the automata was slow as hell, I came to a similar conclusion as the authors (N > 2 isn't really feasible, though that was something I was interested in), and my plumbing sucked.

I'm really not experienced at reading papers and this was the only one I ever tried, so I cannot compare it to others. It certainly was quite hard for me to follow and took some months of nightly dabbling before I reached the point above.
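
For anyone unfamiliar with the idea, here is a minimal Python sketch of what a Levenshtein automaton decides: whether a candidate term is within N edits of the query. It is not the paper's precomputed construction. It simulates the automaton one character at a time by carrying a row of the edit-distance table as the "state", which is the simple but slower approach. The function name `within_distance` is made up for illustration.

```python
def within_distance(query: str, candidate: str, max_edits: int) -> bool:
    """Return True if candidate is within max_edits Levenshtein edits of query."""
    # The "state" is one row of the edit-distance table: row[i] is the cost of
    # turning query[:i] into the part of the candidate consumed so far.
    row = list(range(len(query) + 1))
    for ch in candidate:
        new_row = [row[0] + 1]  # candidate grew by one char, query prefix is empty
        for i, qch in enumerate(query, start=1):
            cost = min(
                row[i] + 1,                # ch is an extra character in the candidate
                new_row[i - 1] + 1,        # qch is missing from the candidate
                row[i - 1] + (qch != ch),  # match (cost 0) or substitution (cost 1)
            )
            new_row.append(cost)
        row = new_row
        # If every entry exceeds the budget, no suffix can recover: a dead state.
        if min(row) > max_edits:
            return False
    return row[-1] <= max_edits


if __name__ == "__main__":
    for term in ["lucene", "lucine", "lucen", "lucy"]:
        print(term, within_distance("lucene", term, 1))
```

A precomputed automaton avoids redoing this table work for every term in the index, which is where the expensive precomputation mentioned above comes in. Note that plain Levenshtein distance counts a transposition as two edits.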

proveanegative · on March 13, 2015

Is there a self-contained alternative to ElasticSearch specifically? If there was one written in Go or otherwise statically linkable that would be great from a deployment standpoint. I could deal with somewhat worse performance in exchange for that.

frik · on March 13, 2015

Search for "golang full text search database".

Lucene and Hadoop gave the Java ecosystem a big push; it's like a lock-in. Native C++ libraries and other free-text search implementations have smaller communities and are usually less well known. With Go, C++11 and Rust the future looks bright, but it will take some time to catch up.

syen · on March 14, 2015

Agreed, it's early days for non-Java-based alternatives.

One of my colleagues, Marty Schoch, has been working on a full-text search engine in golang called bleve [1].

1: http://www.blevesearch.com/

proveanegative · on March 13, 2015

>Search "golang full text search database".

I have. The problem is picking one that is mature enough and will be supported for years as you can expect ElasticSearch to be, which is what I meant by "an alternative".

I agree with you about the future looking bright but I meant something you could use right now.

frik · on March 14, 2015

It's hard to say. For Go there is, e.g., bleve FTS, and there are ports of Java Lucene to Go (e.g. https://github.com/balzaczyy/golucene). Such ports are either semi-automatic or fully automatic. It's hard for Lucene ports to keep up, as Lucene moves fast, and most ports have stalled.

One could also use a service-oriented architecture and use, e.g., the ElasticSearch REST API or the C++-based Sphinx Search; both need little configuration and no custom code.
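
As a rough illustration of the REST route, here is a minimal Python sketch that builds a fuzzy `match` query against ElasticSearch's `_search` endpoint using only the standard library. The index name `articles`, the field `title`, and the helper names are assumptions for the example; `localhost:9200` is just the default local address. Nothing is sent until `search()` is called, so building the request works without a running server.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, text, fuzziness=1):
    """Build (but do not send) a POST request for a fuzzy match query."""
    body = {"query": {"match": {field: {"query": text, "fuzziness": fuzziness}}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(request):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # "articles" and "title" are hypothetical; substitute your own index and field.
    req = build_search_request("http://localhost:9200", "articles", "title", "lucine")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Any language with an HTTP client can do the same, which is the deployment trade-off here: the search engine runs as a separate service instead of being linked into your binary.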