
Great article. I've rolled my own full-text search engines in the past and it's a category of problems that I love, but even I have to admit that I'm often astounded by Lucene's performance. The inverted index really lets you stretch commodity hardware into pretty huge use-cases.

If you've never used ElasticSearch, I should note that that's one of ES's many strengths -- it takes advantage of Lucene and makes deployments on commodity hardware work really well. An ES cluster on five small EC2 instances can handle a tremendous workload.

There is one thing about ES/Lucene that bugs me, though: in the 3+ years I've been running it in production, I still haven't been able to solve the "every once in a while Java pegs the CPU at 100% until you restart the service" issue. I suspect it has to do with Lucene's index merge operation, but no amount of tinkering has solved the problem.



One of the boons of Java is its remote debugging support. You can attach a profiler to a process when something like this happens, extract thread names and stacks, and so on.

AFAIK you can also use the Linux 'perf trace' command on a Java process, though it probably involves some extra setup.
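
To make the "extract thread names and stacks" idea concrete, here is a rough sketch using the JDK's standard java.lang.management API. It runs inside the JVM it inspects; doing the same against a remote ES node would mean going through a JMX connection, which is assumed and not shown. The class name HotThreads and its methods are made up for illustration and are not part of Elasticsearch or Lucene.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of Elasticsearch or Lucene.
class HotThreads {

    // One sampled thread: its name, cumulative CPU time, and a truncated stack.
    static final class Sample {
        final String name;
        final long cpuNanos; // -1 when the JVM cannot report per-thread CPU time
        final StackTraceElement[] stack;

        Sample(String name, long cpuNanos, StackTraceElement[] stack) {
            this.name = name;
            this.cpuNanos = cpuNanos;
            this.stack = stack;
        }
    }

    // Returns up to topN live threads, ordered by cumulative CPU time (highest first).
    static List<Sample> busiest(int topN, int maxDepth) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Check "supported" first: isThreadCpuTimeEnabled() throws if it is not.
        boolean cpuOk = mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled();

        List<Sample> samples = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id, maxDepth);
            if (info == null) {
                continue; // thread exited between the two calls
            }
            long cpu = cpuOk ? mx.getThreadCpuTime(id) : -1L;
            samples.add(new Sample(info.getThreadName(), cpu, info.getStackTrace()));
        }

        samples.sort(Comparator.comparingLong((Sample s) -> s.cpuNanos).reversed());
        return new ArrayList<>(samples.subList(0, Math.min(topN, samples.size())));
    }

    // Formats the busiest threads as plain text, one header line per thread plus its frames.
    static String report(int topN, int maxDepth) {
        StringBuilder sb = new StringBuilder();
        for (Sample s : busiest(topN, maxDepth)) {
            String cpu = s.cpuNanos < 0 ? "n/a" : (s.cpuNanos / 1_000_000) + " ms";
            sb.append(s.name).append("  cpu=").append(cpu).append('\n');
            for (StackTraceElement frame : s.stack) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report(5, 10));
    }
}
```

Note that the CPU figure is cumulative since each thread started, so a long-lived but idle thread can outrank one that only just started spinning. To catch the culprit, take two samples a few seconds apart and compare the deltas. If the hot stacks turn out to be in merge threads, that would support the merge theory above; if not, it at least rules it out.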



