Posts Tagged ‘sphinx’

Sphinx search introduction

Perhaps you have read my introduction to full-text search, or an article somewhere else, and decided to go with full-text search in your next project, but you are still unsure which full-text search engine to use. One such engine is Sphinx, and I’ll give you a short course on installing Sphinx as your full-text search engine.

Sphinx is a full-text search engine distributed under GPL version 2. It is fast not only at searching but also at indexing your data. Currently, the Sphinx API has bindings for PHP, Python, Perl, Ruby and Java.

Sphinx features

  • high indexing speed (up to 10 MB/sec on modern CPUs);
  • high search speed (average query is under 0.1 sec on 2-4 GB text collections);
  • high scalability (up to 100 GB of text, up to 100 M documents on a single CPU);
  • provides good relevance ranking through combination of phrase proximity ranking and statistical (BM25) ranking;
  • provides distributed searching capabilities;
  • provides document excerpts generation;
  • provides searching from within MySQL through pluggable storage engine;
  • supports boolean, phrase, and word proximity queries;
  • supports multiple full-text fields per document (up to 32 by default);
  • supports multiple additional attributes per document (e.g. groups, timestamps, etc.);
  • supports stopwords;
  • supports both single-byte encodings and UTF-8;
  • supports English stemming, Russian stemming, and Soundex for morphology;
  • supports MySQL natively (MyISAM and InnoDB tables are both supported);
  • supports PostgreSQL natively.
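
To give a feel for the statistical (BM25) part of the ranking mentioned in the list, here is a minimal sketch of the textbook Okapi BM25 formula in Python. It is not Sphinx's own code: the function name `bm25_scores`, the whitespace tokenizer, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration, and Sphinx combines its statistical score with phrase proximity in ways this sketch does not model.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with textbook Okapi BM25.

    Illustrative only: tokenization is a plain lowercase whitespace split,
    and k1 / b are common textbook defaults, not values taken from Sphinx.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue  # a term that is absent contributes nothing
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1; b normalizes by document length.
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "sphinx is a full text search engine",
        "mysql is a relational database",
        "full text search with sphinx and mysql",
    ]
    for doc, score in zip(docs, bm25_scores("sphinx search", docs)):
        print(f"{score:.3f}  {doc}")
```

Documents that contain none of the query terms score zero, and a term that appears in fewer documents pulls the score up more than a common one.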

There you go, so fire up your terminal or console, and let’s get things done.