Full-Text Search with Sphinx
2009-07-11
Technical overview
- Full-text search engine: sphinx (0.9.8rc2)
- Chinese word segmentation: libmmseg (0.7.3)
- Rails plugin that calls the engine: thinking-sphinx (0.9.5)
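Note: the thinking-sphinx used here differs from the official release. It adds configuration for Chinese word segmentation and fixes a problem where the delta index was not updated automatically.
Installation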
libmmseg
sudo apt-get install g++
cd ~
wget http://cloud.github.com/downloads/saberma/saberma.github.com/mmseg-0.7.3.tar.gz
tar zxvf mmseg-0.7.3.tar.gz
cd mmseg-0.7.3
./configure
make
sudo make install
# Install the Ruby extension
cd ruby
cp /usr/local/include/mmseg/*.h .
cp ../src/*.h .
cp ../src/css/*.h .
ruby extconf.lin.rb
make
sudo make install
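Note: if this step fails with the error
css/UnigramCorpusReader.cpp:89: error: ‘strncmp’ was not declared in this scope
edit UnigramCorpusReader.cpp in the src/css directory by hand and add the following as its first line:
#include <string.h>
Then run make again and the build should go through.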
Note: the prebuilt dictionary uni.lib goes in the project's lib directory (the duty-room project already contains this file).
sphinx
# Download sphinx
cd ~
wget http://cloud.github.com/downloads/saberma/saberma.github.com/sphinx-0.9.8-rc2.tar.gz
tar zxvf sphinx-0.9.8-rc2.tar.gz
cd sphinx-0.9.8-rc2
sudo apt-get install patch
# Download the Chinese-support patch
wget http://cloud.github.com/downloads/saberma/saberma.github.com/sphinx-0.98rc2.zhcn-support.patch
patch -p1 < sphinx-0.98rc2.zhcn-support.patch
# Download the crash-fix patch
wget http://cloud.github.com/downloads/saberma/saberma.github.com/fix-crash-in-excerpts.patch
patch -p1 < fix-crash-in-excerpts.patch
./configure
make
sudo make install
Note: if this step fails with the error
/usr/local/include/mmseg/freelist.h:22: error: ‘strlen’ was not declared in this scope
edit the file named in the error, /usr/local/include/mmseg/freelist.h, by hand
and add the following at the top:
#include <string.h>
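Both build errors above come from a missing header, so the manual fix can be scripted. The following is a minimal sketch, assuming the header in question is `<string.h>` (the standard header that declares `strncmp` and `strlen`). The helper name `ensure_include` is hypothetical and is not part of libmmseg or sphinx.

```ruby
# Prepend an #include line to a C/C++ source file unless a line consisting of
# exactly that directive is already present.
# Returns true when the file was modified, false when nothing was changed.
# Assumption: the header needed here is <string.h>.
def ensure_include(path, header = "string.h")
  directive = "#include <#{header}>"
  source = File.read(path)
  # Skip the edit if the directive already appears on a line of its own.
  return false if source.lines.any? { |line| line.strip == directive }
  File.write(path, "#{directive}\n#{source}")
  true
end

if __FILE__ == $PROGRAM_NAME
  # Patch every file given on the command line.
  ARGV.each do |path|
    changed = ensure_include(path)
    puts "#{path}: #{changed ? 'added #include <string.h>' : 'already present'}"
  end
end
```

Saved under a name of your choice (for example ensure_include.rb), it could be run against the file named in the compiler error and followed by another make. A file under /usr/local is normally only writable by root, so that run would need sudo.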
Installing thinking-sphinx
(This step is already part of the [Getting the source code] section of [Rails notes]; there is no need to run it separately.)
git submodule init
git submodule update
If the installation reports an error, recover as follows:
# Remove the submodule entries from .gitmodules and .git/config
# Remove the thinking-sphinx directory
git rm --cached vendor/plugins/thinking-sphinx
sudo rm -r vendor/plugins/thinking-sphinx
git submodule add -b v0.9.5chinese git://github.com/saberma/thinking-sphinx.git vendor/plugins/thinking-sphinx
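Starting the engine
(Run this step after [Getting the source code] in [Rails notes].)
# Generate the sphinx configuration file
rake ts:config
# Build the index
rake ts:index
# Start the engine
rake ts:start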
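After starting the engine it can help to confirm that searchd is actually listening before moving on to the test. The sketch below only checks that something accepts TCP connections on a given port. The helper name `port_open?` is hypothetical, and the port 3312 is an assumption about the default for this Sphinx generation; check the port in the generated sphinx configuration file.

```ruby
require "socket"
require "timeout"

# Returns true if something accepts a TCP connection on host:port within
# `seconds`, false otherwise. It does not verify that the listener is searchd.
def port_open?(host, port, seconds = 2)
  Timeout.timeout(seconds) do
    TCPSocket.new(host, port).close
    true
  end
rescue SystemCallError, SocketError, Timeout::Error
  false
end

if __FILE__ == $PROGRAM_NAME
  # Assumption: searchd runs on localhost:3312 unless arguments say otherwise.
  host = ARGV[0] || "127.0.0.1"
  port = (ARGV[1] || 3312).to_i
  if port_open?(host, port)
    puts "#{host}:#{port} is accepting connections"
  else
    puts "nothing is listening on #{host}:#{port}"
  end
end
```

A false result right after rake ts:start suggests the daemon did not come up, or that it is configured for a different port.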
Testing the engine
(Run this step after [Getting the source code] in [Rails notes].)
script/console
c = Call.last
c.callnumber = '13911112222'
c.save
# The background output shows the delta index being updated
Sphinx 0.9.8-rc2 (r1234)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file '/home/mahb/Documents/zbs/config/development.sphinx.conf'...
indexing index 'call_delta'...
collected 1 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 1 docs, 31 bytes
total 0.031 sec, 1014.20 bytes/sec, 32.72 docs/sec
rotating indices: succesfully sent SIGHUP to searchd (pid=5812)
# Run the query
Call.search '13911112222'
# The matching record should now be returned
"Crash-fix patch":http://www.coreseek.com/uploads/sources/fix-crash-in-excerpts.patch
Usage
"Reference documentation":http://ts.freelancing-gods.com/usage.html
# or see call.rb
Related references
- "Chinese word segmentation in Ruby with libmmseg":http://www.kuqin.com/searchengine/20080525/8886.html
- "A Rails programmer's guide to installing Sphinx for Chinese full-text search":http://www.coreseek.com/forum/index.php?action=vthread&forum=2&topic=17