Building a Vertical Search Engine with Sphider and SCWS: Installation Notes

Environment Configuration

This guide is based on the following environment. Note that some software versions are outdated; testing in a similar environment or upgrading as appropriate is recommended.

  • Operating System: CentOS release 6.9 64bit (Final)
  • Kernel Version: Linux version 2.6.32-696.23.1.el6.x86_64
  • Web Server: nginx/1.12.2
  • PHP Version: 5.3.3 (Zend Engine v2.3.0)
  • Database: MySQL Server version: 5.5.56

1. Installing the SCWS Chinese Word Segmentation System

First, connect via SSH to a server that already has an LNMP (Linux, nginx, MySQL, PHP) environment configured.

Running the build inside a screen or tmux session is recommended, so that compilation keeps going if the SSH connection drops:

screen -S soxiaohost

Navigate to your planned search engine website directory (example shown):

cd /home/wwwroot/so.blog.youquso.com
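Before downloading anything, it can help to confirm that the tools used in the following steps are on the PATH. The sketch below is only an illustration: missing_tools is a hypothetical helper, and the tool list (gcc, make, wget, tar, bzip2) is an assumption about what the download, extract, and compile steps in this guide need.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list for the wget, tar xvjf, and ./configure && make steps.
missing=$(missing_tools gcc make wget tar bzip2)
if [ -n "$missing" ]; then
    echo "Missing tools (install them with yum before continuing):"
    echo "$missing"
else
    echo "All build tools found."
fi
```

If anything is reported missing, install it with yum before moving on.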

Download the SCWS source package (note: original links may be outdated; use a reliable mirror):

wget http://www.xunsearch.com/scws/down/scws-1.2.3.tar.bz2

Extract and enter the source directory:

tar xvjf scws-1.2.3.tar.bz2
cd scws-1.2.3

Compile and install to /usr/local/scws:

./configure --prefix=/usr/local/scws
make
make install

Verify installation:

ls -al /usr/local/scws/lib/libscws.la

If the file exists, the library is installed. Test the CLI tool:

/usr/local/scws/bin/scws -h

This should display help info with version (scws-cli/1.2.3) and usage.
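The two checks above can be folded into one small function, which is handy if you repeat the install on several machines. This is a minimal sketch: check_scws_install is a hypothetical helper name, and it only tests that the files exist, not that the library actually works.

```shell
#!/bin/sh
# Return 0 when both the library file and the CLI binary exist under the prefix.
check_scws_install() {
    prefix=$1
    [ -f "$prefix/lib/libscws.la" ] && [ -x "$prefix/bin/scws" ]
}

# /usr/local/scws is the prefix passed to ./configure earlier in this guide.
if check_scws_install /usr/local/scws; then
    echo "SCWS looks installed under /usr/local/scws"
else
    echo "SCWS files not found under /usr/local/scws"
fi
```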

Downloading and Installing Dictionaries

Navigate to the SCWS config directory and download/extract GBK and UTF-8 Chinese dictionaries:

cd /usr/local/scws/etc
wget http://www.xunsearch.com/scws/down/scws-dict-chs-gbk.tar.bz2 && tar xvjf scws-dict-chs-gbk.tar.bz2
wget http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2 && tar xvjf scws-dict-chs-utf8.tar.bz2

Dictionary files (*.xdb) should now be in /usr/local/scws/etc.
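A quick way to confirm that both archives extracted correctly is to count the .xdb files in the directory. The sketch below assumes each archive contributes one dictionary file, so two are expected; count_xdb is a hypothetical helper name.

```shell
#!/bin/sh
# Print the number of *.xdb files directly inside the given directory.
# A missing directory counts as zero.
count_xdb() {
    find "$1" -maxdepth 1 -type f -name '*.xdb' 2>/dev/null | wc -l | tr -d ' '
}

dict_dir=/usr/local/scws/etc
n=$(count_xdb "$dict_dir")
# Assumption: one dictionary per archive (GBK and UTF-8), so at least two files.
if [ "$n" -ge 2 ]; then
    echo "Found $n dictionary files in $dict_dir"
else
    echo "Expected at least 2 .xdb files in $dict_dir, found $n"
fi
```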

2. Installing the PHP Extension

First, install required compilation tools and PHP development packages:

yum install epel-release -y
yum update -y
yum install php-devel autoconf automake -y

Enter the PHP extension directory within the SCWS source:

cd /home/wwwroot/so.blog.youquso.com/scws-1.2.3/phpext

Run phpize to prepare the build environment (adjust path if needed):

/usr/local/php/bin/phpize

Configure build options, specifying the SCWS install path:

./configure --with-scws=/usr/local/scws

Note: If PHP is installed in a non-standard location, add --with-php-config=$php_prefix/bin/php-config to the configure command.

Compile and install:

make
make test  # Optional
make install # Requires root

On success, make install prints the directory where the extension module (scws.so) was installed. Note this path; it is needed for php.ini in the next step.

Configuring PHP to Load the Extension

Edit your PHP configuration file (path may vary):

vi /usr/local/php/etc/php.ini

Add these lines at the end:

[scws]
; Ensure extension_dir is correct or use the absolute path to scws.so.
extension = /usr/lib64/php/modules/scws.so
scws.default.charset = utf-8
scws.default.fpath = /usr/local/scws/etc
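The extension path differs between PHP installations, so rather than typing it by hand you can ask php-config for the extension directory and generate the block from that. This is a sketch under assumptions: scws_ini_block is a hypothetical helper, and the php-config location is the example path used in this guide. It only prints the block; it does not edit php.ini.

```shell
#!/bin/sh
# Print an ini block that loads scws.so from the given extension directory.
scws_ini_block() {
    ext_dir=$1
    cat <<EOF
[scws]
extension = $ext_dir/scws.so
scws.default.charset = utf-8
scws.default.fpath = /usr/local/scws/etc
EOF
}

# Example location from this guide; adjust it to the PHP you actually run.
php_config=/usr/local/php/bin/php-config
if [ -x "$php_config" ]; then
    scws_ini_block "$("$php_config" --extension-dir)"
else
    echo "php-config not found at $php_config"
fi
```

Review the printed block, then paste it at the end of php.ini yourself.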

Save, exit, and restart PHP-FPM:

/etc/init.d/php-fpm restart

Verify the extension is loaded:

php -m | grep scws

Or check phpinfo() output in a webpage.
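For a scripted check, PHP's extension_loaded() can be called from the command line and turned into an exit status. The sketch below uses a hypothetical helper named scws_loaded. Keep in mind that the CLI and PHP-FPM may read different php.ini files (php --ini shows which ones the CLI uses), so a result here does not by itself prove what PHP-FPM has loaded.

```shell
#!/bin/sh
# Return 0 when the given PHP binary reports the scws extension as loaded.
# The binary defaults to whatever "php" resolves to on PATH.
scws_loaded() {
    php_bin=${1:-php}
    "$php_bin" -r 'exit(extension_loaded("scws") ? 0 : 1);'
}

if scws_loaded php 2>/dev/null; then
    echo "scws extension is loaded in the PHP CLI"
else
    echo "scws extension is not loaded (or php is not on PATH)"
fi
```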

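Troubleshooting: Undefined Symbol When Loading scws.so

If PHP-FPM restart fails with an error like:

PHP Startup: Unable to load dynamic library '/usr/lib64/php/modules/scws.so' - /usr/lib64/php/modules/scws.so: undefined symbol: zend_new_interned_string

This is usually not a 32-bit/64-bit problem. The missing name is a Zend engine symbol that the PHP binary loading the extension does not provide, which typically means scws.so was built against a different PHP installation than the one running it. This can happen when a server has both a yum-installed PHP and a separately compiled one under /usr/local/php. Re-run the phpize that belongs to the PHP you actually use, then reconfigure the extension with the matching php-config: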

./configure --with-scws=/usr/local/scws --with-php-config=/usr/local/php/bin/php-config

Then run make && make install again.
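To see whether two PHP installations are involved, compare the version reported by the php-config used for the build with the version of the PHP that runs. The sketch below is a rough check under assumptions: major_minor and same_php_series are hypothetical helpers, the php-config path is the example from this guide, and the CLI php is used as a stand-in for the PHP that PHP-FPM runs, which may not hold on every server.

```shell
#!/bin/sh
# Print the major.minor part of a version string such as "5.3.3".
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Return 0 when two PHP version strings share the same major.minor series.
same_php_series() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Example location from this guide; adjust it to your install.
php_config=/usr/local/php/bin/php-config
if [ -x "$php_config" ] && command -v php >/dev/null 2>&1; then
    built_for=$("$php_config" --version)
    running=$(php -r 'echo PHP_VERSION;')
    if same_php_series "$built_for" "$running"; then
        echo "Build headers ($built_for) match the running PHP ($running)"
    else
        echo "Mismatch: extension built for $built_for, PHP running is $running"
    fi
fi
```

If the two differ, rebuild the extension with the phpize and php-config from the same installation as the PHP that loads it.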
