|
NAMEmroonga - Mroonga DocumentationFor the release history: News THE CHARACTERISTICS OF MROONGAWhat is Mroonga?Mroonga is a MySQL storage engine based on Groonga, the full text search engine.In MySQL 5.1 or later, Pluggable Storage Engine interface is introduced, and we can use custom storage engines easily. So we implement Mroonga, so that we can use Groonga through MySQL. By using Mroonga, you can use Groonga with SQL. The successor of TritonnTo support Japanese full text search, Tritonn was developed by embedding Senna, the predecessor of Groonga, in MySQL. Mroogna is its successor.Running as a MySQL pluginSince Tritonn was the modified version of MySQL, we need to build it by ourselves or use binary files provided by Tritonn project, thus we cannot use the official binary files provided by MySQL.On the other hand, Mroonga is an independent program (shared library) using Pluggable Storage Engine interface, and we can dynamically load it on MySQL's official binary. So we can use it more easily than Tritonn. Faster index updateComparing to Senna, Groonga has much better throughput in adding or updating index.Mroonga also has the benefit of this performance improvement. Faster searchIn Tritonn, we use MyISAM storage engine, thus we have a exclusive table lock by updating data (and index), and it prevents the performance of search.But in Mroonga, we no longer have this issue, and the performance of search is better especially in frequent data update cases. Geolocation searchGroonga supports not only the full text search, but also the fast geolocation search using index. And MySQL also has the syntax for geolocation search. With Mroonga, you can use Groonga's fast geolocation search by using MySQL's geolocation SQL syntax.Sharing the same Groonga storageMroonga stores the data by using Groonga's DB API. And its storage file's format is same as that of the file that is managed by Groonga itself only. Therefore you can share the same Groonga storage like below.
And Groonga's storage file can be shared with multi-processes and multi-threads, so that we can invoke several search queries to the same storage file simultaneously. Associate with other storage enginesMroonga has two running modes.One is "storage mode", that is the default mode, and we use Groonga for both storing data and searching. With this mode, you can have full benefits of Groonga described above, like fast data update, lock-free full text search and geolocation search. But it does not support transactions. Another one is "wrapper mode", that adds full text search function on other storage engines like MyISAM or InnoDB. With this mode, you can use Groonga's fast full text search with having the benefits of the storage engine, ex. transaction in InnoDB. But you cannot have benefits from Groonga's read-lock free characteristic. And you might have the performance bottle neck in the storage engine in updating data. Supported platformsMroonga supports many of the major platforms.Refer to Install for supported platforms. INSTALLThis section describes how to install Mroonga on each environment. There are packages for major platforms. It's recommended that you use package instead of building Mroonga by yourself. But don't worry. There is a document about building Mroonga from source.We distribute both 32-bit and 64-bit packages but we strongly recommend a 64-bit package for server. You should use a 32-bit package just only for tests or development. You will encounter an out of memory error with a 32-bit package even if you just process medium size data. WindowsThis section describes how to install Mroonga on Windows. You can install Mroogna by extracting a zip package.Mroonga binary for Windows is provided with MariaDB binary because some changes are needed for building Mroonga for Windows. We distribute both 32-bit and 64-bit packages but we strongly recommend a 64-bit package for server. You should use a 32-bit package just only for tests or development. You will encounter an out of memory error with a 32-bit package even if you just process medium size data. InstallerCAUTION:The following MSI files don't work yet. Please use zip
files below or help us to creating MSI files that work well.
Download MSI file and execute it. You need to choose a MSI for your environment. Choose win32 version for 32-bit environment, winx64 version for 64-bit environment:
ZipDownload zip file and extract it. You need to choose a zip for your environment:
Zip packages are pre-configured for easy to use, so no need to execute INSTALL PLUGIN and CREATE FUNCTION. Just start mysqld by following command: > mysqld.exe --defaults-file=.\MY-PREFERRED-INI.ini --console Each zip package contains ini files (my-small.ini, my-medium.ini, my-large.ini and so on), choose preferred ini file which meets on your demand. Next connect to MariaDB by following command: > mysql.exe MariaDB [(none)]> SHOW ENGINES; +--------------------+---------+------------------------------------------------------------+--------------+------+------------+ | Engine | Support | Comment | Transactions | XA | Savepoints | +--------------------+---------+------------------------------------------------------------+--------------+------+------------+ | CSV | YES | CSV storage engine | NO | NO | NO | | PERFORMANCE_SCHEMA | YES | Performance Schema | NO | NO | NO | | MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO | | MyISAM | YES | MyISAM storage engine | NO | NO | NO | | MRG_MyISAM | YES | Collection of identical MyISAM tables | NO | NO | NO | | InnoDB | DEFAULT | Supports transactions, row-level locking, and foreign keys | YES | YES | YES | | Mroonga | YES | CJK-ready fulltext search, column store | NO | NO | NO | | Aria | YES | Crash-safe tables with MyISAM heritage | NO | NO | NO | +--------------------+---------+------------------------------------------------------------+--------------+------+------------+ 8 rows in set (0.00 sec) Build from source with MariaDBYou need to use Mroonga bundled MariaDB source provided by the Mroonga project.You can find it in https://packages.groonga.org/source/mroonga/. Mroonga bundled MariaDB source has mariadb-${MARIADB_VERSION}-with-mroonga-${MROONGA_VERSION}.zip file name. You can build the source code with the standard MariaDB build process. You need to register Mroonga after building MariaDB. Use SQL at ${MARIADB_BUILD_DIR}\storage\mroonga\data\install.sql to register Mroonga. macOSThis section describes how to install Mroonga on macOS. You can install Mroonga by Homebrew.HomebrewSee mroonga/homebrew-mroonga on GitHub for details.Debian GNU/LinuxThis section describes how to install Mroonga related deb packages on Debian GNU/Linux. You can install them by apt.We distribute both 32-bit and 64-bit packages but we strongly recommend a 64-bit package for server. You should use a 32-bit package just only for tests or development. You will encounter an out of memory error with a 32-bit package even if you just process medium size data. bullseye (MariaDB)Install:% sudo apt update % sudo apt install -y -V apt-transport-https % sudo apt install -y -V wget % wget https://packages.groonga.org/debian/groonga-apt-source-latest-bullseye.deb % sudo apt install -y -V ./groonga-apt-source-latest-bullseye.deb % sudo apt update % sudo apt install -y -V mariadb-server-10.5-mroonga If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo apt install -y -V groonga-tokenizer-mecab bullseye (with the Oracle MySQL 8.0 package)Install:% sudo apt update % sudo apt install -y -V apt-transport-https % sudo apt install -y -V wget % wget https://packages.groonga.org/debian/groonga-apt-source-latest-bullseye.deb % wget https://repo.mysql.com/mysql-apt-config_0.8.17-1_all.deb % sudo apt install -y -V ./groonga-apt-source-latest-bullseye.deb % sudo env DEBIAN_FRONTEND=noninteractive MYSQL_SERVER_VERSION=mysql-8.0 apt install -y ./mysql-apt-config_0.8.17-1_all.deb % sudo apt update % sudo apt install -y -V mysql-community-8.0-mroonga If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo apt install -y -V groonga-tokenizer-mecab buster (MariaDB)Install:% sudo apt update % sudo apt install -y -V apt-transport-https % sudo apt install -y -V wget % wget https://packages.groonga.org/debian/groonga-apt-source-latest-buster.deb % sudo apt install -y -V ./groonga-apt-source-latest-buster.deb % sudo apt update % sudo apt install -y -V mariadb-server-10.3-mroonga If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo apt install -y -V groonga-tokenizer-mecab buster (with the Oracle MySQL 5.7 package)Install:% sudo apt update % sudo apt install -y -V apt-transport-https % sudo apt install -y -V wget % wget https://packages.groonga.org/debian/groonga-apt-source-latest-buster.deb % wget https://repo.mysql.com/mysql-apt-config_0.8.17-1_all.deb % sudo apt install -y -V ./groonga-apt-source-latest-buster.deb % sudo env DEBIAN_FRONTEND=noninteractive MYSQL_SERVER_VERSION=mysql-5.7 apt install -y ./mysql-apt-config_0.8.17-1_all.deb % sudo apt update % sudo apt install -y -V mysql-community-5.7-mroonga If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo apt install -y -V groonga-tokenizer-mecab buster (with the Oracle MySQL 8.0 package)Install:% sudo apt update % sudo apt install -y -V apt-transport-https % sudo apt install -y -V wget % wget https://packages.groonga.org/debian/groonga-apt-source-latest-buster.deb % wget https://repo.mysql.com/mysql-apt-config_0.8.17-1_all.deb % sudo apt install -y -V ./groonga-apt-source-latest-buster.deb % sudo env DEBIAN_FRONTEND=noninteractive MYSQL_SERVER_VERSION=mysql-8.0 apt install -y ./mysql-apt-config_0.8.17-1_all.deb % sudo apt update % sudo apt install -y -V mysql-community-8.0-mroonga If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo apt install -y -V groonga-tokenizer-mecab UbuntuThis section describes how to install Mroonga related deb packages on Ubuntu. You can install them by apt.We distribute both 32-bit and 64-bit packages but we strongly recommend a 64-bit package for server. You should use a 32-bit package just only for tests or development. You will encounter an out of memory error with a 32-bit package even if you just process medium size data. PPA (Personal Package Archive)The Mroonga APT repository for Ubuntu uses PPA (Personal Package Archive) on Launchpad. You can install Mroonga by APT from the PPA.Here are supported Ubuntu versions:
Here are Ubuntu versions that supports MySQL:
Here are Ubuntu versions that supports MariaDB:
Enable the universe repository and the security update repository to install Mroonga: % sudo apt-get install -y -V software-properties-common lsb-release % sudo add-apt-repository -y universe % sudo add-apt-repository "deb http://security.ubuntu.com/ubuntu $(lsb_release --short --codename)-security main restricted" Add the ppa:groonga/ppa PPA to your system: % sudo add-apt-repository -y ppa:groonga/ppa % sudo apt-get update Install Mroonga for MySQL: % sudo apt-get install -y -V mysql-server-mroonga Install Mroonga for MariaDB: % sudo apt-get install -y -V mariadb-server-mroonga If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo apt-get install -y -V groonga-tokenizer-mecab AlmaLinuxThis section describes how to install Mroonga related RPM packages on AlmaLinux. You can install them by dnf.AlmaLinux 8 (with the Oracle MySQL 8.0 package)You can use Oracle's MySQL packages version 8.0 on AlmaLinux 8 since Mroonga 11.10 release.NOTE: There are already known issues about MySQL 8.0.
Install: % sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm % sudo dnf install -y https://repo.mysql.com/mysql80-community-release-el8.rpm % sudo dnf install -y groonga-libs % sudo dnf install --disablerepo=AppStream -y --enablerepo=epel mysql-community-8.0-mroonga (% sudo systemctl start mysqld) (% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //')) (% sudo mysqladmin -u root --password="${tmp_password}" password) If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab AlmaLinux 8 (with Percona Server 8.0 package)You can use Percona Server packages version 8.0 on AlmaLinux 8 since Mroonga 11.10 release.Install: % sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm % sudo dnf install -y https://repo.percona.com/release/percona-release-latest.noarch.rpm % sudo dnf install -y --enablerepo=epel percona-server-8.0-mroonga (% sudo systemctl start mysqld) (% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //')) (% sudo mysqladmin -u root --password="${tmp_password}" password) If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab AlmaLinux 8 (with MariaDB 10.3 package)You can use MariaDB's MariaDB packages version 10.3 on AlmaLinux 8 since Mroonga 11.10 release.Create /etc/yum.repos.d/MariaDB.repo with the following content: [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.3/centos8-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1 Install: % sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm % sudo dnf install -y boost-program-options % sudo dnf install --disablerepo=AppStream -y MariaDB-server % sudo systemctl start mariadb % sudo dnf install -y --enablerepo=epel mariadb-10.3-mroonga (% sudo mysqladmin -u root password 'new-password') If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab AlmaLinux 8 (with MariaDB 10.4 package)You can use MariaDB's MariaDB packages version 10.4 on AlmaLinux 8 since Mroonga 11.10 release.Create /etc/yum.repos.d/MariaDB.repo with the following content: [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.4/centos8-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1 Install: % sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm % sudo dnf install -y boost-program-options % sudo dnf install --disablerepo=AppStream -y MariaDB-server % sudo systemctl start mariadb % sudo dnf install -y --enablerepo=epel mariadb-10.4-mroonga (% sudo mysqladmin -u root password 'new-password') If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab AlmaLinux 8 (with MariaDB 10.5 package)You can use MariaDB's MariaDB packages version 10.5 on AlmaLinux 8 since Mroonga 11.10 release.Create /etc/yum.repos.d/MariaDB.repo with the following content: [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.5/centos8-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1 Install: % sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm % sudo dnf install -y boost-program-options % sudo dnf install --disablerepo=AppStream -y MariaDB-server % sudo systemctl start mariadb % sudo dnf install -y --enablerepo=epel mariadb-10.5-mroonga (% sudo mysqladmin -u root password 'new-password') If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab AlmaLinux 8 (with MariaDB 10.6 package)You can use MariaDB's MariaDB packages version 10.6 on AlmaLinux 8 since Mroonga 11.10 release.Create /etc/yum.repos.d/MariaDB.repo with the following content: [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.6/centos8-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1 Install: % sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm % sudo dnf install -y boost-program-options % sudo dnf install --disablerepo=AppStream -y MariaDB-server % sudo systemctl start mariadb % sudo dnf install -y --enablerepo=epel mariadb-10.6-mroonga (% sudo mysqladmin -u root password 'new-password') If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab CentOSThis section describes how to install Mroonga related RPM packages on CentOS. You can install them by yum.We distribute both 32-bit and 64-bit packages but we strongly recommend a 64-bit package for server. You should use a 32-bit package just only for tests or development. You will encounter an out of memory error with a 32-bit package even if you just process medium size data. CentOS 7 (with the Oracle MySQL 5.7 package)You can use Oracle's MySQL packages version 5.7 on CentOS 7 since Mroonga 5.09 release.Install: % sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm % sudo yum install -y https://repo.mysql.com/mysql-community-release-el7.rpm % sudo yum install -y yum-utils % sudo yum-config-manager --disable mysql80-community % sudo yum-config-manager --enable mysql57-community % sudo yum install -y --enablerepo=epel mysql-community-5.7-mroonga (% sudo systemctl start mysqld) (% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //')) (% sudo mysqladmin -u root --password="${tmp_password}" password) If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab CentOS 7 (with the Oracle MySQL 8.0 package)You can use Oracle's MySQL packages version 8.0 on CentOS 7 since Mroonga 9.04 release.NOTE: There are already known issues about MySQL 8.0.
Install: % sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm % sudo yum install -y https://repo.mysql.com/mysql-community-release-el7.rpm % sudo yum install -y --enablerepo=epel mysql-community-8.0-mroonga (% sudo systemctl start mysqld) (% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //')) (% sudo mysqladmin -u root --password="${tmp_password}" password) If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab CentOS 7 (with Percona Server 5.7 package)You can use Percona Server packages version 5.7 on CentOS 7 since Mroonga 6.02 release.Install: % sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm % sudo yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm % sudo yum install -y --enablerepo=epel percona-server-5.7-mroonga (% sudo systemctl start mysqld) (% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //')) (% sudo mysqladmin -u root --password="${tmp_password}" password) If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab CentOS 7 (with Percona Server 8.0 package)You can use Percona Server packages version 8.0 on CentOS 7 since Mroonga 10.06 release.Install: % sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm % sudo yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm % sudo yum install -y --enablerepo=epel percona-server-8.0-mroonga (% sudo systemctl start mysqld) (% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //')) (% sudo mysqladmin -u root --password="${tmp_password}" password) If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab CentOS 7 (with MariaDB 10.2 package)You can use MariaDB's MariaDB packages version 10.2 on CentOS 7 since Mroonga 7.06 release.Create /etc/yum.repos.d/MariaDB.repo with the following content: [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.2/centos7-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1 Install: % sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm % sudo yum install -y MariaDB-server % sudo systemctl start mariadb % sudo yum install -y --enablerepo=epel mariadb-10.2-mroonga (% sudo mysqladmin -u root password 'new-password') If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab CentOS 7 (with MariaDB 10.3 package)You can use MariaDB's MariaDB packages version 10.3 on CentOS 7 since Mroonga 7.11 release.Create /etc/yum.repos.d/MariaDB.repo with the following content: [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.3/centos7-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1 Install: % sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm % sudo yum install -y MariaDB-server % sudo systemctl start mariadb % sudo yum install -y --enablerepo=epel mariadb-10.3-mroonga (% sudo mysqladmin -u root password 'new-password') If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab CentOS 7 (with MariaDB 10.4 package)You can use MariaDB's MariaDB packages version 10.4 on CentOS 7 since Mroonga 9.07 release.Create /etc/yum.repos.d/MariaDB.repo with the following content: [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.4/centos7-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1 Install: % sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm % sudo yum install -y MariaDB-server % sudo systemctl start mariadb % sudo yum install -y --enablerepo=epel mariadb-10.4-mroonga (% sudo mysqladmin -u root password 'new-password') If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab CentOS 7 (with MariaDB 10.5 package)You can use MariaDB's MariaDB packages version 10.5 on CentOS 7 since Mroonga 10.06 release.Create /etc/yum.repos.d/MariaDB.repo with the following content: [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.5/centos7-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1 Install: % sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm % sudo yum install -y MariaDB-server % sudo systemctl start mariadb % sudo yum install -y --enablerepo=epel mariadb-10.5-mroonga (% sudo mysqladmin -u root password 'new-password') If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab CentOS 7 (with MariaDB 10.6 package)You can use MariaDB's MariaDB packages version 10.6 on CentOS 7 since Mroonga 11.09 release.Create /etc/yum.repos.d/MariaDB.repo with the following content: [mariadb] name = MariaDB baseurl = https://yum.mariadb.org/10.6/centos7-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1 Install: % sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm % sudo yum install -y MariaDB-server % sudo systemctl start mariadb % sudo yum install -y --enablerepo=epel mariadb-10.6-mroonga (% sudo mysqladmin -u root password 'new-password') If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package. Install groonga-tokenizer-mecab package: % sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab FedoraFedora 21MariaDB 10.0.x is adopted on Fedora 21. As Mroonga is now bundled with MariaDB since 10.0.15, you can use bundled version of Mroonga.Note that MariaDB bundled Mroonga version is a bit old. DockerMroonga is also available via Docker Hub.See Quick Start guide to start. OthersThis section describes how to install Mroonga from source code. If there is no package no your environment, you need to build Mroonga from source code.DependenciesMroonga needs some tools, libraries and MySQL for build. You can use MariaDB instead of MySQL.ToolsHere are required tools:
You must get them ready. You can use CMake instead of shell but this document doesn't describe about building with CMake. Here are optional tools:
LibrariesHere are required libraries:
Here are optional libraries:
If you want to use indexes of tokenizing of each morpheme
for full text search, install MeCab before installing Groonga.
MySQLMroonga needs not only installed MySQL but also MySQL source and build directory. You can't use MySQL package. It doesn't provide MySQL source and build directory. You need MySQL source and build directory!If you use MariaDB instead of MySQL, you need MariaDB source. Download the latest MySQL 5.6 source code, then build and install it. SEE ALSO: Download MySQL Community Server
Here we assume that you use mysql-5.6.21 and its source code is extracted in the following directory: /usr/local/src/mysql-5.6.21 Then build in the following directory: /usr/local/build/mysql-5.6.21 Here are command lines to build and install MySQL: % cd /usr/local/build/mysql-5.6.21 % cmake /usr/local/src/mysql-5.6.21 % make % sudo make install And we assume that MySQL is installed in the following directory: /usr/local/mysql Build from sourceMroonga uses GNU build system. So the following is the simplest build steps:% wget https://packages.groonga.org/source/mroonga/mroonga-6.12.tar.gz % tar xvzf mroonga-6.12.tar.gz % cd mroonga-6.12 % ./configure \ --with-mysql-source=/usr/local/src/mysql-5.6.21 \ --with-mysql-build=/usr/local/build/mysql-5.6.21 \ --with-mysql-config=/usr/local/mysql/bin/mysql_config % make % sudo make install % /usr/local/mysql/bin/mysql -u root < /usr/local/share/mroonga/install.sql You need to specify the following on configure:
You can confirm Mroonga is installed successfully by SHOW ENGINES SQL. If you can find Mroonga row, Mroonga is installed successfully: mysql> SHOW ENGINES; +------------+---------+------------------------------------------------------------+--------------+------+------------+ | Engine | Support | Comment | Transactions | XA | Savepoints | +------------+---------+------------------------------------------------------------+--------------+------+------------+ | Mroonga | YES | Fulltext search, column base | NO | NO | NO | | MRG_MYISAM | YES | Collection of identical MyISAM tables | NO | NO | NO | | CSV | YES | CSV storage engine | NO | NO | NO | | MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance | NO | NO | NO | | InnoDB | YES | Supports transactions, row-level locking, and foreign keys | YES | YES | YES | | MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO | +------------+---------+------------------------------------------------------------+--------------+------+------------+ 6 rows in set (0.00 sec) The following describes details about each step. configureFirst, you need to run configure. Here are important configure parameters:--with-mysql-source=PATHSpecifies the location of MySQL source code.This is required parameter: % ./configure \ --with-mysql-source=/usr/local/src/mysql-5.6.21 \ --with-mysql-config=/usr/local/mysql/bin/mysql_config --with-mysql-build=PATHSpecifies the location where you build MySQL source code.If you build MySQL in MySQL source code directory, you don't need to specify this parameter. If you build MySQL in other directory, you need to specify this parameter. Here is an example when you build MySQL in /usr/local/build/mysql-5.6.21: % ./configure \ --with-mysql-source=/usr/local/src/mysql-5.6.21 \ --with-mysql-build=/usr/local/build/mysql-5.6.21 \ --with-mysql-config=/usr/local/mysql/bin/mysql_config --with-mysql-config=PATHSpecifies the path of mysql_config command.If mysql_config command can be found by PATH, you don't need to specify this parameter. For example, if mysql_config command exists at /usr/bin/mysql_config, you don't need to specify this parameter: % ./configure \ --with-mysql-source=/usr/local/src/mysql-5.6.21 --with-default-tokenizer=TOKENIZERSpecifies the default tokenizer for full text. You can custom it in my.cnf.The default is TokenBigram. Here is an example to use TokenMecab as the default tokenizer: % ./configure \ --with-mysql-source=/usr/local/src/mysql-5.6.21 \ --with-mysql-config=/usr/local/mysql/bin/mysql_config \ --with-default-tokenizer=TokenMecab --prefix=PATHSpecifies the install base directory. Mroonga related files are installed under ${PATH}/ directory except ha_mroonga.so. ha_mroonga.so is a MySQL plugin file. It is installed the plugin directory of MySQL.The default is /usr/local. In this case, install.sql that is used for installing Mroonga is installed to /usr/local/share/mroonga/install.sql. Here is an example that installs Mroonga into ~/local for an user use instead of system wide use: % ./configure \ --prefix=$HOME/local \ --with-mysql-source=$HOME/local/src/mysql-5.6.21 \ --with-mysql-config=$HOME/local/mysql/bin/mysql_config PKG_CONFIG_PATH=PATHThis is not a configure parameter but we describe it for users who doesn't install Groonga into the standard location.If Groonga is not installed in the standard location like /usr/lib, you need to specify its location by PKG_CONFIG_PATH. For example, if Groonga is installed with --prefix=$HOME/local, use the following command line: ./configure \ PKG_CONFIG_PATH=$HOME/local/lib/pkgconfig \ --with-mysql-source=/usr/local/src/mysql-5.6.21 \ --with-mysql-config=/usr/local/mysql/bin/mysql_config makeconfigure is succeeded, you can build Mroonga by make:% make If you have multi cores CPU, you can make faster by using -j option. If you have 4 cores CPU, it's good for using -j4 option: % make -j4 If you get some errors by make, please report them to us: How to report a bug make installNow, you can install built Mroonga!:% sudo make install If you have write permission for ${PREFIX} and the plugin directory of MySQL, you don't need to use sudo. e.g. --prefix=$HOME/local case. In this case, use make install: % make install mysql -u root < install.sqlYou need to run some SQLs to register Mroonga to MySQL such as INSTALL PLUGIN and CREATE FUNCTION. They are written in ${PREFIX}/share/mroonga/install.sql.Here is an example when you specify --prefix=$HOME/local to configure: % mysql -u root < $HOME/local/share/mroonga/install.sql uninstall MroongaIf you want to remove Mroonga, type below commands:% mysql < ${PREFIX}/share/mroonga/uninstall.sql % cd ${MROONGA_BUILD_DIR} % sudo make uninstall UPGRADEThere is a case that incompatible change is introduced at new release. It is announced by release announce if new release contains such a incompatible change.Here is the list of recommended way of upgrading Mroonga from old release. See following URL about upgrade sequence if you use previous version. If you upgrade prior to 1.20, refer to Release 1.20 - 2012/01/29 If you upgrade from 1.20, refer to Release 2.00 - 2012/02/29 If you upgrade from 2.00 or 2.01, refer to Release 2.02 - 2012/04/29 If you upgrade from 2.00 or later and using multiple column indexes on storage mode, refer to Release 2.03 - 2012/05/29 If you upgrade from 2.04 or later and using SET column or ENUM that has the number of elements < 256 in Storage mode, refer to Release 2.05 - 2012/07/29 If you upgrade from 2.05 or later and using multiple column indexes against VARCHAR or CHAR, refer to Release 2.08 - 2012/10/29 If you upgrade from 2.08 or later and using TIMESTAMP column, please recreate database. If you upgrade from 2.08 or later and using CHAR(N) as primary key, please recreate index. Refer to Release 2.09 - 2012/11/29 for each case. If you upgrade prior to 5.03 and satisfies following the conditions, refer to Release 5.04 - 2015/06/29 and upgrade schema.
TUTORIALIf you don't install Mroonga yet, see Install. It describes how to install Mroonga.Try the following tutorials after you install Mroonga. You will understand how to use Mroonga. Installation checkIt is better that you check Mroonga installation before you use Mroonga. If Mroonga installation is failed, the SQLs in this tutorial will fail.The way to start or stop MySQL server is just same as the normal MySQL. After invoking the MySQL server, connect to it by mysql command. If you set password, you need to add '-p' option. % mysql -uroot test By using SHOW ENGINES command, you can check if Mroonga is installed: SHOW ENGINES; -- +------------+---------+------------------------------------------------------------+--------------+------+------------+ -- | Engine | Support | Comment | Transactions | XA | Savepoints | -- +------------+---------+------------------------------------------------------------+--------------+------+------------+ -- | Mroonga | YES | Fulltext search, column base | NO | NO | NO | -- | MRG_MYISAM | YES | Collection of identical MyISAM tables | NO | NO | NO | -- | CSV | YES | CSV storage engine | NO | NO | NO | -- | MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance | NO | NO | NO | -- | InnoDB | YES | Supports transactions, row-level locking, and foreign keys | YES | YES | YES | -- | MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO | -- +------------+---------+------------------------------------------------------------+--------------+------+------------+ -- 6 rows in set (0.00 sec) If you see Mroonga storage engine like the above, the installation is well done. If Mroonga isn't shown, run the following command. If you don't install Mroonga by package, the path of install.sql may be different: % mysql -uroot test < /usr/share/mroonga/install.sql If you can't find install.sql, run INSTALL PLUGIN command like the following manually: INSTALL PLUGIN Mroonga SONAME 'ha_mroonga.so'; Then, check server variable whether correct Mroonga version is installed by SHOW VARIABLES command. SHOW VARIABLES LIKE 'mroonga_version'; ModeMroonga has the following two modes.
With the storage mode, we use Groonga for both of the full text search function and the data storage. Since all functions of storage engine are realised with Groonga, aggregations are fast, that is one of Groonga's advantages, and you can manage the database directly by groonga command. The structure of the storage mode is the following. You use it instead of existing storage engines like MyISAM or InnoDB [image: storage mode] [image]
With the wrapper mode, Groonga is used for full text search function only, and another existing storage engine like InnoDB is used for storing data. By using wrapper mode, you combine InnoDB that is well-use as the storage engine and Mroonga that is a proven full text search engine, and you can use it as the stable database having the fast full text search function. The structure of the wrapper mode is the following. Full text search related operations are done by Mroonga, and other operations are done by existing storage engines like MyISAM, InnoDB etc. Mroonga is located between SQL Handler that processes SQL and an existing storage engine, thus all data goes through Mroonga With this way, full text search indexing etc. are done transparently. [image: wrapper mode] [image]
Storage modeHere we explain how to use storage mode of MroongaHow to use full text searchAfter confirming the installation, let's create a table. The important point is to specify Mroonga by ENGINE = Mroonga:CREATE TABLE diaries ( id INT PRIMARY KEY AUTO_INCREMENT, content VARCHAR(255), FULLTEXT INDEX (content) ) ENGINE = Mroonga DEFAULT CHARSET utf8; -- Query OK, 0 rows affected (0.10 sec) We put data by INSERT: INSERT INTO diaries (content) VALUES ("It'll be fine tomorrow."); -- Query OK, 1 row affected (0.01 sec) INSERT INTO diaries (content) VALUES ("It'll rain tomorrow"); -- Query OK, 1 row affected (0.00 sec) Try full text search: SELECT * FROM diaries WHERE MATCH(content) AGAINST("+fine" IN BOOLEAN MODE); -- +----+-----------------------------------------+ -- | id | content | -- +----+-----------------------------------------+ -- | 1 | It'll be fine tomorrow. | -- +----+-----------------------------------------+ -- 1 row in set (0.00 sec) Yes, full text search works. How to get search scoreNOTE:In version 1.0.0 or before, Mroonga used a special column
named _score to get search score. From version 1.0.0, it follows
MySQL's standard way to get search score.
We often want to display more relevant results first in full text search. We use search score in such case. We can get search score by MySQL's standard way [1], i.e. we use MATCH...AGAINST in one of columns in SELECT or ORDER BY. Let's try: INSERT INTO diaries (content) VALUES ("It's fine today. It'll be fine tomorrow as well."); -- Query OK, 1 row affected (0.00 sec) INSERT INTO diaries (content) VALUES ("It's fine today. But it'll rain tomorrow."); -- Query OK, 1 row affected (0.00 sec) SELECT *, MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) FROM diaries WHERE MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) ORDER BY MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) DESC; -- +----+--------------------------------------------------+---------------------------------------------------+ -- | id | content | MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) | -- +----+--------------------------------------------------+---------------------------------------------------+ -- | 3 | It's fine today. It'll be fine tomorrow as well. | 2 | -- | 1 | It'll be fine tomorrow. | 1 | -- | 4 | It's fine today. But it'll rain tomorrow. | 1 | -- +----+--------------------------------------------------+---------------------------------------------------+ -- 3 rows in set (0.00 sec) The result having the search word fine more, i.e. id = 3 message having the higher search score, is displayed first. And you also get search score by using MATCH AGAINST in SELECT phrase. You can use AS to change the attribute name: SELECT *, MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) AS score FROM diaries WHERE MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) ORDER BY MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) DESC; -- +----+--------------------------------------------------+-------+ -- | id | content | score | -- +----+--------------------------------------------------+-------+ -- | 3 | It's fine today. It'll be fine tomorrow as well. | 2 | -- | 1 | It'll be fine tomorrow. | 1 | -- | 4 | It's fine today. But it'll rain tomorrow. | 1 | -- +----+--------------------------------------------------+-------+ -- 3 rows in set (0.00 sec) How to specify the parser for full text searchMySQL has the following syntax to specify the parser [2] for full text search:FULLTEXT INDEX (content) WITH PARSER parser_name To use this syntax, you need to register all parsers in MySQL beforehand. On the other hand, Groonga can dynamically add a tokenizer, that is a parser in MySQL. So if use this syntax in Mroonga, tokenizers that are added in Groonga dynamically cannot be supported. We think that this limitation decreases the convenience, and we choose our own syntax using COMMENT like the following: FULLTEXT INDEX (content) COMMENT 'tokenizer "TokenMecab"' NOTE: COMMENT in FULLTEXT INDEX is only supported
MySQL 5.5 or later. If you use MySQL 5.1, use mroonga_default_tokenizer
variable described below.
You can specify one of the following values as the tokenizer. "tokenizer" in Mroonga equals to "parser" in MySQL. Available tokenizers
You can specify the default parser by passing --with-default-tokenizer option in configure when you build Mroonga: ./configure --with-default-tokenizer TokenMecab ... Or you can set mroonga_default_tokenizer variable in my.cnf or by SQL. If you specify it in my.cnf, the change will not be lost after restarting MySQL, but you need to restart MySQL to make it effective. On the other hand, if you set it in SQL, the change is effective immediately, but it will be lost when you restart MySQL. my.cnf: [mysqld] mroonga_default_tokenizer=TokenMecab SQL: SET GLOBAL mroonga_default_tokenizer = TokenMecab; -- Query OK, 0 rows affected (0.00 sec) How to specify the normalizerMroonga uses normalizer corresponding to the encoding of document. It is used when tokenizing text and storing table key.It is used NormalizerMySQLGeneralCI normalizer when the encoding is utf8_general_ci or utf8mb4_general_ci. It is used NormalizerMySQLUnicodeCI normalizer when the encoding is utf8_unicode_ci or utf8mb4_unicode_ci. It isn't used normalizer when the encoding is utf8_bin. Here is an example that uses NormalizerMySQLUnicodeCI normalizer by specifying utf8_unicode_ci: SET NAMES utf8; -- Query OK, 0 rows affected (0.00 sec) CREATE TABLE diaries ( day DATE PRIMARY KEY, content VARCHAR(64) NOT NULL, FULLTEXT INDEX (content) ) Engine=Mroonga DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; -- Query OK, 0 rows affected (0.18 sec) INSERT INTO diaries VALUES ("2013-04-23", "ブラックコーヒーを飲んだ。"); -- Query OK, 1 row affected (0.00 sec) SELECT * FROM diaries WHERE MATCH (content) AGAINST ("+ふらつく" IN BOOLEAN MODE); -- +------------+-----------------------------------------+ -- | day | content | -- +------------+-----------------------------------------+ -- | 2013-04-23 | ブラックコーヒーを飲んだ。 | -- +------------+-----------------------------------------+ -- 1 row in set (0.00 sec) SELECT * FROM diaries WHERE MATCH (content) AGAINST ("+ブラック" IN BOOLEAN MODE); -- +------------+-----------------------------------------+ -- | day | content | -- +------------+-----------------------------------------+ -- | 2013-04-23 | ブラックコーヒーを飲んだ。 | -- +------------+-----------------------------------------+ -- 1 row in set (0.00 sec) Mroonga has the following syntax to specify Groonga's normalizer: FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto"' See Groonga's documentation about normalizer for more details. Here is an example that uses NormalizerAuto normalizer: SET NAMES utf8; -- Query OK, 0 rows affected (0.00 sec) CREATE TABLE diaries ( day DATE PRIMARY KEY, content VARCHAR(64) NOT NULL, FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto"' ) Engine=Mroonga DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; -- Query OK, 0 rows affected (0.19 sec) INSERT INTO diaries VALUES ("2013-04-23", "ブラックコーヒーを飲んだ。"); -- Query OK, 1 row affected (0.00 sec) SELECT * FROM diaries WHERE MATCH (content) AGAINST ("+ふらつく" IN BOOLEAN MODE); -- Empty set (0.00 sec) SELECT * FROM diaries WHERE MATCH (content) AGAINST ("+ブラック" IN BOOLEAN MODE); -- +------------+-----------------------------------------+ -- | day | content | -- +------------+-----------------------------------------+ -- | 2013-04-23 | ブラックコーヒーを飲んだ。 | -- +------------+-----------------------------------------+ -- 1 row in set (0.00 sec) How to specify the token filtersMroonga has the following syntax to specify Groonga's token filters.:FULLTEXT INDEX (content) COMMENT 'token_filters "TokenFilterStem"' Here is an example that uses TokenFilterStem token filter: SELECT mroonga_command('register token_filters/stem'); -- +------------------------------------------------+ -- | mroonga_command('register token_filters/stem') | -- +------------------------------------------------+ -- | true | -- +------------------------------------------------+ -- 1 row in set (0.00 sec) CREATE TABLE memos ( id INT NOT NULL PRIMARY KEY, content TEXT NOT NULL, FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto", token_filters "TokenFilterStem"' ) Engine=Mroonga DEFAULT CHARSET=utf8; -- Query OK, 0 rows affected (0.18 sec) INSERT INTO memos VALUES (1, "I develop Groonga"); -- Query OK, 1 row affected (0.00 sec) INSERT INTO memos VALUES (2, "I'm developing Groonga"); -- Query OK, 1 row affected (0.00 sec) INSERT INTO memos VALUES (3, "I developed Groonga"); -- Query OK, 1 row affected (0.00 sec) SELECT * FROM memos WHERE MATCH (content) AGAINST ("+develops" IN BOOLEAN MODE); -- +----+------------------------+ -- | id | content | -- +----+------------------------+ -- | 1 | I develop Groonga | -- | 2 | I'm developing Groonga | -- | 3 | I developed Groonga | -- +----+------------------------+ -- 3 rows in set (0.01 sec) See Groonga's documentation about token filter documentation for more details. Here is an example that uses TokenFilterStopWord token filter: SELECT mroonga_command("register token_filters/stop_word"); -- +-----------------------------------------------------+ -- | mroonga_command("register token_filters/stop_word") | -- +-----------------------------------------------------+ -- | true | -- +-----------------------------------------------------+ -- 1 row in set (0.00 sec) CREATE TABLE terms ( term VARCHAR(64) NOT NULL PRIMARY KEY, is_stop_word BOOL NOT NULL ) Engine=Mroonga COMMENT='tokenizer "TokenBigram", token_filters "TokenFilterStopWord"' DEFAULT CHARSET=utf8; -- Query OK, 0 rows affected (0.12 sec) CREATE TABLE memos ( id INT NOT NULL PRIMARY KEY, content TEXT NOT NULL, FULLTEXT INDEX (content) COMMENT 'table "terms"' ) Engine=Mroonga DEFAULT CHARSET=utf8; -- Query OK, 0 rows affected (0.17 sec) INSERT INTO terms VALUES ("and", true); -- Query OK, 1 row affected (0.00 sec) INSERT INTO memos VALUES (1, "Hello"); -- Query OK, 1 row affected (0.00 sec) INSERT INTO memos VALUES (2, "Hello and Good-bye"); -- Query OK, 1 row affected (0.00 sec) INSERT INTO memos VALUES (3, "Good-bye"); -- Query OK, 1 row affected (0.00 sec) SELECT * FROM memos WHERE MATCH (content) AGAINST ('+"Hello and"' IN BOOLEAN MODE); -- +----+--------------------+ -- | id | content | -- +----+--------------------+ -- | 1 | Hello | -- | 2 | Hello and Good-bye | -- +----+--------------------+ -- 2 rows in set (0.01 sec) It's used that specifying the lexicon table for fulltext search. How to specify Groonga's column flagsMroonga has the following syntax to specify Groonga's column flags:content TEXT COMMENT 'flags "COLUMN_SCALAR|COMPRESS_ZLIB"' Here is an example that uses COMPRESS_ZLIB flag: CREATE TABLE entries ( id INT UNSIGNED PRIMARY KEY, content TEXT COMMENT 'flags "COLUMN_SCALAR|COMPRESS_ZLIB"' ) Engine=Mroonga DEFAULT CHARSET=utf8; -- Query OK, 0 rows affected (0.12 sec) See Groonga's documentation about column flags for more details. How to use geolocation searchIn storage mode, you can use fast geolocation search in addition to full text search. But unlike MyISAM, you can only store POINT type data. You cannot store other types data like LINE. And fast search using index only supports MBRContains. It does not support MBRDisjoint.For the table definition for geolocation search, you need to define a POINT type column like in MyISAM and define SPATIAL INDEX for it.: CREATE TABLE shops ( id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(255), location POINT NOT NULL, SPATIAL INDEX (location) ) ENGINE = Mroonga; -- Query OK, 0 rows affected (0.06 sec) To store data, you create POINT type data by using geomFromText() function like in MyISAM: INSERT INTO shops VALUES (null, 'Nezu''s Taiyaki', GeomFromText('POINT(139.762573 35.720253)')); -- Query OK, 1 row affected (0.00 sec) INSERT INTO shops VALUES (null, 'Naniwaya', GeomFromText('POINT(139.796234 35.730061)')); -- Query OK, 1 row affected (0.00 sec) INSERT INTO shops VALUES (null, 'Yanagiya Taiyaki', GeomFromText('POINT(139.783981 35.685341)')); -- Query OK, 1 row affected (0.00 sec) If you want to find shops within the rectangle where Ikebukuro station (139.7101 35.7292) is the top-left point and Tokyo Station (139.7662 35.6815) is the bottom-right point, SELECT phrase is like the following: SELECT id, name, AsText(location) FROM shops WHERE MBRContains(GeomFromText('LineString(139.7101 35.7292, 139.7662 35.6815)'), location); -- +----+-----------------------+------------------------------------------+ -- | id | name | AsText(location) | -- +----+-----------------------+------------------------------------------+ -- | 1 | Nezu's Taiyaki | POINT(139.762572777778 35.7202527777778) | -- +----+-----------------------+------------------------------------------+ -- 1 row in set (0.00 sec) Here you can search by geolocation! How to get the record IDGroonga assigns a unique number to identify the record when a record is added in the table.To make the development of applications easier, you can get this record ID by SQL in Mroonga To get the record ID, you need to create a column named _id when you create a table: CREATE TABLE memos ( _id INT, content VARCHAR(255), UNIQUE KEY (_id) USING HASH ) ENGINE = Mroonga; -- Query OK, 0 rows affected (0.04 sec) Tye typo of _id column should be integer one (TINYINT, SMALLINT, MEDIUMINT, INT or BIGINT). You can create an index for _id column, but it should be HASH type. Let's add records in the table by INSERT. Since _id column is implemented as a virtual column and its value is assigned by Groonga, you cannot specify the value when updating. So you need to exclude it from setting columns, or you need to use null as its value: INSERT INTO memos VALUES (null, "Saury for today's dinner."); -- Query OK, 1 row affected (0.00 sec) INSERT INTO memos VALUES (null, "Update mroonga tomorrow."); -- Query OK, 1 row affected (0.00 sec) INSERT INTO memos VALUES (null, "Buy some dumpling on the way home."); -- Query OK, 1 row affected (0.00 sec) INSERT INTO memos VALUES (null, "Thank God It's meat day."); -- Query OK, 1 row affected (0.00 sec) To get the record ID, you invoke SELECT with _id column: SELECT * FROM memos; -- +------+------------------------------------------+ -- | _id | content | -- +------+------------------------------------------+ -- | 1 | Saury for today's dinner. | -- | 2 | Update mroonga tomorrow. | -- | 3 | Buy some dumpling on the way home. | -- | 4 | Thank God It's meat day. | -- +------+------------------------------------------+ -- 4 rows in set (0.00 sec) By using last_insert_grn_id function, you can also get the record ID that is assigned by the last INSERT: INSERT INTO memos VALUES (null, "Just one bottle of milk in the fridge."); -- Query OK, 1 row affected (0.00 sec) SELECT last_insert_grn_id(); -- +----------------------+ -- | last_insert_grn_id() | -- +----------------------+ -- | 5 | -- +----------------------+ -- 1 row in set (0.00 sec) last_insert_grn_id function is included in Mroonga as a User-Defined Function (UDF), but if you have not yet register it in MySQL by CREATE FUNCTION, you need to invoke the following SQL for defining a function: CREATE FUNCTION last_insert_grn_id RETURNS INTEGER SONAME 'ha_mroonga.so'; As you can see in the example above, you can get the record ID by _id column or last_insert_grn_id function. It will be useful to use this value in the ensuing SQL queries like UPDATE: UPDATE memos SET content = "So much milk in the fridge." WHERE _id = last_insert_grn_id(); -- Query OK, 1 row affected (0.00 sec) -- Rows matched: 1 Changed: 1 Warnings: 0 How to get snippet (Keyword in context)Mroonga provides functionality to get keyword in context. It is implemented as mroonga_snippet() UDF.How to use similar searchSimilar search is supported by specifying document itself instead of specifying keywords in query.It is useful if you want to find documents which is related to specific document. Here is the schema definition for execution examples: CREATE TABLE similarities ( id INT PRIMARY KEY AUTO_INCREMENT, title VARCHAR(32), content VARCHAR(255), FULLTEXT INDEX (content) ) ENGINE = Mroonga DEFAULT CHARSET utf8; Here is the sample data for execution examples: INSERT INTO similarities (title, content) VALUES ('Groonga similar search', 'Groonga is an open-source fulltext search engine and column store.'); INSERT INTO similarities (title, content) VALUES ('Mroonga similar search', 'Mroonga is an open-source storage engine for fast fulltext search with MySQL.'); INSERT INTO similarities (title, content) VALUES ('Rroonga library', 'A library to use Groonga features from Ruby.'); Here is the example of similar search by content itself. SELECT title FROM similarities WHERE MATCH(content) AGAINST ('There are many open-source fulltext search engine.' IN NATURAL LANGUAGE MODE); Note that you need to specify an content of document in AGAINST('...' IN NATURAL LANGUAGE MODE). SELECT title FROM similarities WHERE MATCH(content) AGAINST ('There are many open-source fulltext search engine.' IN NATURAL LANGUAGE MODE); -- +------------------------+ -- | title | -- +------------------------+ -- | Groonga similar search | -- | Mroonga similar search | -- +------------------------+ -- 2 rows in set (0.00 sec) To improve degree of similarity, you may need to use language specific tokenizer or use labeled data such as tag to get rid of some undesirable search results. For example, if you want to execute similar search against Japanese text, it is recommended to use language specific tokenizer - TokenMecab. Here is the schema definition to use TokenMecab tokenizer: CREATE TABLE similarities ( id INT PRIMARY KEY AUTO_INCREMENT, title VARCHAR(32), content VARCHAR(255), FULLTEXT INDEX (content) COMMENT 'tokenizer "TokenMecab"' ) ENGINE = Mroonga DEFAULT CHARSET utf8; How to run Groonga commandIn storage mode, Mroonga stores all your data into Groonga database. You can access Groonga database by SQL with Mroonga. SQL is very powerful but it is not good for some operations such as faceted search.Faceted search is popular recently. Many online shopping sites such as amazon.com and ebay.com support faceted search. Faceted search refines the current search by available search parameters before users refine their search. And faceted search shows refined searches. Users just select a refined search. Users benefit from faceted search:
Faceted search needs multiple GROUP BY operations against searched result set. To do faceted search by SQL, multiple SELECT requests are needed. It is not effective. Groonga can do faceted search by only one groonga command. It is effective. Groonga has the select command that can search records with faceted search. Faceted search is called as "drilldown" in Groonga. See Groonga's documentation about select command for more details. Mroonga provides mroonga_command() function. You can run Groonga command in SQL by the function. But you should use only select command. Other commands that change schema or data may break consistency. Here is the schema definition for execution examples: CREATE TABLE diaries ( id INT PRIMARY KEY AUTO_INCREMENT, content VARCHAR(255), date DATE, year YEAR, `year_month` VARCHAR(9), tag VARCHAR(32), FULLTEXT INDEX (content) ) ENGINE = Mroonga DEFAULT CHARSET utf8; Here is the sample data for execution examples: INSERT INTO diaries (content, date, year, `year_month`, tag) VALUES ('Groonga is an open-source fulltext search engine and column store.', '2013-04-08', '2013', '2013-04', 'groonga'); INSERT INTO diaries (content, date, year, `year_month`, tag) VALUES ('Mroonga is an open-source storage engine for fast fulltext search with MySQL.', '2013-04-09', '2013', '2013-04', 'MySQL'); INSERT INTO diaries (content, date, year, `year_month`, tag) VALUES ('Tritonn is a patched version of MySQL that supports better fulltext search function with Senna.', '2013-03-29', '2013', '2013-03', 'MySQL'); Each record has groonga or MySQL as tag. Each record also has year and year_month. You can use tag, year and year_month as faceted search keys. Groonga calls faceted search as drilldown. So parameter key in Groonga is --drilldown. Groonga returns search result as JSON. So mroonga_command() also returns search result as JSON. It is not SQL friendly. You need to parse search result JSON by yourself. Here is the example of faceted search by all available faceted search keys (result JSON is pretty printed): SELECT mroonga_command("select diaries --output_columns _id --limit 0 --drilldown tag,year,year_month") AS faceted_result; -- +-----------------------------+ -- | faceted_result | -- +-----------------------------+ -- | [[[3], | -- | [["_id","UInt32"]]], | -- | [[2], | -- | [["_key","ShortText"], | -- | ["_nsubrecs","Int32"]], | -- | ["groonga",1], | -- | ["MySQL",2]], | -- | [[1], | -- | [["_key","Time"], | -- | ["_nsubrecs","Int32"]], | -- | [1356998400.0,3]], | -- | [[2], | -- | [["_key","ShortText"], | -- | ["_nsubrecs","Int32"]], | -- | ["2013-04",2], | -- | ["2013-03",1]]] | -- +-----------------------------+ -- 1 row in set (0.00 sec) The first element [[3], [["_id","UInt32"]]] is normal search result. It's not faceted search result. The second, third and forth elements are faceted search results: [[2], [["_key","ShortText"], ["_nsubrecs","Int32"]], ["groonga",1], ["MySQL",2]] [[1], [["_key","Time"], ["_nsubrecs","Int32"]], [1356998400.0,3]] [[2], [["_key","ShortText"], ["_nsubrecs","Int32"]], ["2013-04",2], ["2013-03",1]] The order of faceted search results is corresponding to the value of --drilldown. In this example, we specified tag, year and year_month as --drilldown value. So the first faceted search result is for tag, the second one is for year and the third one is for year_month. Each faceted search result uses the following format. This is the same as normal search result: [[${THE_NUMBER_OF_RECORDS}], [[${OUTPUT_COLUMN_NAME_0}, ${OUTPUT_COLUMN_TYPE_0}], [${OUTPUT_COLUMN_NAME_1}, ${OUTPUT_COLUMN_TYPE_1}], ..., [${OUTPUT_COLUMN_NAME_N}, ${OUTPUT_COLUMN_TYPE_N}]] [${OUTPUT_COLUMN_VALUE_0_FOR_RECORD_0}, ${OUTPUT_COLUMN_VALUE_1_FOR_RECORD_0}, ..., ${OUTPUT_COLUMN_VALUE_N_FOR_RECORD_0}], [${OUTPUT_COLUMN_VALUE_0_FOR_RECORD_1}, ${OUTPUT_COLUMN_VALUE_1_FOR_RECORD_1}, ..., ${OUTPUT_COLUMN_VALUE_N_FOR_RECORD_1}], ... [${OUTPUT_COLUMN_VALUE_0_FOR_RECORD_M}, ${OUTPUT_COLUMN_VALUE_1_FOR_RECORD_M}, ..., ${OUTPUT_COLUMN_VALUE_N_FOR_RECORD_M}]] The _key column value in faceted search result shows faceted search key. For example, the first faceted search result (the faceted search result for tag) shows that matched records (all records in this case) have groonga and MySQL as tag value: [[2], [["_key","ShortText"], ["_nsubrecs","Int32"]], ["groonga",1], ["MySQL",2]] The _nsubrecs column value in faceted search result shows the number of records that have the corresponding faceted search key. For example, the first faceted search result (the faceted search result for tag) shows that there are 1 record that has groonga as tag value and 2 records that have MySQL as tag value: [[2], [["_key","ShortText"], ["_nsubrecs","Int32"]], ["groonga",1], ["MySQL",2]] See Groonga's documentation about select command for more details. How to search by regular expressionIn storage mode, you can use a Groonga's functionality from Mroonga as described above. Thus, you can search records by using a regular expression via Groonga's functionality.There are some conditions to use regular expression in Mroonga.
Here is the example of search by regular expression. CREATE TABLE paths ( content text, FULLTEXT INDEX content_index (content) COMMENT 'tokenizer "TokenRegexp", normalizer "NormalizerAuto"' ) ENGINE=Mroonga DEFAULT CHARSET=utf8mb4; INSERT INTO paths VALUES ('/usr/bin/groonga'); INSERT INTO paths VALUES ('/var/log/auth.log'); INSERT INTO paths VALUES ('/var/log/messages'); INSERT INTO paths VALUES ('/tmp/local/var/log/auth.log'); SELECT * FROM paths WHERE MATCH(content) AGAINST ('*SS content @~ "\\\\A/var/log/auth"' IN BOOLEAN MODE); -- +-------------------+ -- | content | -- +-------------------+ -- | /var/log/auth.log | -- +-------------------+ -- 1 row in set (0.024 sec) By using *SS pragma, you can search the records which matches /var/log/auth.log with content @~ "\\\\A/var/log/auth". @~ is a Groonga's operator which executes a regular expression search, and "\\\\A/var/log/auth" executes prefix search, so it matches to only /var/log/auth.log. /tmp/local/var/log/auth.log doesn't match because it doesn't begin with "/var/log/auth". See Groonga's regular expression document for more syntax details. LoggingMroonga outputs the logs by default.Log files are located in MySQL's data directory with the filename groonga.log. Here is the example of the log. 2010-10-07 17:32:39.209379|n|b1858f80|mroonga 1.10 started. 2010-10-07 17:32:44.934048|d|46953940|hash get not found (key=test) 2010-10-07 17:32:44.936113|d|46953940|hash put (key=test) The default log level is NOTICE, i.e. we have important information only and we don't have debug information etc.). You can get the log level by mroonga_log_level system variable, that is a global variable. You can also modify it dynamically by using SET phrase: SHOW VARIABLES LIKE 'mroonga_log_level'; -- +-------------------+--------+ -- | Variable_name | Value | -- +-------------------+--------+ -- | mroonga_log_level | NOTICE | -- +-------------------+--------+ -- 1 row in set (0.00 sec) SET GLOBAL mroonga_log_level=DUMP; -- Query OK, 0 rows affected (0.00 sec) SHOW VARIABLES LIKE 'mroonga_log_level'; -- +-------------------+-------+ -- | Variable_name | Value | -- +-------------------+-------+ -- | mroonga_log_level | DUMP | -- +-------------------+-------+ -- 1 row in set (0.00 sec) Available log levels are the followings.
See mroonga_log_level about details. You can reopen the log file by FLUSH LOGS. If you want to rotate the log file without stopping MySQL server, you can do in the following procedure.
Next stepNow, you can use Mroonga as storage mode! If you want Mroonga to be faster, see also Optimizations.FOOTNOTES
Wrapper modeHere we explain how to use wrapper mode of MroongaHow to use wrapper modeIn wrapper mode, Mroonga works in wrapping an existing storage engine. To specify the wrapped storage engine, we use SQL comment like COMMENT = 'engine "InnoDB"' for now.NOTE: For now, a primary key is mandatory in wrapper mode. That
is not the case with storage mode.
NOTE: Wrapper mode supports the followings, that are not
supported in storage mode for now.
How to use full text searchAfter confirming the installation, let's create a table. The important point is to specify Mroonga by ENGINE = Mroonga.mysql> CREATE TABLE diaries ( -> id INT PRIMARY KEY AUTO_INCREMENT, -> content VARCHAR(255), -> FULLTEXT INDEX (content) -> ) ENGINE = Mroonga COMMENT = 'engine "InnoDB"' DEFAULT CHARSET utf8; Query OK, 0 rows affected (0.52 sec) We put data by INSERT. mysql> INSERT INTO diaries (content) VALUES ("It'll be fine tomorrow."); Query OK, 1 row affected (0.01 sec) mysql> INSERT INTO diaries (content) VALUES ("It'll rain tomorrow"); Query OK, 1 row affected (0.00 sec) Try full text search. mysql> SELECT * FROM diaries WHERE MATCH(content) AGAINST("+fine" IN BOOLEAN MODE); +----+-----------------------------------------+ | id | content | +----+-----------------------------------------+ | 1 | It'll be fine tomorrow. | +----+-----------------------------------------+ 1 row in set (0.00 sec) Yes, full text search works. How to get search scoreWe often want to display more relevant results first in full text search. We use search score in such case.We can get search score by MySQL's standard way [1], i.e. we use MATCH...AGAINST in one of columns in SELECT or ORDER BY. Let's try. mysql> INSERT INTO diaries (content) VALUES ("It's fine today. It'll be fine tomorrow as well."); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO diaries (content) VALUES ("It's fine today. But it'll rain tomorrow."); Query OK, 1 row affected (0.00 sec) mysql> SELECT *, MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) FROM diaries WHERE MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) ORDER BY MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) DESC; +----+--------------------------------------------------+---------------------------------------------------+ | id | content | MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) | +----+--------------------------------------------------+---------------------------------------------------+ | 3 | It's fine today. It'll be fine tomorrow as well. | 2 | | 1 | It'll be fine tomorrow. | 1 | | 4 | It's fine today. But it'll rain tomorrow. | 1 | +----+--------------------------------------------------+---------------------------------------------------+ 3 rows in set (0.00 sec) The result having the search word fine more, i.e. id = 3 message having the higher search score, is displayed first. And you also get search score by using MATCH AGAINST in SELECT phrase. You can use AS to change the attribute name. mysql> SELECT *, MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) AS score FROM diaries WHERE MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) ORDER BY MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) DESC; +----+--------------------------------------------------+-------+ | id | content | score | +----+--------------------------------------------------+-------+ | 3 | It's fine today. It'll be fine tomorrow as well. | 2 | | 1 | It'll be fine tomorrow. | 1 | | 4 | It's fine today. But it'll rain tomorrow. | 1 | +----+--------------------------------------------------+-------+ 3 rows in set (0.00 sec) How to specify the parser for full text searchMySQL has the following syntax to specify the parser [2] for full text search.FULLTEXT INDEX (content) WITH PARSER parser_name To use this syntax, you need to register all parsers in MySQL beforehand. On the other hand, Groonga can dynamically add a tokenizer, that is a parser in MySQL. So if use this syntax in Mroonga, tokenizers that are added in Groonga dynamically cannot be supported. We think that this limitation decreases the convenience, and we choose our own syntax using COMMENT like the following. FULLTEXT INDEX (content) COMMENT 'tokenizer "TokenMecab"' NOTE: COMMENT in FULLTEXT INDEX is only supported
MySQL 5.5 or later. If you use MySQL 5.1, use mroonga_default_parser
variable described below.
You can specify one of following values as the tokenizer. Available tokenizers
You can specify the default tokenizer by passing --with-default-tokenizer option in configure when you build Mroonga. ./configure --with-default-tokenizer TokenMecab ... Or you can set mroonga_default_tokenizer variable in my.cnf or by SQL. If you specify it in my.cnf, the change will not be lost after restarting MySQL, but you need to restart MySQL to make it effective. On the other hand, if you set it in SQL, the change is effective immediately, but it will be lost when you restart MySQL. my.cnf: [mysqld] mroonga_default_tokenizer=TokenMecab SQL: mysql> SET GLOBAL mroonga_default_tokenizer = TokenMecab; Query OK, 0 rows affected (0.00 sec) How to specify the normalizerMroonga uses normalizer corresponding to the encoding of document. It is used when tokenizing text and storing table key.It is used NormalizerMySQLGeneralCI normalizer when the encoding is utf8_general_ci or utf8mb4_general_ci. It is used NormalizerMySQLUnicodeCI normalizer when the encoding is utf8_unicode_ci or utf8mb4_unicode_ci. It isn't used normalizer when the encoding is utf8_bin. Here is an example that uses NormalizerMySQLUnicodeCI normalizer by specifying utf8_unicode_ci.: mysql> SET NAMES utf8; Query OK, 0 rows affected (0.00 sec) mysql> CREATE TABLE diaries ( -> day DATE PRIMARY KEY, -> content VARCHAR(64) NOT NULL, -> FULLTEXT INDEX (content) -> ) Engine=Mroonga DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; Query OK, 0 rows affected (0.18 sec) mysql> INSERT INTO diaries VALUES ("2013-04-23", "ブラックコーヒーを飲んだ。"); Query OK, 1 row affected (0.00 sec) mysql> SELECT * FROM diaries -> WHERE MATCH (content) AGAINST ("+ふらつく" IN BOOLEAN MODE); +------------+-----------------------------------------+ | day | content | +------------+-----------------------------------------+ | 2013-04-23 | ブラックコーヒーを飲んだ。 | +------------+-----------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT * FROM diaries -> WHERE MATCH (content) AGAINST ("+ブラック" IN BOOLEAN MODE); +------------+-----------------------------------------+ | day | content | +------------+-----------------------------------------+ | 2013-04-23 | ブラックコーヒーを飲んだ。 | +------------+-----------------------------------------+ 1 row in set (0.00 sec) Mroonga has the following syntax to specify Groonga's normalizer: FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto"' See Groonga's documentation about normalizer for more details. Here is an example that uses NormalizerAuto normalizer: mysql> SET NAMES utf8; Query OK, 0 rows affected (0.00 sec) mysql> CREATE TABLE diaries ( -> day DATE PRIMARY KEY, -> content VARCHAR(64) NOT NULL, -> FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto"' -> ) Engine=Mroonga DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; Query OK, 0 rows affected (0.19 sec) mysql> INSERT INTO diaries VALUES ("2013-04-23", "ブラックコーヒーを飲んだ。"); Query OK, 1 row affected (0.00 sec) mysql> SELECT * FROM diaries -> WHERE MATCH (content) AGAINST ("+ふらつく" IN BOOLEAN MODE); Empty set (0.00 sec) mysql> SELECT * FROM diaries -> WHERE MATCH (content) AGAINST ("+ブラック" IN BOOLEAN MODE); +------------+-----------------------------------------+ | day | content | +------------+-----------------------------------------+ | 2013-04-23 | ブラックコーヒーを飲んだ。 | +------------+-----------------------------------------+ 1 row in set (0.00 sec) How to get snippet (Keyword in context)Mroonga provides functionality to get keyword in context. It is implemented as mroonga_snippet() UDF.LoggingMroonga outputs the logs by default.Log files are located in MySQL's data directory with the filename groonga.log. Here is the example of the log. 2010-10-07 17:32:39.209379|n|b1858f80|mroonga 1.10 started. 2010-10-07 17:32:44.934048|d|46953940|hash get not found (key=test) 2010-10-07 17:32:44.936113|d|46953940|hash put (key=test) The default log level is NOTICE, i.e. we have important information only and we don't have debug information etc.). You can get the log level by mroonga_log_level system variable, that is a global variable. You can also modify it dynamically by using SET phrase. mysql> SHOW VARIABLES LIKE 'mroonga_log_level'; +-------------------+--------+ | Variable_name | Value | +-------------------+--------+ | mroonga_log_level | NOTICE | +-------------------+--------+ 1 row in set (0.00 sec) mysql> SET GLOBAL mroonga_log_level=DUMP; Query OK, 0 rows affected (0.00 sec) mysql> SHOW VARIABLES LIKE 'mroonga_log_level'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | mroonga_log_level | DUMP | +-------------------+-------+ 1 row in set (0.00 sec) Available log levels are the followings.
See mroonga_log_level about details. You can reopen the log file by FLUSH LOGS. If you want to rotate the log file without stopping MySQL server, you can do in the following procedure.
Next stepNow, you can use Mroonga as wrapper mode! If you want Mroonga to be faster, see also Optimizations.FOOTNOTES
REFERENCEOptimizationsMroonga implemented some optimizations to return response faster.Some optimizations can work only on Storage mode. Fetching only needed columnsThis optimization can work only on Storage mode.Groonga uses column store architecture. It means that Groonga doesn't need to fetch values of all columns for fetching a column value in a row. Groonga can fetch values of only needed columns. InnoDB and MyISAM use row store architecture. They need to fetch values of all columns for fetching a column value in a row. If you specify only columns you needed in SELECT, Mroonga just fetches only values of these columns. Mroonga doesn't fetch values of other columns. Mroonga can work faster by reducing operations and I/O. It's this optimization. Here is a sample table definition to describe this optimization: CREATE TABLE t1 ( c1 INT PRIMARY KEY AUTO_INCREMENT, c2 INT, c3 INT, ... c11 VARCHAR(20), c12 VARCHAR(20), ... c20 DATETIME ) ENGINE=Mroonga DEFAULT CHARSET=utf8; Here is a SELECT to describe this optimization: SELECT c1, c2, c11 FROM t1 WHERE c2 = XX AND c12 = "XXX"; In this case, Mroonga fetches values from only c1, c2, c11 and c12. Mroonga doesn't fetch values from c3, c4, ..., c10, c13, ..., c19 and c20. Row countThis optimization can work only on Storage mode.MySQL requires all column values from storage engine for processing COUNT(*) even if COUNT(*) doesn't need them. Mroonga doesn't fetch any column values for the case. Mroonga can work faster by reducing operations and I/O. It's this optimization. Here is a SELECT to describe this optimization: SELECT COUNT(*) FROM t1 WHERE MATCH(c2) AGAINST("+keyword" IN BOOLEAN MODE); The SELECT fetches only COUNT(*) and condition in WHERE can be processed only by index. In this case, Mroonga uses this optimization. You can confirm whether this optimization is used or not by looking Mroonga_count_skip status variable: mysql> SHOW STATUS LIKE 'Mroonga_count_skip'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | Mroonga_count_skip | 1 | +--------------------+-------+ 1 row in set (0.00 sec) Mroonga_count_skip status variable is incremented when Mroonga uses this optimization. You can disable this optimization by setting mroonga_enable_optimization to false. ORDER BY LIMITThis optimization can work on both Storage mode and Wrapper mode.MySQL can process ORDER BY and LIMIT with low cost if you can get sorted records by index even if the number of matched records is very big. MySQL can do the process for MATCH() AGAINST(IN NATURAL LANGUAGE MODE). But MySQL can't do the process for MATCH() AGAINST(IN BOOLEAN MODE). It means that MySQL might take long time for MATCH() AGAINST(IN BOOLEAN MODE) that matches with many records. Mroonga processes ORDER BY and LIMIT by Groonga and returns only target records to MySQL. It's very faster for query that matches with many records. It's this optimization. Here is a SELECT to describe this optimization: SELECT * FROM t1 WHERE MATCH(c2) AGAINST("+keyword" IN BOOLEAN MODE) ORDER BY c1 LIMIT 1; The SELECT runs full text search and sorts by Groonga and returns only one record to MySQL. You can confirm whether this optimization is used or not by looking Mroonga_fast_order_limit status variable: mysql> SHOW STATUS LIKE 'Mroonga_fast_order_limit'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Mroonga_fast_order_limit | 1 | +--------------------------+-------+ 1 row in set (0.00 sec) Mroonga_fast_order_limit status variable is incremented when Mroonga uses this optimization. This optimization is used only when all the following conditions are true:
SQL commandsThis section describes avaiable SQL commands and unavailable SQL commands.List of available SQL commands
List of unavailable SQL commands
Basically, The character set which Groonga supports (EUC-JP/UTF-8/SJIS/LATIN1/KOI8R) is also supported in Mroonga. I found an unlisted SQL commandIf you find unlisted SQL command and confirm whether the SQL command works well or not, we want to list it. Please mail it to us or send a patch against the source file by pull request system on GitHub.Server variablesHere are the explanations of server variables that are introduced by Mroonga.mroonga_action_on_fulltext_query_errorThe default behavior of fulltext query error.The default value of mroonga_action_on_fulltext_query_error is ERROR_AND_LOG. This is the conventional behavior which is equal to old version of mroonga. Here is the list of mroonga_action_on_fulltext_query_error which you can use.
Here is an example SQL to confirm the value of mroonga_action_on_fulltext_query_error: mysql> SHOW VARIABLES LIKE 'mroonga_action_on_fulltext_query_error'; +----------------------------------------+---------------+ | Variable_name | Value | +----------------------------------------+---------------+ | mroonga_action_on_fulltext_query_error | ERROR_AND_LOG | +----------------------------------------+---------------+ 1 row in set (0.00 sec) mroonga_boolean_mode_syntax_flagsThe flags to custom syntax in MATCH () AGAINST ('...' IN BOOLEAN MODE).This variable is system and session variable. Here are available flags:
The default flags is DEFAULT. It is MySQL's BOOLEAN MODE compatible syntax. You can combine flags by separated by comma such as SYNTAX_QUERY,ALLOW_LEADING_NOT. Here is an example SQL to use script syntax in Groonga: mysql> SET mroonga_boolean_mode_syntax_flags = "SYNTAX_SCRIPT"; mroonga_database_path_prefixTODO:mroonga_default_parserDeprecated since version 5.04: Use mroonga_default_tokenizer instead.The default parser of the full text search. The default value can be specified by --with-default-parser=PARSER configure argument, whose default value is TokenBigram. Here is an example to use TokenBigramSplitSymbolAlphaDigit as a fulltext search parser. It is used by body_index fulltext index. SET GLOBAL mroonga_default_parser=TokenBigramSplitSymbolAlphaDigit; CREATE TABLE diaries ( id INT PRIMARY KEY AUTO_INCREMENT, body TEXT, FULLTEXT INDEX body_index (body) ) DEFAULT CHARSET UTF8; mroonga_default_tokenizerNew in version 5.04.The default tokenizer of the full text search. The default value can be specified by --with-default-tokenizer=TOKENIZER configure argument, whose default value is TokenBigram. Here is an example to use TokenBigramSplitSymbolAlphaDigit as a fulltext index tokenizer. It is used by body_index fulltext index. SET GLOBAL mroonga_default_tokenizer=TokenBigramSplitSymbolAlphaDigit; CREATE TABLE diaries ( id INT PRIMARY KEY AUTO_INCREMENT, body TEXT, FULLTEXT INDEX body_index (body) ) DEFAULT CHARSET UTF8; mroonga_default_wrapper_engineTODO:mroonga_dry_writeWhether really write data to Groonga database or not. The default value is OFF that means data are really written to Groonga database. Usually we don't need to change the value of this variable. This variable is useful for benchmark because we can measure processing time MySQL and Mroonga. It doesn't include Groonga's processing time.Here is an example SQL to disable writing data to Groonga database: mysql> SHOW VARIABLES LIKE 'mroonga_dry_write'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | mroonga_dry_write | OFF | +-------------------+-------+ 1 row in set (0.00 sec) mysql> SET mroonga_dry_write = true; Query OK, 0 rows affected (0.00 sec) mysql> SHOW VARIABLES LIKE 'mroonga_dry_write'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | mroonga_dry_write | ON | +-------------------+-------+ 1 row in set (0.00 sec) mroonga_enable_optimizationWhether enable optimization or not. The default value is ON that means optimization is enabled. Usually we don't need to change the value of this variable. This variable is useful for benchmark.Here is an example SQL to disable optimization: mysql> SHOW VARIABLES LIKE 'mroonga_enable_optimization'; +-----------------------------+-------+ | Variable_name | Value | +-----------------------------+-------+ | mroonga_enable_optimization | ON | +-----------------------------+-------+ 1 row in set (0.00 sec) mysql> SET mroonga_enable_optimization = false; Query OK, 0 rows affected (0.00 sec) mysql> SHOW VARIABLES LIKE 'mroonga_enable_optimization'; +-----------------------------+-------+ | Variable_name | Value | +-----------------------------+-------+ | mroonga_enable_optimization | OFF | +-----------------------------+-------+ 1 row in set (0.00 sec) mroonga_libgroonga_support_lz4The status of libgroonga supports LZ4.Here is an example SQL to confirm the status of libgroonga supports LZ4: mysql> SHOW GLOBAL VARIABLES LIKE 'mroonga_libgroonga_support_lz4'; +--------------------------------+-------+ | Variable_name | Value | +--------------------------------+-------+ | mroonga_libgroonga_support_lz4 | ON | +--------------------------------+-------+ mroonga_libgroonga_support_zlibThe status of libgroonga supports zlib.Here is an example SQL to confirm the status of libgroonga supports zlib: mysql> SHOW GLOBAL VARIABLES LIKE 'mroonga_libgroonga_support_zlib'; +---------------------------------+-------+ | Variable_name | Value | +---------------------------------+-------+ | mroonga_libgroonga_support_zlib | ON | +---------------------------------+-------+ mroonga_libgroonga_support_zstdThe status of libgroonga supports Zstandard.Here is an example SQL to confirm the status of libgroonga supports Zstandard: mysql> SHOW GLOBAL VARIABLES LIKE 'mroonga_libgroonga_support_zstd'; +---------------------------------+-------+ | Variable_name | Value | +---------------------------------+-------+ | mroonga_libgroonga_support_zstd | ON | +---------------------------------+-------+ mroonga_libgroonga_versionThe version string of the groonga library.Here is an example SQL to confirm the using groonga library version: mysql> SHOW VARIABLES LIKE 'mroonga_libgroonga_version'; +----------------------------+------------------+ | Variable_name | Value | +----------------------------+------------------+ | mroonga_libgroonga_version | 1.2.8-9-gbf05b82 | +----------------------------+------------------+ 1 row in set (0.00 sec) mroonga_lock_timeoutTODO:mroonga_log_fileThe path of the log file of Mroonga. The default value is groonga.log.Here is an example transcript to change log file to /tmp/mroonga.log: mysql> SHOW VARIABLES LIKE 'mroonga_log_file'; +------------------+-------------+ | Variable_name | Value | +------------------+-------------+ | mroonga_log_file | groonga.log | +------------------+-------------+ 1 row in set (0.00 sec) mysql> SET GLOBAL mroonga_log_file = "/tmp/mroonga.log"; Query OK, 0 rows affected (0.00 sec) mysql> SHOW VARIABLES LIKE 'mroonga_log_file'; +------------------+------------------+ | Variable_name | Value | +------------------+------------------+ | mroonga_log_file | /tmp/mroonga.log | +------------------+------------------+ 1 row in set (0.00 sec) mroonga_log_levelThe output level of Mroonga log file. The default value is NOTICE.Here is the list of mroonga_log_level which you can use.
Here is an example transcript to change log level to DEBUG that logs many messages for debugging: mysql> SHOW VARIABLES LIKE 'mroonga_log_level'; +-------------------+--------+ | Variable_name | Value | +-------------------+--------+ | mroonga_log_level | NOTICE | +-------------------+--------+ 1 row in set (0.00 sec) mysql> SET GLOBAL mroonga_log_level = "debug"; Query OK, 0 rows affected (0.00 sec) mysql> SHOW VARIABLES LIKE 'mroonga_log_level'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | mroonga_log_level | DEBUG | +-------------------+-------+ 1 row in set (0.00 sec) mroonga_query_log_fileThe path of the query log file of Mroonga. The default value is empty.If this value is empty, the query log is not stored to file. If this value is not empty, query log is stored to the specified file. Here is an example transcript to change query log file to /tmp/mroonga_query.log: mysql> SHOW VARIABLES LIKE 'mroonga_query_log_file'; +------------------------+-------+ | Variable_name | Value | +------------------------+-------+ | mroonga_query_log_file | | +------------------------+-------+ 1 row in set (0.00 sec) mysql> SET GLOBAL mroonga_query_log_file = "/tmp/mroonga.log"; Query OK, 0 rows affected (0.00 sec) mysql> SHOW VARIABLES LIKE 'mroonga_query_log_file'; +------------------------+------------------+ | Variable_name | Value | +------------------------+------------------+ | mroonga_query_log_file | /tmp/mroonga.log | +------------------------+------------------+ 1 row in set (0.00 sec) mroonga_match_escalation_thresholdThe threshold to determin whether match method is escalated. See search specification for Groonga about match method escalation.The default value is the same as Groonga's default value. It's 0 for the default installation. The dafault value can be configured in my.cnf or by SET GLOBAL mroonga_match_escalation_threshold = THRESHOLD;. Because this variable's scope is both global and session. Here is an example to use -1 as a threshold to determin whether match method is escalated. -1 means that never escalated. SET GLOBAL mroonga_match_escalation_threshold = -1; Here is an another example to show behavior change by the variable value. CREATE TABLE diaries ( id INT PRIMARY KEY AUTO_INCREMENT, title TEXT, tags TEXT, FULLTEXT INDEX tags_index (tags) COMMENT 'tokenizer "TokenDelimit"' ) ENGINE=mroonga DEFAULT CHARSET=UTF8; -- Test data INSERT INTO diaries (title, tags) VALUES ("Hello groonga!", "groonga install"); INSERT INTO diaries (title, tags) VALUES ("Hello mroonga!", "mroonga install"); -- Matches all records that have "install" tag. SELECT * FROM diaries WHERE MATCH (tags) AGAINST ("install" IN BOOLEAN MODE); -- id title tags -- 1 Hello groonga! groonga install -- 2 Hello mroonga! mroonga install -- Matches no records by "gr" tag search because no "gr" tag is used. -- But matches a record that has "groonga" tag because search -- method is escalated and prefix search with "gr" is used. -- The default threshold is 0. It means that no records are matched then -- search method is escalated. SELECT * FROM diaries WHERE MATCH (tags) AGAINST ("gr" IN BOOLEAN MODE); -- id title tags -- 1 Hello groonga! groonga install -- Disables escalation. SET mroonga_match_escalation_threshold = -1; -- No records are matched. SELECT * FROM diaries WHERE MATCH (tags) AGAINST ("gr" IN BOOLEAN MODE); -- id title tags -- Enables escalation again. SET mroonga_match_escalation_threshold = 0; -- Matches a record by prefix search with "gr". SELECT * FROM diaries WHERE MATCH (tags) AGAINST ("gr" IN BOOLEAN MODE); -- id title tags -- 1 Hello groonga! groonga install mroonga_max_n_records_for_estimateNew in version 5.02.TODO: mroonga_enable_operations_recordingWhether recording operations for recover is enabled or not. The default value is OFF that means operations are not recorded to Groonga database. It needs to reopen the database with FLUSH TABLES in order to reflect the variable is changed.Here is an example SQL to disable operations recording: mysql> SET GLOBAL mroonga_enable_operations_recording = false; Query OK, 0 rows affected (0.00 sec) mysql> FLUSH TABLES; Query OK, 0 rows affected (0.00 sec) mysql> SHOW GLOBAL VARIABLES LIKE 'mroonga_enable_operations_recording'; +-------------------------------------+-------+ | Variable_name | Value | +-------------------------------------+-------+ | mroonga_enable_operations_recording | OFF | +-------------------------------------+-------+ mroonga_vector_column_delimiterThe delimiter when outputting a vector column. The default value is a white space.Here is an example SQL to change the delimiter to a semicolon from a white space: mysql> SHOW VARIABLES LIKE 'mroonga_vector_column_delimiter'; +---------------------------------+-------+ | Variable_name | Value | +---------------------------------+-------+ | mroonga_vector_column_delimiter | | +---------------------------------+-------+ 1 row in set (0.00 sec) mysql> SET GLOBAL mroonga_vector_column_delimiter = ';'; Query OK, 0 rows affected (0.00 sec) mysql> SHOW GLOBAL VARIABLES LIKE 'mroonga_vector_column_delimiter'; +---------------------------------+-------+ | Variable_name | Value | +---------------------------------+-------+ | mroonga_vector_column_delimiter | ; | +---------------------------------+-------+ mroonga_versionThe version string of Mroonga.Here is an example SQL to confirm the running mroonga version: mysql> SHOW VARIABLES LIKE 'mroonga_version'; +-----------------+-------+ | Variable_name | Value | +-----------------+-------+ | mroonga_version | 1.10 | +-----------------+-------+ 1 row in set (0.00 sec) mroonga_condition_push_down_typeNew in version 7.10.It controls how to enable condition push down support. The default value is ONE_FULL_TEXT_SEARCH. It means that condition push down is enabled only when WHERE clause has one MATCH AGAINST condition. Here are available values:
Here is an example SQL to confirm the current value: mysql> SHOW VARIABLES LIKE 'mroonga_condition_push_down_type'; +----------------------------------+----------------------+ | Variable_name | Value | +----------------------------------+----------------------+ | mroonga_condition_push_down_type | ONE_FULL_TEXT_SEARCH | +----------------------------------+----------------------+ 1 row in set (0.00 sec) Status variablesHere are the explanations of status variables that are introduced by Mroonga.Mroonga_count_skipThis value is increased when 'fast line count feature' is used. You can use this value to check if the feature is working when you enable it.Here is an example how to check it: mysql> SHOW STATUS LIKE 'Mroonga_count_skip'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | Mroonga_count_skip | 0 | +--------------------+-------+ 1 row in set (0.00 sec) Mroonga_fast_order_limitThis value is increased when 'fast ORDER BY LIMIT feature' is used. You can use this value to check if the feature is working when you enable it.Here is an example how to check it: mysql> SHOW STATUS LIKE 'Mroonga_fast_order_limit'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Mroonga_fast_order_limit | 0 | +--------------------------+-------+ 1 row in set (0.00 sec) Mroonga_condition_push_downThis value is increased when condition push down is used. You can use this value to check whether condition push down is used or not.Here is an example how to check it: mysql> SHOW STATUS LIKE 'Mroonga_condition_push_down'; +-----------------------------+-------+ | Variable_name | Value | +-----------------------------+-------+ | Mroonga_condition_push_down | 1 | +-----------------------------+-------+ 1 row in set (0.00 sec) Mroonga_n_pooling_contextsThis value shows the number of pooling contexts for mroonga_command(). These contexts are reused from multiple mroonga_command() calls. So, it's not increased continually.Here is an example how to check it: mysql> SHOW STATUS LIKE 'Mroonga_n_pooling_contexts'; +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Mroonga_n_pooling_contexts | 0 | +----------------------------+-------+ 1 row in set (0.00 sec) UDF (User Defined Functions)Mroonga provides some useful functionality as UDF (User Defined Functions).mroonga_command()New in version 3.02.Summarymroonga_command() UDF executes the given string as a Groonga command and returns result of the Groonga command. Groonga command will be faster than MySQL query.mroonga_command() is an UDF for advanced users. Normally, you don't need to use this UDF. Syntaxmroonga_command() has only one required parameter:mroonga_command(command) command is a string value. It's a Groonga command to be executed. UsageTODOParametersRequired parametersThere is one required parameter, command.commandIt specifies a Groonga command to be executed.Return valueIt returns an evaluated result of the given Groonga command as a string.See alsoCommand in Groonga document.mroonga_escape()New in version 3.08.Summarymroonga_escape UDF provides functionality to escape given string. It also accepts parameter what character should be escaped.Syntaxmroonga_escape() has required parameter and optional parameter:mroonga_escape(string) mroonga_escape(string, special_characters) UsageHere is the example query which use special characters to be escaped:SELECT * FROM `symbols` WHERE MATCH(`content`) AGAINST(mroonga_escape("+hello_world()", "()") IN BOOLEAN MODE); Here is the example about special characters which is escaped: SELECT mroonga_escape("+-<>~*()\"\\\:"); -- \+\-\<\>\~\*\(\)\"\\\: ParametersRequired parametersThere is one required parameter, string.stringIt specifies text which you want to escape.Optional parametersThere is one optional parameter, special_characters.special_charactersIt specifies characters to escape.The default value is +-<>~*()"\\:. Return valueIt returns escaped string.mroonga_highlight_html()New in version 7.05.Summarymroonga_highlight_html() highlights the specified keywords in target text. It surrounds each keyword with <span class="keyword">...</span> and special characters in HTML such as < and > are escaped. You can use the result as is safely in HTML.Syntaxmroonga_highlight_html() has required parameter and optional parameter:mroonga_highlight_html(text, query AS query) mroonga_highlight_html(text, keyword1, ..., keywordN) AS query is very important. You must specify it to extract keywords from query. UsageHere is a sample to highlight keywords "mroonga" and "groonga" in target text by query "mroonga OR groonga". You must specify AS query:SELECT mroonga_highlight_html('Mroonga is the Groonga based storage engine.', 'mroonga OR groonga' AS query) AS highlighted; Here is the result of the execution example: +--------------------------------------------------------------------------------------------------------+ | highlighted | +--------------------------------------------------------------------------------------------------------+ | <span class="keyword">Mroonga</span> is the <span class="keyword">Groonga</span> based storage engine. | +--------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) Here is a sample to highlight keywords "mroonga" and "groonga" in target text by keywords "mroonga" and "groonga": SELECT mroonga_highlight_html('Mroonga is the Groonga based storage engine.', 'mroonga', 'groonga') AS highlighted; Here is the result of the execution example: +--------------------------------------------------------------------------------------------------------+ | highlighted | +--------------------------------------------------------------------------------------------------------+ | <span class="keyword">Mroonga</span> is the <span class="keyword">Groonga</span> based storage engine. | +--------------------------------------------------------------------------------------------------------+ ParametersRequired parametersThere is one required parameter.textThe column name of string or string value to be highlighted.Optional parametersThere are some optional parameters.querySpecify query in Groonga's query syntax.You must specify AS query to extract keywords from query like the following: SELECT mroonga_highlight_html('...', 'mroonga OR groonga' AS query); keywordSpecify 0 or more keywords to be highlighted.Return valueIt returns highlighted HTML. If optional parameter is not given, it only escapes special characters in HTML such as <, > in text.mroonga_last_insert_grn_id()Summarymroonga_last_insert_grn_id() UDF provides functionality to get id of last inserted record.This id is unique in Groonga. SyntaxUsageParametersRequired parametersReturn valueIt returns id of last inserted record.mroonga_normalize()New in version 5.11.Summarymroonga_normalize() UDF normalizes text by Groonga's normalizer.Syntaxmroonga_normalize() has required parameter and optional parameter:mroonga_normalize(string) mroonga_normalize(string, normalizer_name) UsageHere is the example query which use Groonga's NormalizerAuto normalizer to be normalized:SELECT mroonga_normalize("ABCDあぃうぇ㍑"); abcdあぃうぇリットル Here is the example query which use Groonga's NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark normalizer to be normalized: SELECT mroonga_normalize("aBcDあぃウェ㍑", "NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark"); ABCDあぃうぇ㍑ ParametersRequired parametersThere is one required parameter, string.stringIt specifies text which you want to normalize.Optional parametersThere is one optional parameter, normalizer_name.normalizer_nameIt specifies Groonga's normalizer name to normalize.The default value is NormalizerAuto. Return valueIt returns normalized string.mroonga_snippet()New in version 2.07.Summarymroonga_snippet() UDF provides functionality to get keyword in context.NOTE: This feature is in experimental stage. So, the required
arguments or value is changed without notice in the future.
Syntaxmroonga_snippet() has required parameter and optional parameter:mroonga_snippet(document, max_length, max_count, encoding, skip_leading_spaces, html_escape, snippet_prefix, snippet_suffix, word1, word1_prefix, word1_suffix, ..., [wordN, wordN_prefix, wordN_suffix]) UsageThere is a case that you want to extract keyword and surrounding text as a search results.Snippet means 'keyword and surrounding text'. It is called 'Keyword in context'. mroonga_snippet() function provides the way to get snippet from search results. You can use mroonga_snippet() UDF in storage mode and wrapper mode. mroonga_snippet() function is included in Mroonga as a User-Defined Function (UDF), but if you have not yet register it in MySQL by CREATE FUNCTION, you need to invoke the following SQL for defining a function. mysql> CREATE FUNCTION mroonga_snippet RETURNS STRING SONAME 'ha_mroonga.so'; mroonga_snippet() function is useful for searching the text which contains keyword and associated one by using MATCH .. AGAINST syntax. Imagine searching the document which contains 'fulltext' as a keyword. Assume that some keyword such as 'MySQL' and 'search' are associated with 'fulltext'. mroonga_snippet() function meets above. Here is the schema definition for execution examples(storage mode): CREATE TABLE `snippet_test` ( `id` int(11) NOT NULL, `text` text, PRIMARY KEY (`id`), FULLTEXT KEY `text` (`text`) ) ENGINE=mroonga DEFAULT CHARSET=utf8 Here is the schema definition for execution examples(wrapper mode): CREATE TABLE `snippet_test` ( `id` int(11) NOT NULL, `text` text, PRIMARY KEY (`id`), FULLTEXT KEY `text` (`text`) ) ENGINE=mroonga COMMENT 'engine = "innodb"' DEFAULT CHARSET=utf8 Here is the sample data for execution examples: INSERt INTO snippet_test (id, text) VALUES (1, 'An open-source fulltext search engine and column store.'); INSERT INTO snippet_test (id, text) VALUES (2, 'An open-source storage engine for fast fulltext search with MySQL.'); INSERT INTO snippet_test (id, text) VALUES (3, 'Tritonn is a patched version of MySQL that supports better fulltext search function with Senna.'); Here is the results of execution examples: mysql> SELECT * FROM snippet_test; +----+-------------------------------------------------------------------------------------------------+ | id | text | +----+-------------------------------------------------------------------------------------------------+ | 1 | An open-source fulltext search engine and column store. | | 2 | An open-source storage engine for fast fulltext search with MySQL. | | 3 | Tritonn is a patched version of MySQL that supports better fulltext search function with Senna. | +----+-------------------------------------------------------------------------------------------------+ 3 rows in set (0.00 sec) mysql> SELECT id, text, mroonga_snippet(text, 8, 2, 'ascii_general_ci', 1, 1, '...', '...<br>', 'fulltext', '<span class="w1">', '</span>', 'MySQL', '<span class="w2">', '</span>', 'search', '<span calss="w3">', '</span>') FROM snippet_test WHERE MATCH(text) AGAINST ('+fulltext' IN BOOLEAN MODE); +----+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | text | mroonga_snippet(text, 8, 2, 'ascii_general_ci', 1, 1, '...', '...<br>', 'fulltext', '<span class="w1">', '</span>', 'MySQL', '<span class="w2">', '</span>', 'search', '<span calss="w3">', '</span>') | +----+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | An open-source fulltext search engine and column store. | ...<span class="w1">fulltext</span>...<br>... <span calss="w3">search</span> ...<br> | | 2 | An open-source storage engine for fast fulltext search with MySQL. | ...<span class="w1">fulltext</span>...<br>... <span calss="w3">search</span> ...<br> | | 3 | Tritonn is a patched version of MySQL that supports better fulltext search function with Senna. | ...f <span class="w2">MySQL</span> ...<br>...<span class="w1">fulltext</span>...<br> | +----+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3 rows in set (0.00 sec) The keyword 'fulltext' and associated keyword 'MySQL' and 'search' has been extracted. ParametersRequired parametersThere are many required parameters.documentThe column name or string value is required.max_lengthThe max length of snippet (bytes) is required.max_countThe max elements of snippets (N word) is required.encodingThe encoding of document is required. You can specify the value of encoding such as 'ascii_general_ci', 'cp932_japanese_ci', 'eucjpms_japanese_ci' and so on.skip_leading_spacesSpecify whether skip leading spaces or not. Specify the value 1 for skipping leading spaces, 0 for not.html_escapeHTML escape is enabled or not. Specify the value 1 for enabling HTML escape, 0 for not.prefixThe start text of snippet.suffixThe end text of snippet.Optional parametersThere is one optional group parameter, wordN related prefix and suffix.wordNSpecify any word.wordN_prefixIt is the start text of wordN.wordN_suffixIt is the end text of wordN.Return valueIt returns snippet string.mroonga_snippet_html()New in version 5.09.Summarymroonga_snippet_html() UDF provides functionality to get highlighted keyword in context.NOTE: This feature is in experimental stage. So, the required
arguments or value is changed without notice in the future.
Syntaxmroonga_snippet_html() has required parameter and optional parameter:mroonga_snippet_html(document, key_word) UsageParametersRequired parametersThere are many required parameters.Optional parametersReturn valueIt returns snippet string.LimitationsThere are some limitations in Mroonga storage engine.Limitations of tableA table has the following limitations. This limitation is derived from Groonga.
Keep in mind that these limitations may vary depending on conditions. Limitations of indexingA full-text index has the following limitations. This limitation is derived from groonga and applied to each table.
Keep in mind that these limitations may vary depending on conditions. Confirm by SHOW TABLE STATUS or SHOW INDEX FROM ... whether your table data matches to this limitations. Limitations about the value of columnsThere is a limitation about the value of column in storage mode.Mroonga storage engine executes automatic conversion against the value NULL. For example, if the value NULL is used in DATE or DATETIME columns, Mroonga storage engine automatically converts 0 into 1 as the value of month or date. Thus, the value 0 is treated as the 1st month (January) of the year or the 1st date of the month. And more, the value NULL is treated as the value of UNIX time 0 (1970-01-01 00:00:00). This kind of automatic conversion is not restricted to only DATE or DATETIME types. The value NULL is converted into the default value of columns. In most cases, it will be converted into empty string for column which belongs to type of string, 0 for column which belongs to type of numeric. Here is an example to show behavior described above. CREATE TABLE date_limitation ( id INT PRIMARY KEY AUTO_INCREMENT, input varchar(32) DEFAULT NULL, date DATE DEFAULT NULL ) ENGINE=mroonga DEFAULT CHARSET=UTF8; CREATE TABLE datetime_limitation ( id INT PRIMARY KEY AUTO_INCREMENT, input varchar(32) DEFAULT NULL, datetime DATETIME DEFAULT NULL ) ENGINE=mroonga DEFAULT CHARSET=UTF8; -- Test data for date_limitation INSERT INTO date_limitation (input) VALUES ("NULL"); INSERT INTO date_limitation (input, date) VALUES ("1970-00-00", "1970-00-00"); -- Test data for datetime_limitation INSERT INTO datetime_limitation (input) VALUES ("NULL"); INSERT INTO datetime_limitation (input, datetime) VALUES ("1970-00-00 00:00:00", "1970-00-00 00:00:00"); Here is the results of execution example: mysql> select * from date_limitation; +----+------------+------------+ | id | input | date | +----+------------+------------+ | 1 | NULL | 1970-01-01 | | 2 | 1970-00-00 | 1970-01-01 | +----+------------+------------+ 2 rows in set (0.00 sec) mysql> select * from datetime_limitation; +----+---------------------+---------------------+ | id | input | datetime | +----+---------------------+---------------------+ | 1 | NULL | 1970-01-01 00:00:00 | | 2 | 1970-00-00 00:00:00 | 1970-01-01 00:00:00 | +----+---------------------+---------------------+ 2 rows in set (0.00 sec) Limitations of column sizeA column has the following limitation.
Full text searchBoolean modeSummaryMroonga can perform boolean full text searches using the IN BOOLEAN MODE modifier for MATCH AGAINST:SELECT ... WHERE MATCH(column) AGAINST ('...' IN BOOLEAN MODE); Normally, IN BOOLEAN MODE is suitable rather than the default IN NATURAL LANGUAGE MODE. Because IN BOOLEAN MODE is similar to query in Web search engine. Most people familiar with query in Web search engine. You can use qualifiers which MySQL support and Mroonga original pragmas in boolean full text search query. These qualifiers and pragmas can change the relative rank of search results. In the case of a search string not using neither a qualifier nor a pragma, the search results that contain the search string will be rated higher. UsageHere are schema and data to show examples:CREATE TABLE books ( `id` INTEGER AUTO_INCREMENT, `title` text, PRIMARY KEY(`id`), FULLTEXT INDEX title_index (title) ) ENGINE=Mroonga DEFAULT CHARSET=utf8mb4; INSERT INTO books (title) VALUES ('Professional MySQL'); INSERT INTO books (title) VALUES ('MySQL for Professional'); INSERT INTO books (title) VALUES ('Mroonga = MySQL + Groonga'); QualifierHere are supported qualifiers.KEYWORD1 KEYWORD2No operator between keywords such as KEYWORD1 KEYWORD2 indicates that one of keywords must be present in each row that is returned.Mroonga for query means that Mroonga or for must be present: SELECT title FROM books WHERE MATCH(title) AGAINST('Mroonga for' IN BOOLEAN MODE); -- +---------------------------+ -- | title | -- +---------------------------+ -- | Mroonga = MySQL + Groonga | -- | MySQL for Professional | -- +---------------------------+ KEYWORD1 OR KEYWORD2OR (must be uppercase) indicates that left hand side keyword or right hand side keyword must be present in each row that is returned.Mroonga OR for query means that Mroonga or for must be present: SELECT title FROM books WHERE MATCH(title) AGAINST('Mroonga OR for' IN BOOLEAN MODE); -- +---------------------------+ -- | title | -- +---------------------------+ -- | Mroonga = MySQL + Groonga | -- | MySQL for Professional | -- +---------------------------+ OR is the default operator. You can omit it. Both Mroonga OR for and Mroonga for return the same result. +KEYWORDA leading plus sign indicates that this word must be present in each row that is returned.+MySQL +Mroonga query means that both MySQL and Mroonga must be present: SELECT title FROM books WHERE MATCH(title) AGAINST('+MySQL +Groonga' IN BOOLEAN MODE); -- +---------------------------+ -- | title | -- +---------------------------+ -- | Mroonga = MySQL + Groonga | -- +---------------------------+ -KEYWORDA leading minus sign indicates that this word must not be present in any of the rows that are returned.+MySQL -Mroonga query means that MySQL must be present but Mroonga must not be present: SELECT title FROM books WHERE MATCH(title) AGAINST('+MySQL -Mroonga' IN BOOLEAN MODE); -- +------------------------+ -- | title | -- +------------------------+ -- | Professional MySQL | -- | MySQL for Professional | -- +------------------------+ PREFIX*A following asterisk indicates that all words starting with this word must be present in any of the rows that are returned.+M* query means that words starting M (MySQL and Mroonga in this case) must be present: SELECT title FROM books WHERE MATCH(title) AGAINST('+M*' IN BOOLEAN MODE); -- +---------------------------+ -- | title | -- +---------------------------+ -- | Mroonga = MySQL + Groonga | -- | Professional MySQL | -- | MySQL for Professional | -- +---------------------------+ NOTE: To be precise, "word" may not be
"word" you think. "word" in this context is
"token". "token" may not be word. For example, tokens in
"It's" are "It", "'" and "s".
You can confirm token by mroonga_command() and tokenize: SELECT mroonga_command('tokenize TokenBigram "It''s" NormalizerMySQLGeneralCI'); -- +--------------------------------------------------------------------------+ -- | mroonga_command('tokenize TokenBigram "It''s" NormalizerMySQLGeneralCI') | -- +--------------------------------------------------------------------------+ -- | [ | -- | { | -- | "value":"IT", | -- | "position":0, | -- | "force_prefix":false | -- | }, | -- | { | -- | "value":"'", | -- | "position":1, | -- | "force_prefix":false | -- | }, | -- | { | -- | "value":"S", | -- | "position":2, | -- | "force_prefix":false | -- | } | -- | ] | -- +--------------------------------------------------------------------------+ JSON value in the above result is formatted by hand. "PHRASE"Quoting phrase by double quote (") indicates that the phrase must be present in any of the rows that are returned.+"Professional MySQL" query means that Professional MySQL phrase must be present. The query doesn't match to MySQL for Profession. MySQL for Profession includes both MySQL and Professional words but doesn't include Professional MySQL phrase: SELECT title FROM books WHERE MATCH(title) AGAINST('+"Professional MySQL"' IN BOOLEAN MODE); -- +--------------------+ -- | title | -- +--------------------+ -- | Professional MySQL | -- +--------------------+ (SUBEXPRESSION...)Parentheses groups expressions.+(Groonga OR Mroonga) +MySQL query means the following:
Here is the result of the query: SELECT title FROM books WHERE MATCH(title) AGAINST('+(Groonga OR Mroonga) +MySQL' IN BOOLEAN MODE); -- +---------------------------+ -- | title | -- +---------------------------+ -- | Mroonga = MySQL + Groonga | -- +---------------------------+ PragmaPragma is metadata for query. You can change how to parse query by specifying pragma.You can embed pragma at the head of query for specifying how to execute. Pragma must exist in the beginning of a query. Don't put a blank into a head of the query. Pragma starts with *: SELECT MATCH AGAINST('*PRAGMA ...' IN BOOLEAN MODE); You can specify multiple pragmas: SELECT MATCH AGAINST('*PRAGMA1PRAGMA2 ...' IN BOOLEAN MODE); Here are available pragmas. D pragmaD pragma indicates the default operator. It's used when an individual operator is omitted.Here is the D pragma syntax. You can choose one of OR, + or - as ${OPERATOR}: *D${OPERATOR} DORDOR means that "or" is used as the default operator.This is the default. Here is an example to use DOR. '*DOR for Mroonga' IN BOOLEAN MODE returns records that includes for or Mroonga: SELECT title FROM books WHERE MATCH (title) AGAINST('*DOR for Mroonga' IN BOOLEAN MODE); -- +---------------------------+ -- | title | -- +---------------------------+ -- | MySQL for Professional | -- | Mroonga = MySQL + Groonga | -- +---------------------------+ D+D+ means that "and" is used as the default operator. It's similar to query in Web search engine.Here is an example to use D+. '*D+ MySQL Mroonga' IN BOOLEAN MODE returns records that includes MySQL and Mroonga: SELECT title FROM books WHERE MATCH (title) AGAINST('*D+ MySQL Mroonga' IN BOOLEAN MODE); -- +---------------------------+ -- | title | -- +---------------------------+ -- | Mroonga = MySQL + Groonga | -- +---------------------------+ D-D- means that "not" is used as the default operator.Here is an example to use D-. '*D- MySQL Mroonga' IN BOOLEAN MODE returns records that includes MySQL but doesn't include Mroonga: SELECT title FROM books WHERE MATCH (title) AGAINST('*D- MySQL Mroonga' IN BOOLEAN MODE); -- +------------------------+ -- | title | -- +------------------------+ -- | Professional MySQL | -- | MySQL for Professional | -- +---------------------------+ W pragmaW pragma indicates target section and its weight for multiple column index.You can specify different weight for each section. The default weight is 1. 1 means that no weight. Here is the W pragma syntax. ${SECTION} is a number that is begun not from 0 but from 1. ${WEIGHT} is omitable: *W[${SECTION1}[:${WEIGHT1}]][,${SECTION2}[:${WEIGHT2}]][,...] Here are schema and data to show examples. You need to create a multiple column index to use W pragma: CREATE TABLE memos ( `id` INTEGER AUTO_INCREMENT, `title` text, `content` text, PRIMARY KEY(`id`), FULLTEXT INDEX text_index (title, content) ) ENGINE=Mroonga DEFAULT CHARSET=utf8mb4; INSERT INTO memos (title, content) VALUES ( 'MySQL', 'MySQL is a RDBMS.' ); INSERT INTO memos (title, content) VALUES ( 'Groonga', 'Groonga is a full text search engine.' ); INSERT INTO memos (title, content) VALUES ( 'Mroonga', 'Mroonga is a storage engine for MySQL based on Groonga.' ); Here is an example to show how to use weight. title column has 10 weight and content columns has 1 weight. It means that keyword in title column is 10 times important than keyword in content column: SELECT title, content, MATCH (title, content) AGAINST('*W1:10,2:1 +Groonga' IN BOOLEAN MODE) AS score FROM memos; -- +---------+--------------------------------------------------------+-------+ -- | title | content | score | -- +---------+--------------------------------------------------------+-------+ -- | MySQL | MySQL is a RDBMS. | 0 | -- | Groonga | Groonga is a full text search engine. | 11 | -- | Mroonga | Mroonga is a storage engine for MySQL based on Groonga | 1 | -- +---------+--------------------------------------------------------+-------+ The score of the first record is 0. Because it doesn't have any Groonga in both title column and content column. The score of the second record is 11. Because it has Groonga in both title column and content column. Groonga in title column has score 10. Groonga in content column has score 1. 11 is sum of them. The score of the third record is 1. Because it has Groonga in only content column. `Groonga in content column has score 1. So the score of the record is 1. S pragmaS pragma indicates syntax of the query.Here is a syntax of S pragma: *S${SYNTAX} Here is a list of available syntax:
*SSYou can use script syntax by *SS pragma. You can use full Groonga search features in script syntax.Here are schema and data to show example of script syntax usage: CREATE TABLE comments ( `content` text, FULLTEXT INDEX content_index (content) ) ENGINE=Mroonga DEFAULT CHARSET=utf8mb4; INSERT INTO comments VALUES ( 'A student started to use Mroonga storage engine. It is very fast!' ); INSERT INTO comments VALUES ( 'Another student also started to use Mroonga storage engine. It is very fast!' ); Here is an example to use near search by script syntax: SELECT content, MATCH (content) AGAINST('*SS content *N "student fast"' IN BOOLEAN MODE) AS score FROM comments; -- +------------------------------------------------------------------------------+-------+ -- | content | score | -- +------------------------------------------------------------------------------+-------+ -- | A student started to use Mroonga storage engine. It is very fast! | 1 | -- | Another student also started to use Mroonga storage engine. It is very fast! | 0 | -- +------------------------------------------------------------------------------+-------+ Near search matches only when there are 10 or less words between specified words (student and fast in this case). So student started ...(8 words)... very fast is matched but student also started ...(8 words)... very fast isn't matched. You can also use other advanced features. Search and Scoring in MroongaMroonga Scoring in natural language modeThe search score of Mroonga in natural language mode is a similarity score between query and document. Mroonga's scoring algorithm is as follows:
We are showing by example. The first thing we create a table and insert data as follows: SET NAMES UTF8; CREATE TABLE diaries ( id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, content TEXT, FULLTEXT INDEX(content) ) ENGINE mroonga DEFAULT CHARSET UTF8; INSERT INTO diaries (content) VALUES("It'll be fine tomorrow as well."); INSERT INTO diaries (content) VALUES("It'll rain tomorrow."); INSERT INTO diaries (content) VALUES("It's fine today. It'll be fine tomorrow as well."); INSERT INTO diaries (content) VALUES("It's fine today. But it'll rain tomorrow."); We use a query "fine today" to search. The search result is as follows: mysql> SELECT *, MATCH (content) AGAINST ("fine today") AS score -> FROM diaries -> WHERE MATCH (content) AGAINST ("fine today") -> ORDER BY MATCH (content) AGAINST ("fine today") DESC; +----+--------------------------------------------------+--------+ | id | content | score | +----+--------------------------------------------------+--------+ | 3 | It's fine today. It'll be fine tomorrow as well. | 131073 | | 4 | It's fine today. But it'll rain tomorrow. | 131073 | +----+--------------------------------------------------+--------+ 2 rows in set (0.01 sec) Now, let us explain how to make the result score "131073". Splitting a query into tokensThe default tokenizer splits up the query "fine today" into two tokens as follows:
Removing not matched tokensWe have nothing to do in this case because the tokens exist in documents.
Calculating weight per token
1048576(= 2 ** 20) denotes the number of tokens in all documents. We should get the number from the table, but as a matter of fact, the fixed value is used for simplicity. In the fraction 1048576 / 8, the denominator "8" indicates that the number of document which contains token "today". In the fraction 1048576 / 9, the denominator "9" means the number of document which contains token "fine". The denominator "8" and "9" are approximate number, strictly speaking, "2" is the correct answer for the number of document that contains "today", "3" is the correct answer for the number of document that contains "fine". You can check the approximate number of token by using the following query. SELECT mroonga_command("select diaries-content --query '_key:fine OR _key:today' --output_columns _key, index --limit -1") AS groonga_response; The retrieval result of the above query is as follows: [[[2],[["_key","ShortText"],["index","diaries"]],["FINE",9],["TODAY",8]]] Getting the top N weight tokensThe formula for the N is "the number of occurrence of token in document / 8 + 1"In this case, N = 2 / 8 + 1 ≒ 1 The sorted tokens are as follows:
N = 1, then we get "today: 131072(= 1048576 / 8)". Summing up the weight per token which exists in a document, and is one of the top N weight tokens
Finally document id 3,4 are hit, the similarity score between query and document(id=3) is 131072 + 1 = 131073 ("1" is the number of occurrence of token "today" in document). The similarity score between query and document(id=4) is the same as the score between query and document(id=3). FAQThere are some frequently asked questions for Mroonga. This section describes about them.How to avoid mmap Cannot allocate memory error?Short answerIncrease vm.max_map_count Linux kernel parameter value.Long answerThere is a case following mmap error in log file:2013-06-04 08:19:34.835218|A|4e86e700|mmap(4194304,551,432017408)=Cannot allocate memory <13036498944> Note that <13036498944> means total size of mmap (almost 12GB) in this case. So you need to confirm following point of views.
To check there are enough free memory, you can use vmstat command. To check whether maximum number of mappings are exceeded, you can investigate the value of vm.max_map_count. If this issue is fixed by modifying the value of vm.max_map_count, it's exactly the reason. As Groonga allocates memory chunks each 256KB, you can estimate the size of database you can handle by following formula: (database size) = vm.max_map_count * (memory chunks) If you want to handle over 16GB Groonga database, you must specify at least 65536 as the value of vm.max_map_count: database size (16GB) = vm.max_map_count (65536) * memory chunks (256KB) You can modify vm.max_map_count temporary by sudo sysctl -w vm.max_map_count=65536. Then save the configuration value to /etc/sysctl.conf or /etc/sysctl.d/*.conf. COMMUNITYThere are some places for sharing Groonga and Mroonga information. We welcome you to join our community.Mailing ListThere are mailing lists for discussion about Groonga and Mroonga.
Chat roomThere are chat rooms for discussion about Groonga and Mroonga.
Please follow the account to get the latest Groonga and Mroonga related information! Please like the page to get the latest Mroonga related information! HOW TO CONTRIBUTE TO MROONGAWe welcome your contributions to the Mroonga Project. There are many ways to contribute, such as using Mroonga, introduction to others, etc. For example, if you find a bug when using Mroonga, you are welcome to report the bug. Coding and documentation are also welcome for Mroonga and its related projects.
How to report a bugThere are two ways to report a bug:
You can use either way It makes no difference to us. Submit a bug to the issue trackerMroonga project uses GitHub issue tracker.You can use English or Japanese to report a bug. Report a bug to the mailing listMroonga project has Community for discussing about Mroonga. Please send an Email that describes a bug.How to collect data for reporting a reproducible bugThere may be a case that Mroonga doesn't work as expected. It may be a crash bug or SQL returns unexpected search results. To solve such a issues with you, We need a reproducible bug report.For a reproducible bug report, it is better to report with the following information as much as you can because it is useful to investigate the reason which causes the bug.
There is a case that it is not enough to reproduce a bug with above information. For such a case, additional information is required.
How to contribute in documentation topicsWe use Sphinx for documentation tool.具体的にどんなことをやればいいのかを説明します。基本的にはドキュメントのソースファイルごとにpull requestを送ってもらうと進めやすいです。 対象となるファイルはdoc/sourceディレクトリ以下の拡張子が「.rst」となっているファイルです。 あまりGitHubでの作業に慣れていなくてもできるように、「最初にやること」と「作業ごとにやること」、「ファイルごとにやること」に分けて順に説明します。
The things you must do at first以下では、最初に一度だけ実施しておけば良いことを説明します。Required softwaresTODOGit configurationまずは、gitの設定をしましょう。すでにある程度gitを使っている場合には初期設定はすでに完了しているかも知れません。その場合には飛ばして構いません。:% git config --global user.name "Your Name" % git config --global user.email "Email address" 上記はコミットログに使われます。公開しても差し支えないユーザ名もしくはメールアドレスを設定します。 Fork on GitHubFirst, create GitHub account. If your GitHub account is ready, login to GitHub and access following URL.
Fork リポジトリ選択画面でご自分のリポジトリへとforkしてください。 Initial configuration for working repositoryClone Mroonga repository to working directory. Don't forget to do "Git configuration".:% git clone git@github.com:(YOUR_GITHUB_ACCOUNT)/mroonga.git % cd mroonga % git remote add upstream git@github.com:mroonga/mroonga Initial configuration for building documentationExecute following commands to prepare for generating Mroonga documentation:% ./autogen.sh % ./configure --enable-document --with-mysql-source=(SOURCE_DIRECTORY_OF_MySQL) Next step is "The things you need to do every tasks". The things you need to do every tasks以下では作業ごとにやることを説明します。Follow the upstreamMroonga本家の最新状態に追従して、作業がかぶらないようにします。:% git fetch --all % git checkout master % git rebase upstream/master 最新の状態に追従できたら、「ファイルごとにやること」へと進みます。 The things you need to do every files以下では、例えば http://mroonga.org/docs/characteristic.html を更新する場合で説明します。作業対象となるファイルは、リポジトリのdoc/source/ディレクトリ以下にあり拡張子が.rstなファイルです。今回は、doc/source/characteristic.rstを変更する例で説明します。Create working branchCreate a working branch. Use meaningful branch name.% git checkout -b
use-capitalized-notation-characteristic
Editing textFix typos, styles or write a new document for Mroonga.Confirm generated documentマークアップに問題がないか、HTMLを確認します。HTMLを生成するには以下のコマンドを実行します。:% cd doc/locale/en % make html いつも使っているブラウザで該当ファイルを確認して、変更した内容が反映されていればOKです。: % firefox html/characteristic.html CommitHTMLに問題がないことを確認できたら、コミットします。:% cd ${cloneしたディレクトリーのトップディレクトリー} % git add doc/source/characteristic.rst % git commit コミットするときのメッセージについては、例えば以下のようにします。: doc: use "Mroonga" notation Push and pull requestPublish your changes to your own GitHub repository:% git push -u origin use-capitalized-notation-characteristic Note that use-capitalized-notation-characteristic is already created branch in advance. ブラウザで https://github.com/(GitHubのアカウント)/mroonga を開くと「 @use-capitalized-notation-characteristic@ 」ブランチをpull requestする!みたいなUIができているので、そこのボタンを押してpull requestしてください。入力フォームがでてきますが、コミットしたときメッセージで十分なのでそのままpull requestしてOKです! これで、ひととおりの作業は完了しました。 DEVELOPER'S GUIDEHow to debugBuilding for debuggingWhen you build software for debugging, you can get more information like symbol resolutions in gdb. So we build both MySQL and Mroonga for debugging in development.NOTE: If you build one of them for debugging, the size of
structures etc. might be different, and you might not be able to load Mroonga,
or assertions don't work in running.
How to build MySQL for debuggingAs you can see in MySQL :: MySQL 5.5 Reference Manual :: 2.9.2 Installing MySQL from a Standard Source Distribution, you can build MySQL for debugging by passing -DWITH_DEBUG=yes option in CMAKE options.The procedure from download to build is the following. % mkdir -p ~/work/ % cd ~/work/ % wget http://ftp.jaist.ac.jp/pub/mysql/Downloads/MySQL-5.5/mysql-5.5.13.tar.gz % tar xvzf mysql-5.5.13.tar.gz % cd mysql-5.5.13 % cmake . -DCMAKE_INSTALL_PREFIX=/tmp/local -DWITH_DEBUG=yes % make How tom build Mroonga for debuggingYou can build Mroonga for debugging by passing --with-debug in configure options.The procedure from cloning repository to build is the following. % cd ~/work/ % git clone git@github.com:mroonga/mroonga.git % cd mroonga % ./autogen.sh % ./configure CFLAGS="-g3 -O0" CXXFLAGS="-g3 -O0" --with-debug --prefix=/tmp/local --with-mysql-source=$HOME/work/mysql-5.5.13 --with-mysql-config=$HOME/work/mysql-5.5.13/scripts/mysql_config % make When you successfully build both, please invoke tests like the following. If you get [pass] for all tests, you succeeded to build for debugging. % test/run-sql-test.sh More about run-sql-test.shrun-sql-test.sh is our friend for debugging. Here we show some examples of its usage.Run the specified test onlyWhen you invoke run-sql-test.sh without any option, all tests under mysql-test/mroonga will be invoked.So if you want to run certain tests only, you can specify the test name in --do-test option. ./test/run-sql-test.sh --do-test=foobar See the traceWhen you run tests by adding --debug option like the following, function calls information is recorded../test/run-sql-test.sh --debug --do-test=foobar This information is stored in ${MySQL's working directory}/${MySQL version}/mysql-test/var/log/mysqld.1.trace. When you add a new function, it would be a good idea to put it in the beginning of MRN_DBUG_ENTER_FUNCTION function and record its calls. Invoking GDBBy adding --gdb option, you can debug with GDB when you run tests../test/run-sql-test.sh --gdb Release procedureRequirementsHere is the requirements about release procedure.
Use the following working directories.
Setup build environmentInstall the following packages:% sudo apt-get install -V ruby mecab libmecab-dev gnupg2 dh-autoreconf python-sphinx bison Describe the changesSummarize recent changes since the previous release into doc/source/news.txt. This summary is also used in release note.Execute the following command to collect change logs since the previous version: % git log -p --reverse $(git tag | tail -1).. Search the logs by ^commit, and pick up collect remarkable changes. Should be included
Shoud not be included
Generate configure scriptBecause of master branch doesn't include configure script, then it is required to generate configure script for building.Execute autogen.sh with the following command: % sh autogen.sh It generates configure script. Execute configure scriptFor generating Makefile, execute the configure script.Execute with the following options for the release: % ./configure \ --enable-document \ --prefix=/tmp/local \ --with-launchpad-uploader-pgp-key=(The key ID that is registered Launchpad) \ --with-mroonga-github-com-path=$MROONGA_GITHUB_COM_PATH \ --with-cutter-source-path=$CUTTER_SOURCE_PATH \ --with-groonga-source-path=$GROONGA_SOURCE_PATH \ --with-mysql-source=(The directory of MySQL source code) \ --with-mysql-build=(The build directory of MySQL) \ --with-mysql-config=(The path to mysql_config command) See Others for the details of --with-mysql-source option. NOTE: We can not upload and override the same name packages to
PPA repository. Therefore, we upload packages to a stable repository after
confirming the successful build of they in the nightly repository.
Check whether you can upload packagesCheck whether you can login to packages.groonga.org as packages user.You can check with the following command whether you can login: % ssh packages@packages.groonga.org If you can't login to packages.groonga.org, you must be registered ssh public key. Execute make update-latest-releaseExecute make update-latest-release command with OLD_RELEASE_DATE, NEW_RELEASE_DATE.When 9.09 release, we executed the following command: % make update-latest-release OLD_RELEASE=9.09 OLD_RELEASE_DATE=2019-09-27 NEW_RELEASE_DATE=2019-10-30 This command updates some html files (which is used for web sites of Mroonga - index.html,ja/index.html) and the version of spec file or debian/changelog entry. Confirm the results of each testWe confirm the results of all the below tests before setting the tag to Mroonga. Because if we will find problems in Mroonga after setting the tag to it, we must release it again.
Tagging for releaseExecute the following command for tagging:% make tag % git push --tags origin NOTE: After tagging for the release, execute configure
script. This tag information is reflected when generating the documents.
Upload archive filesThen, create archive file (tar.gz) for distribution:% make dist Change working directory to packages/source: % cd packages/source Execute make download for syncing with the upstream: % make download Execute make archive for generating source archive: % make archive Execute make upload for uploading archive file: % make upload As a result, tar.gz archive file is available from https://packages.groonga.org/source/mroonga/. Create packages for the releaseCreate Linux and Windows packages.DebianChange working directory to packages:% cd packages Execute the following command: % rake apt Now we finish build and upload packages to https://packages.groonga.org/. However, these packages are unsigned. We sign packages by executing the below commands: % cd $PACKAGES_GROONGA_ORG_REPOSITORY % rake apt Debian derivatives(Ubuntu)For Ubuntu, packages are provided by PPA on launchpad.net.Change working directory to packages and execute rake ubuntu:upload command: % cd packages % rake ubuntu When upload packages was succeeded, package build process is executed on launchpad.net. Then build result is notified via E-mail. You can install packages via Groonga PPA on launchpad.net: https://launchpad.net/~groonga/+archive/ubuntu/ppa Red Hat derivativesChange working directory to packages% cd packages Execute the following command: % rake yum Now we finish build and upload packages to https://packages.groonga.org/. However, these packages are unsigned. We sign packages by executing the below commands: % cd $PACKAGES_GROONGA_ORG_REPOSITORY % rake yum WindowsFor windows packages, we use artifacts of GitHub release page .Upload documents
% make update-document DOCUMENT_VERSION=xx.xx_VERSION DOCUMENT_VERSION_FULL=xx.xx
Update blog(Mroonga blog)We update the below files.
We can confirm contents of blog on Web browser by using Jekyll.: % jekyll serve --watch We access http://localhost:4000 on our web browser for confirming contents. NOTE: If we want private to blog contents, we set false
on published: in .md file.:
--- layout: post.en title: Mroonga 10.01 has been released! description: Mroonga 10.01 has been released! published: false --- Announce release for mailing listSend release announce for each mailing list
Announce release for twitterClick Tweet link in Mrooga blog entry. You can share tweet about latest release. If you use tweet link, title of release announce and URL is embedded into your tweet.Execute sharing tweet in Japanese and English version of blog entry. Note that this tweet should be done when logged in by groonga account. Announce release for FacebookWe announce release from Mroonga group in Facebook.https://www.facebook.com/mroonga/ Bump versionBump version to the latest release:% make update-version NEW_VERSION_MAJOR=9 NEW_VERSION_MINOR=1 NEW_VERSION_MICRO=0 TODO: Translate. コーディングスタイル一般的に1つのコードベースに複数のスタイルがまざっているとソースコードが読みづらくなります。たとえ、それぞれのスタイル単独では読みやすいスタイルあっても、まざると読みづらくなります。そのため、Mroongaプロジェクトでもスタイルを統一して、読みやすいソースコードになるようにします。読みやすいソースコードにする理由は以下の通りです。
どちらの場合も周辺のソースコードを読んで、それをベースにコードを追加・変更します。このとき、ソースコードが読みやすい状態だと周辺のソースコードの把握をスムーズに行うことができ、スムーズにその後の作業に移れます。 TODO: 読みやすさの他にデバッグのしやすさ(gdbでの追いやすさ)も考慮に入れたほうがよさそうだがどうしよう。 言語基本的にすべてC++で記述します。よほどのことがない限りCは使いません。よい例: ha_mroonga.cpp 悪い例(C言語を使っている): mrn_sys.c ファイル名ソースコードのファイル名は全て小文字にします。また、単語ごとに"_"で区切ります。よい例: ha_mroonga.cpp 悪い例(大文字を使っている): HA_MROONGA.cpp 悪い例(単語を"_"で区切らずにくっつけている): hamroonga.cpp 悪い例(単語を"-"で区切っている): ha-mroonga.cpp ソースコードの拡張子 .cpp にします。 よい例: ha_mroonga.cpp 悪い例( .cc を使っている): ha_mroonga.cc ヘッダーファイルの拡張子は .hpp にします。 よい例: ha_mroonga.hpp 悪い例( .h を使っている): ha_mroonga.h
名前空間ヘッダーファイルでは using namespace を使わない。ソースコードでは using namespace std であれば使ってもよい。他の名前空間は使ってはいけない。よい例: ha_mroonga.cpp: using namespace std; 悪い例(ヘッダーファイル内で using namespace を使っている): ha_mroonga.hpp: using namespace std; 悪い例( std 以外の名前空間に対して using namespace を使っている): ha_mroonga.cpp: using namespace zmq; include ガード2重 include を防ぐためのマクロは、ヘッダーファイルの名前をすべて大文字にし、単語の区切りをアンダースコアにしたものにし、最後にアンダースコアをつけて HEADER_FILE_NAME_HPP_ という名前にする。よい例: mrn_db_path.hpp: #ifndef MRN_DB_PATH_HPP_ #define MRN_DB_PATH_HPP_ ... #endif // MRN_DB_PATH_HPP_ 悪い例(小文字になっている): mrn_db_path.hpp: #ifndef mrn_db_path_hpp_ #define mrn_db_path_hpp_ ... #endif // mrn_db_path_hpp_ 悪い例(アンダースコアが最後ではなく先頭についている): mrn_db_path.hpp: #ifndef _MRN_DB_PATH_HPP #define _MRN_DB_PATH_HPP ... #endif // _MRN_DB_PATH_HPP 代入= の前後に1つスペースを入れる。よい例: int i = 0; 悪い例(スペースが入っていない): for (i=0; i<10; ++i) {...} 悪い例(2つスペースが入っている): int i = 0; 文字列文字列はポインタと長さで表現する。 \0 での終端を仮定しない。よい例(本当はもっとすっきりした例がよいけど。。。): char *raw_data = "table_name column_name column_value" char *column_name; size_t column_name_size; column_name = raw_data + strlen("table_name "); column_name_size = strlen("column_name"); 悪い例(無理やり \0 終端にしている): char *raw_data = "table_name column_name column_value" char *column_name; column_name = strndup(raw_data + strlen("table_name "), strlen("column_name")); ただし、ファイル名など \0 が前提であるものに関しては \0 終端を仮定してよい。 よい例: char *database_path = "db/test.mrn"; 悪い例( \0 終端を仮定せず、長さも管理している): char *database_path = "db/test.mrn"; size_t database_path_size = strlen("db/test.mrn"); std::string は内部でメモリ確保などの処理が発生するので多用しない。 よい例: char database_path[MAX_PATH]; 悪い例(最大サイズがわかっているのに std::string を使っている): std::string database_path; バッファTODO: ちゃんと考える。何度も繰り返し使う領域(バッファ)が必要な場合は GRN_BULK を使う。例えば、カラムの値を取得する領域などである。 命名規則クラス名クラスの名前は UpperCamelCase とする。よい例: class MyClass { } 悪い例( snail_case である): class my_class { } ただし、 ha_mroonga などMySQLとのインターフェイスとなるクラスでかつ他の類似のモジュールに命名規則がある場合はそれに従う。 よい例: class ha_mroonga: public handler { } 悪い例( UpperCamelCase になっている): class HaMroonga: public handler { } メンバー変数名メンバー変数名は snail_case とし、末尾にアンダースコア( _ )を付ける。よい例: class MyClass { char *my_name_; } 悪い例( UpperCamelCase である): class MyClass { char *MyName_; } 悪い例(末尾にアンダースコアがない): class MyClass { char *my_name; } 読み込み用アクセサ名メンバー変数の値を読み込むメソッドの名前はメンバー変数名の末尾のアンダースコアを除いたものにする。よい例: class MyClass { char *my_name_; const char *my_name() {return my_name_;}; } 悪い例(末尾にアンダースコアが残っている): class MyClass { char *my_name_; const char *my_name_() {return my_name_;}; } 悪い例(先頭に get_ を付けている): class MyClass { char *my_name_; const char *_my_name() {return my_name_;}; } 書き込み用アクセサ名メンバー変数の値を設定するメソッドの名前は、メンバー変数名の末尾のアンダースコアを除き、先頭に set_ を加えたものにする。よい例: class MyClass { unsigned int age_; void set_age(unsigned int age) { age_ = age; }; } 悪い例(末尾にアンダースコアが残っている): class MyClass { unsigned int age_; void set_age_(unsigned int age) { age_ = age; }; } 悪い例(先頭に set_ ではなく update_ を付けている): class MyClass { unsigned int age_; void update_age(unsigned int age) { age_ = age; }; } コピーコンストラクター基本的にコピーコンストラクターの使用を禁止する。よほどのことがなければ使用しないこと。コピーコンストラクターは暗黙的に無駄なコピーが発生する可能性があるためパフォーマンス上の問題がある。コピーではなくポインターやリファレンスを用いること。 また、デフォルトのコピーコンストラクター実装はメンバー変数のポインターの値をそのままコピーするため、デコンストラクターで二重に解放してしまう危険性がある。そのため、明示的にコピーコンストラクターを定義しない場合は無効にする。 よい例: class MyClass { private: MyClass(const MyClass &); } 悪い例(コピーコンストラクターを禁止していない): class MyClass { } 悪い例(カスタムコピーコンストラクターを使っている): class MyClass { unsigned int age_; MyClass(const MyClass &object) { age_ = object.age_; } } クラスの代入基本的に定義したクラスの代入を禁止する。よほどのことがなければ使用しないこと。代入演算子は暗黙的に無駄なコピーが発生する可能性があるためパフォーマンス上の問題がある。コピーではなくポインターやリファレンスを用いること。 また、デフォルトの代入演算子の実装はメンバー変数のポインターの値をそのままコピーするため、デコンストラクターで二重に解放してしまう危険性がある。そのため、明示的に代入演算子を定義しない場合は無効にする。 よい例: class MyClass { private: MyClass &operator=(const MyClass &); } 悪い例(代入を禁止していない): class MyClass { } 悪い例(代入を使っている): class MyClass { unsigned int age_; MyClass &operator=(const MyClass &object) { age_ = object.age_; return *this; } } 引数voidを省略引数がない場合は void を省略する。よい例: class MyClass { unsigned int age_; unsigned int age() { return age_; }; } 悪い例( void を省略していない): class MyClass { unsigned int age_; unsigned int age(void) { return age_; }; } 入力用引数にはconstを付ける入力のみに用いる引数には const を付ける。これは、入力のみに用いる引数である事を明示するためと、間違って引数を変更してしまわないためである。よい例: class Table { void insert(unsigned int id, const char *column_name, const char *value) { Record *record = records[i]; Column *column = columns[column_name]; column.set_value(value); } } 悪い例(入力のみに用いているのに const が付いていない): class Table { void insert(unsigned int id, char *column_name, char *value) { Record *record = records[i]; Column *column = columns[column_name]; column.set_value(value); } } 定数フラグやサイズなどを示す定数には const オブジェクトを用いる。これはデバッガー上でプログラムを走らせているときに名前で値を参照できるようにするためである。よい例: const char *MRN_LOG_FILE_PATH = "groonga.log"; 悪い例( #define を用いている): #define MRN_LOG_FILE_PATH "groonga.log" 真偽値bool型を用いる真偽値には bool 型を用いる。よい例: bool is_searching; 悪い例( int 型を用いている): int is_searching; 真偽値のリテラルには true または false を用いる真偽値の値として true と false はより完結で説明的だからである。よい例: bool is_searching = true; 悪い例( 0 以外の値を真の値として用いている): bool is_searching = 1; 条件式真偽値は比較しない真偽値の値は boolean_value == true などとせず、 boolean_value として条件式に使用する。すでに真偽値の値を真偽値のリテラルと比較することは重複したコードだからである。よい例: boolean is_searching = true; if (!is_searching) { ... } 悪い例(真偽値のリテラルと比較している): boolean is_searching = true; if (is_searching == false) { ... } NULL と比較しないNULL かどうかを条件式に使う場合は value == NULL ではなく !value というように書く。多くの言語で NULL に相当する値(たとえばLispの nil )は偽を表すため、明示的に NULL と比較しなくても意図は伝わるからである。よい例: char *name = NULL; if (!name) { ... } 悪い例( NULL と比較している): char *name = NULL; if (name == NULL) { ... } 数値は比較するCやC++では 0 は偽、 0 以外は真の値となるが、条件式に数値を使う場合は strcmp(...) == 0 などというように明示的に比較する。C++では真偽値に bool を使うためこのような状況は発生しないが、C言語由来のAPIでは int で真偽値を表現している場合が多い。しかし、 int だけでは真偽値として使っているか本当に数値として使っているかがわかりにくいため、 int のときはすべて数値として扱う。 よい例: if (memcmp(value1, value2, value_size) == 0) { printf("same value!\n"); } 悪い例( 0 を偽の値として扱っている): if (!memcmp(value1, value2, value_size)) { printf("same value!\n"); } 初期化メンバー変数の初期化には初期化リストを用いる無駄な処理を省くためにコンストラクターでのメンバー変数の初期化には初期化リストを用いる。初期化リストを用いないとコンストラクターの処理とコピーコンストラクター・代入処理が行われたりなど非効率である。(後述)よい例: class Table { Table(const char *name); std::string name_; } Table::Table(const char *name) : name_(name) { } 悪い例( std::string(name) のところでコンストラクターが動き、 name_ = ... のところで代入演算子が動いて2回初期化している): class Table { Table(const char *name); std::string name_; } Table::Table(const char *name) { name_ = std::string(name); } 変数宣言と同時に初期化する変数を宣言したときに同時に初期化する。宣言時に初期化せずに代入して初期化すると、無駄な処理が発生する可能性があるため非効率である。(後述)よい例: std::string name("users"); 悪い例( std::string() のところでコンストラクターが動き、 name = ... のところで代入演算子が動いて2回初期化している): std::string name; name = std::string("users"); インクリメント・デクリメント前置形式を用いる後置形式ではオブジェクトのコピーをしなければいけないため非効率である。そのため、できるだけ前置形式を用いる。よい例( int だと効率は変わらないので本当はあんまりよい例ではない): for (int i = 0; i < 10; ++i) { } 悪い例(後置形式を用いている): for (int i = 0; i < 10; ++i) { } キャストC++のスタイルを用いるCスタイルのキャストはなんでもキャストできてしまうため、意図しないキャストにも気付かない可能性がある。例えば、単に const を外したいだけなのに、間違って違う型に変換していても気付けない。C++のキャストでは const を外したいときは const_cast を使用し、型を変換するときは static_cast を指定する。こうすれば、 static_cast で間違って const を外してしまっている場合も気付ける。 reinterpret_cast はどうしても必要なときのみ注意して使う。よい例( const_cast を使っている): uchar *to_key; const ucahr *from_key; KEY *key_info; uint key_length; key_copy(to_key, const_cast<uchar *>from_key, key_info, key_length); よい例( static_cast を使っている): int n_hits = 1; int n_documents = 10; float hit_ratio = (float)(n_hits) / n_documents; よい例( static_cast では無理なので reinterpret_cast を使っている): THD *thread = current_thd; my_hash_delete(&mrn_allocated_thds, reinterpret_cast<uchar *>(thread)); 悪い例(Cスタイルのキャストを使っている): int n_hits = 1; int n_documents = 10; float hit_ratio = (float)(n_hits) / n_documents; 悪い例( static_cast で十分なのに reinterpret_cast を使っている): void *value = get_value(key); char *name; name = reinterpret_cast<char *>(value); 変数宣言ポインタ型を示す * とリファレンス型を示す & は変数名に寄せるCと同様にポインタ型を示す * は型名ではなく変数名に寄せる。これは、以下のように複数の変数を一度に宣言したときに一貫性がなくなるためである。2つめ以降の変数は近くに型名がないため * を寄せる場所がない。例: char* key, *value; 同様に、リファレンス型を示す & も変数名に寄せる。 なお、 * や & と型名の間にはスペースを入れない。 よい例: char *key; よい例: bool is_exist(const std::string &file_name); 悪い例(型名に寄せている): char* key; その他
以下、具体例が必要。
メモ
How we are developingThe Mroonga project uses Redmine for Ticket Driven Development.The location of our Redmine site is the following. http://redmine.groonga.org/projects/mroonga Developments are done per ticket. We would like to ask developers to register on the site above. RoadmapWe develop based on the roadmap, and you can see our roadmap in the following page.http://redmine.groonga.org/projects/mroonga/roadmap We draw up our roadmap by discussing in off-line or on-line meetings. The roadmap specifies the list of adding features in each version. Source code managementWe manage the source code using Git on Github.http://github.com/mroonga/mroonga If you want to contribute, please make clone in read/write mode, and then commit and push. If you still do not have your account on github.com, please register. Development processWe develop Mroonga using the following procedures.
We welcome your ideas about new features or changes of specifications. Please create a ticket first and describe your idea there. For adding and running test codes, please refer the description below. We want to ask the assignee to handle whole the process, like design, implementation, test and documentations. Then you ask other developers to review (especially in case of adding new features), and when it passes you add or update documents and the ticket can be closed. For adding and updating documents, please refer the description below. Development environmentCurrently we are assuming the following development environments.
Contents of the source treeThere are just a few source files for now, and we would like to keep it simple as possible.
Since we are still in the early stage of the development of Mroonga, we will not make the documents of the specification of SQL queries for now. Alternatively we consider our SQL tests and its expected results as the list of features and the definition of their specifications. Adding and running testsWe use two kinds of regression tests to manage the quality of Mroonga.
Before pushing to the repository, please always run the regression tests and confirm that you don't introduce any degradation. You can invoke these two kinds of tests by "make check". SQL tests are implemented as "sub test suite" for "mysql-test" in MySQL's regression tests. For the detail about how to add test cases or how to modify expected result files, please refer the following MySQL document. http://dev.mysql.com/doc/mysqltest/2.0/en/index.html For the detail about C/C++ unit tests, please refer the following Cutter document. http://cutter.sourceforge.net/ Adding and updating documentsWe use Sphinx for the documentation of Mroonga.Sphinx is a documentation generator which uses reStructuredText as its markup language, and converts reStructuredText files into HTML files. reStructuredText uses .rst filename extensions. We write documents in the reStructuredText format and convert them to HTML. The source files of documents have .rst extension in the "doc/source" directory of the Mroonga repository. When you add or update them, please try "make html" to confirm that there are no syntax errors. Documents are published in http://mroonga.org. Since we are using GitHub Project Pages, the web site is updated when HTML files are pushed to the http://github.com/mroonga/mroonga.github.com repository. We push added or updated HTML files to the repository after confirming the consistency between the documents and the current release version. So you can just push to Mroonga repository to push documents for each ticket. The details are as follows. Installing SphinxYou don't need to install Sphinx by yourself because Mroonga clones the latest Sphinx from Sphinx repository automatically. You just need to install Mercurial.Build MroongaIn order to enables documentation generation, you should run configure script with "--enable-document" option as follows.% ./configure --enable-document --with-mysql-source=(your mysql source directory) % make % make install Adding a documentWrite a document in reStructuredText format, and save the document with .rst extension in "doc/source" directory.See reStructuredText (reST) concepts and syntax about how to write a document in reStructuredText format. Confirm generated documentYou can generate HTML files by the following command.% cd doc/locale/en % make html You can confirm generated HTML document in web browser, after generating HTML files. Here we show an example of using Firefox. % firefox html/characteristic.html
AUTHORMroonga ProjectCOPYRIGHT2009-2022, Mroonga Project
Visit the GSP FreeBSD Man Page Interface. |