- 全文索引使用服务器字符集和排序规则(character_set_server 和 collation_server 系统变量的值)加载停止词列表并搜索全文查询。如果用于全文索引或搜索的停止词文件或列具有不同于 character_set_server 或 collation_server 的字符集或排序规则,则停止词查找可能会出现错误命中或未命中。
- 停止词查找的大小写敏感性取决于服务器排序规则。例如,如果排序规则为 utf8mb4_0900_ai_ci,则查找不区分大小写,而如果排序规则为 utf8mb4_0900_as_cs 或 utf8mb4_bin,则查找区分大小写。
InnoDB 搜索索引的停止词
- InnoDB 有一个相对较短的默认停止词列表,因为来自技术、文学和其他来源的文档通常使用短词作为关键字或重要短语。例如,您可能会搜索“to be or not to be”并希望得到一个合理的结果,而不是忽略所有这些词。
- 要查看默认的 InnoDB 停止词列表,请查询 Information Schema INNODB_FT_DEFAULT_STOPWORD 表
示例
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
运行结果如下:
+---------+
| value |
+---------+
| a |
| about |
| an |
| are |
| as |
| at |
| be |
| by |
| com |
| de |
| en |
| for |
| from |
| how |
| i |
| in |
| is |
| it |
| la |
| of |
| on |
| or |
| that |
| the |
| this |
| to |
| was |
| what |
| when |
| where |
| who |
| will |
| with |
| und |
| the |
| www |
+---------+
36 rows in set (0.00 sec)
InnoDB 自定义停止词
- 要为所有 InnoDB 表定义自己的停止词列表,请定义一个与 INNODB_FT_DEFAULT_STOPWORD 表具有相同结构的表,用停止词填充它,并在创建完整表之前将 innodb_ft_server_stopword_table 选项的值设置为 db_name/table_name 形式的值文本索引。停止词表必须有一个名为 value 的 VARCHAR 列。
- 以下示例演示了为 InnoDB 创建和配置新的全局停止词表。
示例
-- Create a new stopword table
CREATE TABLE my_stopwords(
value VARCHAR(30)
) ENGINE = INNODB;
Query OK, 0 rows affected (0.01 sec)
-- Insert stopwords (for simplicity, a single stopword is used in this example)
INSERT INTO my_stopwords(value)
VALUES ('Ishmael');
Query OK, 1 row affected (0.00 sec)
-- Create the table
CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)
-- Insert data into the table
INSERT INTO opening_lines(opening_line,author,title)
VALUES
('Call me Ishmael.','Herman Melville','Moby-Dick'),
('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
('I am an invisible man.','Ralph Ellison','Invisible Man'),
('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
('It was love at first sight.','Joseph Heller','Catch-22'),
('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0
-- Set the innodb_ft_server_stopword_table option to the new stopword tabl
SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';
Query OK, 0 rows affected (0.00 sec)
-- Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined)
CREATE FULLTEXT INDEX idx
ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (1.17
sec)
Records: 0
Duplicates: 0 Warnings: 1
- 要逐表创建停止词列表,请创建其他停止词表并使用 innodb_ft_user_stopword_table 选项指定要在创建全文索引之前使用的停止词表。
示例
SET GLOBAL innodb_ft_aux_table='test/opening_lines';
Query OK, 0 rows affected (0.00 sec)
SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
LIMIT 15;
运行结果如下:
+------------+
| word |
+------------+
| across |
| all |
| burn |
| buy |
| call |
| comes |
| dalloway |
| first |
| flowers |
| happened |
| herself |
| invisible |
| less |
| love |
| man |
+------------+
15 rows in set (0.00 sec)
注
- 通过查询信息架构INNODB_FT_INDEX_TABLE表,验证指定的停止词('Ismael')是否未出现
- 默认情况下,长度小于 3 个字符或长度大于 84 个字符的单词不会出现在 InnoDB 全文搜索索引中。可以使用 innodb_ft_max_token_size 和 innodb_ft_min_token_size 变量配置最大和最小字长值。ngram 标记大小由 ngram_token_size 选项定义。
MyISAM 搜索索引的停止词
- 如果 character_set_server 是 ucs2、utf16、utf16le 或 utf32,则使用 latin1 加载和搜索停止词文件。
- 要覆盖 MyISAM 表的默认停止词列表,请设置 ft_stopword_file 系统变量。变量值应该是包含停止词列表的文件的路径名,或者是空字符串以禁用停止词过滤。服务器在数据目录中查找文件,除非给出绝对路径名以指定不同的目录。更改此变量的值或停止词文件的内容后,重新启动服务器并重建 FULLTEXT 索引。
- 停止词列表是自由形式的,用任何非字母数字字符(例如换行符、空格或逗号)分隔停止词。下划线字符 (_) 和单引号 (') 除外,它们被视为单词的一部分。停止词列表的字符集是服务器默认的字符集;
- 以下列表显示了 MyISAM 搜索索引的默认停用词。在 MySQL 源代码分发中,您可以在 storage/myisam/ft_static.c 文件中找到此列表。
accordingly |
across |
actually |
after |
afterwards |
again |
against |
ain't |
all |
allow |
allows |
almost |
alone |
along |
already |
also |
although |
always |
am |
among |
amongst |
an |
and |
another |
any |
anybody |
anyhow |
anyone |
anything |
anyway |
anyways |
anywhere |
apart |
appear |
appreciate |
appropriate |
are |
aren't |
around |
as |
aside |
ask |
asking |
associated |
at |
available |
away |
awfully |
be |
became |
because |
become |
becomes |
becoming |
been |
before |
beforehand |
behind |
being |
believe |
below |
beside |
besides |
best |
better |
between |
beyond |
both |
brief |
but |
by |
c'mon |
c's |
came |
can |
can't |
cannot |
cant |
cause |
causes |
certain |
certainly |
changes |
clearly |
co |
com |
come |
comes |
concerning |
consequently |
consider |
considering |
contain |
containing |
contains |
corresponding |
could |
couldn't |
course |
currently |
definitely |
described |
despite |
did |
didn't |
different |
do |
does |
doesn't |
doing |
don't |
done |
down |
downwards |
during |
each |
edu |
eg |
eight |
either |
else |
elsewhere |
enough |
entirely |
especially |
et |
etc |
even |
ever |
every |
everybody |
everyone |
everything |
everywhere |
ex |
exactly |
example |
except |
far |
few |
fifth |
first |
five |
followed |
following |
follows |
for |
former |
formerly |
forth |
four |
from |
further |
furthermore |
get |
gets |
getting |
given |
gives |
go |
goes |
going |
gone |
got |
gotten |
greetings |
had |
hadn't |
happens |
hardly |
has |
hasn't |
have |
haven't |
having |
he |
he's |
hello |
help |
hence |
her |
here |
here's |
hereafter |
hereby |
herein |
hereupon |
hers |
herself |
hi |
him |
himself |
his |
hither |
hopefully |
how |
howbeit |
however |
i'd |
i'll |
i'm |
i've |
ie |
if |
ignored |
immediate |
in |
inasmuch |
inc |
indeed |
indicate |
indicated |
indicates |
inner |
insofar |
instead |
into |
inward |
is |
isn't |
it |
it'd |
it'll |
it's |
its |
itself |
just |
keep |
keeps |
kept |
know |
known |
knows |
last |
lately |
later |
latter |
latterly |
least |
less |
lest |
let |
let's |
like |
liked |
likely |
little |
look |
looking |
looks |
ltd |
mainly |
many |
may |
maybe |
me |
mean |
meanwhile |
merely |
might |
more |
moreover |
most |
mostly |
much |
must |
my |
myself |
name |
namely |
nd |
near |
nearly |
necessary |
need |
needs |
neither |
never |
nevertheless |
new |
next |
nine |
no |
nobody |
non |
none |
noone |
nor |
normally |
not |
nothing |
novel |
now |
nowhere |
obviously |
of |
off |
often |
oh |
ok |
okay |
old |
on |
once |
one |
ones |
only |
onto |
or |
other |
others |
otherwise |
ought |
our |
ours |
ourselves |
out |
outside |
over |
overall |
own |
particular |
particularly |
per |
perhaps |
placed |
please |
plus |
possible |
presumably |
probably |
provides |
que |
quite |
qv |
rather |
rd |
re |
really |
reasonably |
regarding |
regardless |
regards |
relatively |
respectively |
right |
said |
same |
saw |
say |
saying |
says |
second |
secondly |
see |
seeing |
seem |
seemed |
seeming |
seems |
seen |
self |
selves |
sensible |
sent |
serious |
seriously |
seven |
several |
shall |
she |
should |
shouldn't |
since |
six |
so |
some |
somebody |
somehow |
someone |
something |
sometime |
sometimes |
somewhat |
somewhere |
soon |
sorry |
specified |
specify |
specifying |
still |
sub |
such |
sup |
sure |
t's |
take |
taken |
tell |
tends |
th |
than |
thank |
thanks |
thanx |
that |
that's |
thats |
the |
their |
theirs |
them |
themselves |
then |
thence |
there |
there's |
thereafter |
thereby |
therefore |
therein |
theres |
thereupon |
these |
they |
they'd |
they'll |
they're |
they've |
think |
third |
this |
thorough |
thoroughly |
those |
though |
three |
through |
throughout |
thru |
thus |
to |
together |
too |
took |
toward |
towards |
tried |
tries |
truly |
try |
trying |
twice |
two |
un |
under |
unfortunately |
unless |
unlikely |
until |
unto |
up |
upon |
us |
use |
used |
useful |
uses |
using |
usually |
value |
various |
very |
via |
viz |
vs |
want |
wants |
was |
wasn't |
way |
we |
we'd |
we'll |
we're |
we've |
welcome |
well |
went |
were |
weren't |
what |
what's |
whatever |
when |
whence |
whenever |
where |
where's |
whereafter |
whereas |
whereby |
wherein |
whereupon |
wherever |
whether |
which |
while |
whither |
who |
who's |
whoever |
whole |
whom |
whose |
why |
will |
willing |
wish |
with |
within |
without |
won't |
wonder |
would |
wouldn't |
yes |
yet |
you |
you'd |
you'll |
you're |
you've |
your |
yours |
yourself |
yourselves |
zero |
|
|
来自 <https://dev.mysql.com/doc/refman/8.0/en/fulltext-stopwords.html>