没有太多时间进行全文翻译,就把重点挑出来,用自己的话串起来,名曰 选择性翻译。 以后可能会比较多的采用这种方式。
社会书签的tag存储一直是一个比较麻烦的问题。
一个好的数据表设计,不但要能准确查出tag,还应该支持tag的AND/OR/NOT查询。我们来看看解决方案。
“MySQLicious” solution
表结构
存储实例
Intersection (AND)
“search+webservice+semweb”类的查询:
SELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
AND tags LIKE "%webservice%"
AND tags LIKE "%semweb%"
Union (OR)
“search|webservice|semweb”类的查询:
SELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
OR tags LIKE "%webservice%"
OR tags LIKE "%semweb%"
Minus
“search+webservice-semweb”类的查询
SELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
AND tags LIKE "%webservice%"
AND tags NOT LIKE "%semweb%"
优点:
只有一个表 SQL比较直接
可以用mysql的全文检索来做,效率更高
缺点:
tag的数量受到限制,通常我们都用varchar,这种字段只256个字节长。否则,你需要用text类型,速度会变慢。(Easy注,PHP(PHP培训 php教程 )more的tag用的就是TinyText)
Like ‘%things%’不精确,当然某些应用中,这反而是需要的
“Scuttle” solution
数据表
Intersection (AND)
Query for “bookmark+webservice+semweb”:
SELECT b.*
FROM scBookmarks b, scCategories c
WHERE c.bId = b.bId
AND (c.category IN (’bookmark’, ‘webservice’, ’semweb’))
GROUP BY b.bId
HAVING COUNT( b.bId )=3
首先,所有书签-tag组合被搜出来 (c.category IN (‘bookmark‘, ‘webservice‘, ‘semweb‘)), ,然后选择其中包含三个的(HAVING COUNT(b.bId)=3)
Union (OR)
Query for “bookmark|webservice|semweb”:
只需要去掉?AND查询中的HAVING子句:
SELECT b.*
FROM scBookmarks b, scCategories c
WHERE c.bId = b.bId
AND (c.category IN (’bookmark’, ‘webservice’, ’semweb’))
GROUP BY b.bId
Minus (Exclusion)
Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.
SELECT b. *
FROM scBookmarks b, scCategories c
WHERE b.bId = c.bId
AND (c.category IN (’bookmark’, ‘webservice’))
AND b.bId NOT
IN (SELECT b.bId FROM scBookmarks b, scCategories c WHERE b.bId = c.bId AND c.category = ’semweb’)
GROUP BY b.bId
HAVING COUNT( b.bId ) =2
好处: 我觉得这个方案比前一个方案好的最大理由是,可以有无限个tag。
“Toxi” solution
数据表
Intersection (AND)
Query for “bookmark+webservice+semweb”
SELECT b.*
FROM tagmap bt, bookmark b, tag t
WHERE bt.tag_id = t.tag_id
AND (t.name IN (’bookmark’, ‘webservice’, ’semweb’))
AND b.id = bt.bookmark_id
GROUP BY b.id
HAVING COUNT( b.id )=3
Union (OR)
Query for “bookmark|webservice|semweb”
SELECT b.*
FROM tagmap bt, bookmark b, tag t
WHERE bt.tag_id = t.tag_id
AND (t.name IN (’bookmark’, ‘webservice’, ’semweb’))
AND b.id = bt.bookmark_id
GROUP BY b.id
Minus (Exclusion)
Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.
SELECT b. *
FROM bookmark b, tagmap bt, tag t
WHERE b.id = bt.bookmark_id
AND bt.tag_id = t.tag_id
AND (t.name IN (’Programming’, ‘Algorithms’))
AND b.id NOT IN (SELECT b.id FROM bookmark b, tagmap bt, tag t WHERE b.id = bt.bookmark_id AND bt.tag_id = t.tag_id AND t.name = ‘Python’)
GROUP BY b.id
HAVING COUNT( b.id ) =2
Leaving out theHAVING COUNTleads to the Query for “bookmark|webservice-semweb”.
好处:
你可以给每个tag添加额外的信息
这是最规范的方案,第三范式。
坏处:
删除tag时,你要从多个表中删除(Easy注,Mysql5的话,可以用trigger来做)
然后我们把视线从功能转移到性能上。
A+B
250个tag
999个tag
A OR B
250个tag
添加速度比较
测试代码下载?Download the source code (PHP) LGPL协议。