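As the name suggests, with most_fields the more fields that match the query terms, the higher the score. Individual fields can also be weighted with boost.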
Below is a simplified formula (for the detailed scoring algorithm, see: http://m.blog.csdn.net/article/details?id=50623948):
score = match_field1_score*boost1 + match_field2_score*boost2 + ... + match_fieldN_score*boostN
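A minimal Python sketch of the simplified formula above, assuming the per-field match scores are already known. The field names and numbers are made up for illustration, and the sketch leaves out the coord factor that the classic scoring also applies (visible as coord(3/6) in the explain output further down).

```python
def most_fields_score(field_scores, boosts=None):
    """Simplified most_fields score: sum of each matching field's score times its boost.

    field_scores: field name -> match score of that field (non-matching fields omitted).
    boosts: optional field name -> boost; fields without an entry use boost 1.0.
    """
    boosts = boosts or {}
    return sum(score * boosts.get(field, 1.0) for field, score in field_scores.items())


# Hypothetical per-field scores for one document.
scores = {"title": 0.5, "address": 0.25, "businessDistrict": 0.5}
print(most_fields_score(scores))                  # 1.25
print(most_fields_score(scores, {"title": 2.0}))  # 1.75: boosting title doubles its share
```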
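In many cases this kind of search works well, but it has a weakness: when documents carry a lot of redundant information across fields, they push down documents that are more concise but cover the meaning of the query more completely.
It also cannot use operator and minimum_should_match to cut down the long tail of low-relevance docs, because with most_fields those parameters are applied to each field's query separately. Simply put, the winner is whoever racks up the most term matches.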
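To make "the most term matches wins" concrete, here is a toy Python model. It assumes every matched (field, term) pair contributes the same fixed score of 1.0, which real TF/IDF scoring does not do. Hypothetical doc A mimics the "南京东路" listings (only "东路" matches, but in three fields); hypothetical doc B mimics the "北京东路" listing (both terms match, but only in the address field).

```python
QUERY_TERMS = {"北京", "东路"}

# Hypothetical documents: field name -> set of query terms found in that field.
docs = {
    "A (南京东路 listing)": {"title": {"东路"}, "address": {"东路"}, "businessDistrict": {"东路"}},
    "B (北京东路 listing)": {"address": {"北京", "东路"}},
}

def toy_most_fields(fields, per_term_score=1.0):
    # Sum over fields; each matched query term in a field adds a fixed amount.
    return sum(per_term_score * len(terms & QUERY_TERMS) for terms in fields.values())

def distinct_terms(fields):
    # How many different query terms the document matches across all fields.
    return len(set().union(*fields.values()) & QUERY_TERMS)

ranking = sorted(docs, key=lambda d: toy_most_fields(docs[d]), reverse=True)
for name in ranking:
    print(name, toy_most_fields(docs[name]), distinct_terms(docs[name]))
```

Doc A comes out on top (3.0 vs 2.0) even though it matches only one of the two query terms, while doc B matches both.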
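For example:
Search for the keyword "北京东路". The analysis result below shows that its terms are "北京" and "东路". (The extra "text" token comes from the request itself: judging by its offsets 2 to 6, the whole body {"text":"北京东路"} was analyzed as raw text, so the JSON key was tokenized as well.)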
curl 'localhost:9200/fullbiz_index/_analyze?analyzer=ik_smart&pretty=true' -d '{"text":"北京东路"}'
{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 2,
    "end_offset" : 6,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "北京",
    "start_offset" : 9,
    "end_offset" : 11,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "东路",
    "start_offset" : 11,
    "end_offset" : 13,
    "type" : "CN_WORD",
    "position" : 3
  } ]
}
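The token offsets in the _analyze output can be checked against the raw request body. This quick Python check assumes the body was analyzed as plain text, which is what the offsets suggest. Python string indices count code points, which coincide with Elasticsearch's UTF-16 offsets here because every character is in the Basic Multilingual Plane.

```python
body = '{"text":"北京东路"}'

# (token, start_offset, end_offset) as reported by the _analyze call.
reported = [("text", 2, 6), ("北京", 9, 11), ("东路", 11, 13)]

for token, start, end in reported:
    # Slicing the raw body at the reported offsets yields the token itself,
    # so the analyzer saw the whole JSON string, key included, as text.
    assert body[start:end] == token, (token, body[start:end])
print("offsets match the raw request body")
```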
curl 'localhost:9200/fullbiz1/fullbizinfo/_search?pretty' -d '
{
  "from" : 0,
  "size" : 20,
  "query" : {
    "multi_match" : {
      "query" : "北京东路",
      "fields" : [ "title", "highlight", "tags", "address", "businessDistrict", "cuisineStyle" ],
      "type" : "most_fields",
      "minimum_should_match" : "70%",
      "analyzer" : "ik_smart"
    }
  },
  "post_filter" : {
    "bool" : {
      "must" : [
        { "term" : { "status" : 0 } },
        { "term" : { "hostDisplay" : 1 } },
        { "term" : { "cityId" : 2 } },
        { "term" : { "productType" : 3 } }
      ]
    }
  }
}'
minimum_should_match is the minimum proportion of query terms that must match. The count computed from the percentage is rounded down: with three terms, 70% of 3 is 2.1, so two must match; with two terms or fewer, one match is enough. So "北京东路" scores as long as either "北京" or "东路" matches.
analyzer: ik has two modes, ik_max_word (finest-grained segmentation) and ik_smart (coarsest-grained segmentation). We use ik_smart here because its output is closer to what the business expects.
Results, abridged to the six searched fields of each _source:
"hits" : [ {
  "_index" : "fullbiz1", "_type" : "fullbizinfo", "_id" : "324239", "_score" : 0.33371,
  "_source" : { ..., "title" : "城市公牛(南京东路店)", "tags" : null, "address" : "南京东路300号L221-222室(河南中路口)", "businessDistrict" : "南京东路", "cuisineStyle" : "西餐", "highlight" : "[\"2010年世博会加拿大馆特约餐厅\",\"加拿大简约西部乡村风格小酒馆餐厅\",\"家庭式的用餐氛围 80%均是外国食客\"]", ... }
}, {
  "_index" : "fullbiz1", "_type" : "fullbizinfo", "_id" : "392659", "_score" : 0.31962717,
  "_source" : { ..., "title" : "THAIBEAUTY美容连锁机构(南京东路店)", "tags" : "", "address" : "南京东路580号6楼", "businessDistrict" : "南京东路", "cuisineStyle" : "美容/SPA", "highlight" : "[\"高端局部瘦身\",\"环境舒适 按摩师手法专业\",\"使用高品质产品\"]", ... }
}, {
  "_index" : "fullbiz1", "_type" : "fullbizinfo", "_id" : "364804", "_score" : 0.31002828,
  "_source" : { ..., "title" : "斗牛士(南京东路店)", "tags" : "", "address" : "南京东路353号悦荟广场(原353店)7F", "businessDistrict" : "南京东路", "cuisineStyle" : "西餐", "highlight" : "[\"精选进口澳洲安格斯牛排\",\"严控0度低温 保证牛肉鲜嫩\",\"进口原切牛排保证牛肉口感与外观\"]", ... }
}, .....
{
  "_index" : "fullbiz1", "_type" : "fullbizinfo", "_id" : "353771", "_score" : 0.7784657,
  "_source" : { ..., "title" : "九储堂创意中国菜(外滩店)", "tags" : "", "address" : "北京东路398号新协通国际大酒店18楼", "businessDistrict" : "外滩", "cuisineStyle" : "创意菜", "highlight" : "[\"新加坡同乐餐饮总厨胡于保先生主理\",\"大厅可容纳150人的宴会 包房5间\",\"靠窗座位亦可欣赏浦江两岸美景\"]", ... }
}
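The minimum_should_match arithmetic for the request above can be sketched in Python. This models only a positive "N%" value and the documented rule that the count computed from the percentage is rounded down; negative percentages and conditional forms such as "3<90%" are not modelled. In a most_fields query this threshold is applied to each field's match query on its own.

```python
def required_matches(num_terms: int, percent: int) -> int:
    """Terms that must match for a positive minimum_should_match of "<percent>%".

    Sketch of the rounding rule only: the value computed from the percentage
    is rounded down.
    """
    return num_terms * percent // 100  # integer floor, avoids float rounding surprises


for n in (2, 3, 4):
    print(f"{n} terms, 70% -> {required_matches(n, 70)} must match")
```

So with three terms the threshold is 2 because 2.1 is rounded down, not because 66.6% is rounded up to 70%.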
Yet the document that contains the complete "北京东路" ranks lower, which makes no sense. To see why, let's look at the score calculation with explain:
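curl 'localhost:9200/fullbiz1/fullbizinfo/_search?pretty&explain' .... (the rest is omitted: it is the same request as above, with explain added and size limited to the first hit, because the output is long and we only want to analyze one document). Going straight to the scoring part: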
"_explanation" : {
  "value" : 0.33371,
  "description" : "product of:",
  "details" : [ {
    "value" : 0.66742,
    "description" : "sum of:",
    "details" : [ {
      "value" : 0.28481156,
      "description" : "product of:",
      "details" : [ {
        "value" : 0.5696231,
        "description" : "sum of:",
        "details" : [ {
          "value" : 0.5696231,
          "description" : "weight(title:东路 in 7321) [PerFieldSimilarity], result of:",
          "details" : [ {
            "value" : 0.5696231,
            "description" : "score(doc=7321,freq=1.0), product of:",
            "details" : [ {
              "value" : 0.25448462,
              "description" : "queryWeight, product of:",
              "details" : [
                { "value" : 7.1626873, "description" : "idf(docFreq=244, maxDocs=116302)" },
                { "value" : 0.03552921, "description" : "queryNorm" }
              ]
            }, {
              "value" : 2.23834,
              "description" : "fieldWeight in 7321, product of:",
              "details" : [
                { "value" : 1.0, "description" : "tf(freq=1.0), with freq of:",
                  "details" : [ { "value" : 1.0, "description" : "termFreq=1.0" } ] },
                { "value" : 7.1626873, "description" : "idf(docFreq=244, maxDocs=116302)" },
                { "value" : 0.3125, "description" : "fieldNorm(doc=7321)" }
              ]
            } ]
          } ]
        } ]
      }, { "value" : 0.5, "description" : "coord(1/2)" } ]
    }, {
      "value" : 0.067192085,
      "description" : "product of:",
      "details" : [ {
        "value" : 0.13438417,
        "description" : "sum of:",
        "details" : [ {
          "value" : 0.13438417,
          "description" : "weight(address:东路 in 7321) [PerFieldSimilarity], result of:",
          "details" : [ {
            "value" : 0.13438417,
            "description" : "score(doc=7321,freq=1.0), product of:",
            "details" : [ {
              "value" : 0.1477382,
              "description" : "queryWeight, product of:",
              "details" : [
                { "value" : 4.158218, "description" : "idf(docFreq=4942, maxDocs=116302)" },
                { "value" : 0.03552921, "description" : "queryNorm" }
              ]
            }, {
              "value" : 0.90961015,
              "description" : "fieldWeight in 7321, product of:",
              "details" : [
                { "value" : 1.0, "description" : "tf(freq=1.0), with freq of:",
                  "details" : [ { "value" : 1.0, "description" : "termFreq=1.0" } ] },
                { "value" : 4.158218, "description" : "idf(docFreq=4942, maxDocs=116302)" },
                { "value" : 0.21875, "description" : "fieldNorm(doc=7321)" }
              ]
            } ]
          } ]
        } ]
      }, { "value" : 0.5, "description" : "coord(1/2)" } ]
    }, {
      "value" : 0.3154164,
      "description" : "product of:",
      "details" : [ {
        "value" : 0.6308328,
        "description" : "sum of:",
        "details" : [ {
          "value" : 0.6308328,
          "description" : "weight(businessDistrict:东路 in 7321) [PerFieldSimilarity], result of:",
          "details" : [ {
            "value" : 0.6308328,
            "description" : "score(doc=7321,freq=1.0), product of:",
            "details" : [ {
              "value" : 0.22633977,
              "description" : "queryWeight, product of:",
              "details" : [
                { "value" : 6.3705263, "description" : "idf(docFreq=540, maxDocs=116302)" },
                { "value" : 0.03552921, "description" : "queryNorm" }
              ]
            }, {
              "value" : 2.7871053,
              "description" : "fieldWeight in 7321, product of:",
              "details" : [
                { "value" : 1.0, "description" : "tf(freq=1.0), with freq of:",
                  "details" : [ { "value" : 1.0, "description" : "termFreq=1.0" } ] },
                { "value" : 6.3705263, "description" : "idf(docFreq=540, maxDocs=116302)" },
                { "value" : 0.4375, "description" : "fieldNorm(doc=7321)" }
              ]
            } ]
          } ]
        } ]
      }, { "value" : 0.5, "description" : "coord(1/2)" } ]
    } ]
  }, { "value" : 0.5, "description" : "coord(3/6)" } ]
}
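The explain tree can be reproduced by hand. The Python sketch below recomputes it with Lucene's classic TF/IDF formulas, the ones behind the queryWeight/fieldWeight labels: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, term score = queryWeight * fieldWeight. queryNorm and fieldNorm are copied from the explain output rather than derived. The constants QUERY_TERMS_PER_FIELD and FIELDS_IN_QUERY are assumptions read off the coord(1/2) and coord(3/6) entries and the six fields in the request.

```python
import math

MAX_DOCS = 116302
QUERY_NORM = 0.03552921

# (field, docFreq of "东路", fieldNorm) copied from the explain output for doc 7321.
matches = [
    ("title", 244, 0.3125),
    ("address", 4942, 0.21875),
    ("businessDistrict", 540, 0.4375),
]

QUERY_TERMS_PER_FIELD = 2  # "北京" and "东路"
FIELDS_IN_QUERY = 6        # title, highlight, tags, address, businessDistrict, cuisineStyle


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Lucene classic (TF/IDF) similarity: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(doc_freq, field_norm, freq=1.0):
    idf = classic_idf(doc_freq)
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


per_field = {}
for field, doc_freq, field_norm in matches:
    # Only "东路" matched in each field, so each per-field query gets coord(1/2).
    per_field[field] = term_score(doc_freq, field_norm) * (1 / QUERY_TERMS_PER_FIELD)

# Three of the six per-field queries matched: coord(3/6).
total = sum(per_field.values()) * (len(matches) / FIELDS_IN_QUERY)

for field, score in per_field.items():
    print(f"{field:17s} {score:.6f}")
print(f"{'total':17s} {total:.6f}")
```

The recomputed values should agree with the explain output to about five significant digits; the small differences come from the rounded idf values that explain prints.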
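The explain output shows that the documents containing "南京东路" rank on top not because they match the query better, but because they match in more fields: only "东路" matches, yet it matches in title, address and businessDistrict (three of the six fields, hence coord(3/6)). That is why they outscore the document further down, which contains "北京东路" in a single field.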
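Summary: most_fields suits searches where the fields carry clearly different information. It copes poorly with cases like the one above, where "东路" appears in the title and again in the business district and address, i.e. where there is a lot of redundant information across fields.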
Time: 2024-10-28 14:43:17