Meet Lucid Fusion
https://docs.lucidworks.com/display/fusion/Getting+Started
http://zh.hortonworks.com/partner/lucidworks/
https://lucidworks.com/blog/noob-notes-fusion-first-look/
~/soft/lucid-fusion/bin$ ./fusion start
2015-01-07 07:16:46Z Starting Fusion Solr on port 8983
2015-01-07 07:17:16Z Starting Fusion API Services on port 8765
2015-01-07 07:17:21Z Starting Fusion UI on port 8764
2015-01-07 07:17:26Z Starting Fusion Connectors on port 8984
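Once the four services have reported their ports, a quick reachability check can confirm that they are actually listening. The following is a minimal sketch and not part of Fusion: the function names fusion_services and check_fusion_services are made up for illustration, the ports are taken from the log above, and curl is assumed to be installed.

```shell
# Service labels and ports as printed by `./fusion start` in the log above.
fusion_services() {
  printf '%s\n' "Solr 8983" "API 8765" "UI 8764" "Connectors 8984"
}

# Print "up" or "down" per service (hypothetical helper, not a Fusion command).
# Any HTTP response counts as "up"; a refused connection or a timeout is "down".
check_fusion_services() {
  host="${1:-localhost}"
  fusion_services | while read -r name port; do
    if curl -s -o /dev/null --max-time 2 "http://${host}:${port}/" </dev/null; then
      echo "${name} (${port}): up"
    else
      echo "${name} (${port}): down"
    fi
  done
}
```

Calling check_fusion_services after startup prints one line per service. The timestamps in the log suggest that the services come up a few seconds apart, so a "down" right after starting may only mean it is too early.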
Open http://localhost:8764 and log in with username admin and password password123.
Click Admin and create a Collection.
Click Actions > DataSource and choose Web from the dropdown. The form will prompt you to enter a name for this datasource, perhaps "Lucidworks" or similar. The index pipeline id has already been supplied, and you can leave that default for now. Under Properties, then Start Links, click the add item button and enter 'http://lucidworks.com'. Click Add datasource to save your changes.
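Click Start and wait for the status to change to Finished.
Click Search in the left sidebar and check that the number of indexed documents is 1188, the same as the documents count shown above.
The Collections home page gives an overview of every collection, including document count, index size, and top queries.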
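Go back to http://localhost:8764/search, select the collection, and query all documents.
Open one of the links in the results, pick any word on that page, and search for it in Fusion. The matches are highlighted in the Content field of the results.
To confirm that the indexed page really contains the keyword, open the page behind the red link:
https://docs.lucidworks.com/display/fusion/Users+and+Roles
This page does contain the keyword we searched for, "rest".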
https://docs.lucidworks.com/display/fusion/Crawling+Websites
Index Document(PDF+JSON)
curl -u admin:password123 -X POST -H "Content-Type: application/pdf" \
  --data-binary @/home/hadoop/Documents/ML/deeplearning.pdf \
  http://localhost:8764/api/apollo/index-pipelines/conn_solr/collections/docs/index
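The lines above form a single command. -u passes the username and password of the Fusion login, and -H sets a request header, here the Content-Type.
--data-binary gives the path of the file to upload; note the leading @. The URL has a fixed form, and the docs segment near the end is the collection name. That collection must already exist.
Running the command adds the full content of the PDF to the Lucene index.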
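The URL structure described above can be captured in a small wrapper, so that the collection and pipeline are not retyped for each upload. This is a sketch under assumptions: fusion_index_url and fusion_index_file are hypothetical helper names, FUSION_BASE, FUSION_USER and FUSION_PASS are made-up environment variables that default to the values used in this walkthrough, and conn_solr is simply the pipeline id from the command above.

```shell
# Base URL of the Fusion instance; defaults to the address used above.
FUSION_BASE="${FUSION_BASE:-http://localhost:8764}"

# Build the index endpoint. $1 = collection, $2 = index pipeline id (optional).
fusion_index_url() {
  printf '%s/api/apollo/index-pipelines/%s/collections/%s/index\n' \
    "$FUSION_BASE" "${2:-conn_solr}" "$1"
}

# POST one file to a collection. $1 = collection, $2 = file path, $3 = MIME type.
fusion_index_file() {
  # Refuse early if the path is wrong, before any request is made.
  if [ ! -f "$2" ]; then
    echo "no such file: $2" >&2
    return 1
  fi
  curl -u "${FUSION_USER:-admin}:${FUSION_PASS:-password123}" -X POST \
    -H "Content-Type: $3" --data-binary "@$2" \
    "$(fusion_index_url "$1")"
}
```

With these definitions, fusion_index_file docs /home/hadoop/Documents/ML/deeplearning.pdf application/pdf issues the same request as the command above. The file check comes first because a mistyped path after @ is easy to miss.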
http://localhost:8764/search/docs?profile=default&q=*&sf=score&sd=desc
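To verify, search for a word that appears in the document. For example, open the PDF and pick the word "classifying" (CLASSIFYING).
Then check that a search in Fusion returns this document.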
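To repeat this check for different words, the search page URL shown above can be assembled from its parts. A minimal sketch, with the assumptions flagged: fusion_search_url is a hypothetical helper, FUSION_BASE is a made-up variable defaulting to the address used here, and the values profile=default, sf=score and sd=desc are copied unchanged from the URL above without claiming what each one means. The query is not URL-encoded, so this only suits single plain words.

```shell
# Build the search page URL. $1 = collection, $2 = query (default: * for all docs).
# The query is inserted verbatim, so use plain words without spaces or symbols.
fusion_search_url() {
  printf '%s/search/%s?profile=default&q=%s&sf=score&sd=desc\n' \
    "${FUSION_BASE:-http://localhost:8764}" "$1" "${2:-*}"
}
```

For example, fusion_search_url docs classifying prints http://localhost:8764/search/docs?profile=default&q=classifying&sf=score&sd=desc, which can be pasted into the browser.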
curl -u admin:password123 -X POST -H "Content-Type: application/vnd.lucidworks-document" \
  -d '[{"id":"myDoc1", "fields":[{"name":"title", "value":"My first document"}, {"name":"body", "value":"This is a simple document."}]}, {"id":"myDoc2", "fields":[{"name":"title", "value":"My second document"}, {"name":"body", "value":"This is another simple document."}]}]' \
  http://localhost:8764/api/apollo/index-pipelines/conn_solr/collections/docs/index
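Typing the JSON inline becomes error-prone once there are more than a couple of documents. One option is to generate each entry with a small function and send the result from a file. This is only a sketch: fusion_doc is a hypothetical helper that mirrors the id/fields layout of the command above, it handles just the title and body fields, and it does no JSON escaping.

```shell
# Emit one document in the layout used above. $1 = id, $2 = title, $3 = body.
# Values are inserted verbatim: they must not contain double quotes or backslashes.
fusion_doc() {
  printf '{"id":"%s", "fields":[{"name":"title", "value":"%s"}, {"name":"body", "value":"%s"}]}' \
    "$1" "$2" "$3"
}

# Assemble a two-document payload and write it to a temporary file.
payload_file="$(mktemp)"
printf '[%s, %s]\n' \
  "$(fusion_doc myDoc1 'My first document' 'This is a simple document.')" \
  "$(fusion_doc myDoc2 'My second document' 'This is another simple document.')" \
  > "$payload_file"

# To send it, reuse the request above with the file as the body:
# curl -u admin:password123 -X POST \
#   -H "Content-Type: application/vnd.lucidworks-document" \
#   --data-binary "@$payload_file" \
#   http://localhost:8764/api/apollo/index-pipelines/conn_solr/collections/docs/index
```

Keeping the payload in a file also sidesteps shell quoting problems, since the JSON never has to sit inside a quoted command-line argument.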