Configuring Full-Text Search

Full-text search is different from a fuzzy query against a specific field: it is much more efficient, and it can tokenize Chinese text properly.

  • haystack: a full-text search framework for Django that supports four back ends: Whoosh, Solr, Xapian and Elasticsearch
  • whoosh: a full-text search engine written in pure Python. Its performance does not match Sphinx, Xapian or Elasticsearch, but it needs no binary packages and does not crash mysteriously; for a small site, Whoosh is more than enough.
  • jieba: a Chinese word-segmentation package

Install the packages:

pip install django-haystack
pip install whoosh
pip install jieba

Modify the settings file:

INSTALLED_APPS = [
    'haystack',  # full-text search framework
]

# Full-text search framework configuration (make sure os is imported at the top of settings.py)
HAYSTACK_CONNECTIONS = {
    'default': {
        # Use the Whoosh search engine
        'ENGINE': 'goods.whoosh_cn_backend.WhooshEngine',
        # Path of the index files
        'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
    },
}

# Automatically regenerate the index when data is added, modified or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

# Number of results per page, 20 by default; change it as you like
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 8

Inside the app that needs to be searched, create a file named search_indexes.py:

# Define the index class
from haystack import indexes
from .models import Goods


# Index class name format: <model class name> + Index
class GoodsIndex(indexes.SearchIndex, indexes.Indexable):
    # Index field: use_template means the fields used to build the index
    # are listed in a separate template file
    text = indexes.CharField(document=True, use_template=True)

    # Search field: model_attr points at the model attribute;
    # add more fields here if you need to search several columns
    goods_name = indexes.NgramField(model_attr="goods_name")

    def get_model(self):
        return Goods  # the model class this index is for

    def index_queryset(self, using=None):
        return self.get_model().objects.all()
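With the index defined, you can also query it from your own view through haystack's SearchQuerySet. The sketch below is only an illustration: the view name and template path are made up, and it assumes the Goods model and the GoodsIndex above.

# A minimal sketch of querying the index manually (hypothetical view and template names).
from django.shortcuts import render
from haystack.query import SearchQuerySet

from .models import Goods


def goods_search(request):
    keyword = request.GET.get('q', '')
    # Search the document field declared with document=True ...
    results = SearchQuerySet().models(Goods).filter(content=keyword)
    # ... or match the NgramField directly for partial-word hits:
    # results = SearchQuerySet().models(Goods).autocomplete(goods_name=keyword)
    goods_list = [result.object for result in results]  # result.object is the Goods instance
    return render(request, 'goods/search_list.html', {'goods_list': goods_list})

In most cases the ready-made view wired up by haystack.urls below is all you need; this is only a starting point for a custom search page.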

Under templates, create the folders search/indexes/goods, and inside them create goods_text.txt (goods is the lowercase name of the model class to be indexed).

Note: these names are fixed and must not be changed.

List the fields to be indexed in goods_text.txt:

# Specify which table fields the index data is built from
{{ object.goods_name }}  # index on the goods name

Go to the project directory and build the index files: python manage.py rebuild_index

The search form in the HTML page has a fixed structure:

<form action="/search/" method="get">  <!-- the form method must be GET -->
    {% csrf_token %}
    <input type="text" placeholder="Search brands, shops..." name="q">  <!-- the input name must be q -->
    <input type="submit" value="Search">
</form>

Configure the project-level urls:

from django.urls import path, include

urlpatterns = [
    path('search/', include('haystack.urls')),  # full-text search framework
]

Haystack passes the search results to search.html in the templates/search directory, so create a search.html file in the search folder under templates. The context it passes includes:

  • query: the search keyword
  • page: the Page object for the current page; each item in page.object_list is a SearchResult instance, and its object attribute holds the underlying model object
  • paginator: the Paginator object handling pagination
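A minimal search.html built only from that context might look like the sketch below; the markup is illustrative and the goods_name field comes from the index defined earlier.

<!-- templates/search/search.html : a minimal sketch -->
<p>Results for: {{ query }}</p>
<ul>
    {% for result in page.object_list %}
        <li>{{ result.object.goods_name }}</li>  <!-- result.object is the Goods instance -->
    {% empty %}
        <li>No results found.</li>
    {% endfor %}
</ul>
{% if page.has_previous %}
    <a href="?q={{ query }}&amp;page={{ page.previous_page_number }}">Previous</a>
{% endif %}
Page {{ page.number }} of {{ paginator.num_pages }}
{% if page.has_next %}
    <a href="?q={{ query }}&amp;page={{ page.next_page_number }}">Next</a>
{% endif %}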


Configure the Chinese tokenizer. The module used here is jieba. Name the file tokenizer.py and put it in the same directory as search_indexes.py; here it goes in the goods folder.

from jieba import cut_for_search
from whoosh.analysis import Tokenizer, Token


class ChineseTokenizer(Tokenizer):
    def __call__(self, value, positions=False, chars=False,
                 keeporiginal=False, removestops=True,
                 start_pos=0, start_char=0, mode='', **kwargs):
        t = Token(positions, chars, removestops=removestops, mode=mode,
                  **kwargs)
        # seglist = cut(value, cut_all=False)  # segment with jieba in precise mode
        seglist = cut_for_search(value)  # segment with jieba in search-engine mode
        for w in seglist:
            t.original = t.text = w
            t.boost = 1.0
            if positions:
                t.pos = start_pos + value.find(w)
            if chars:
                t.startchar = start_char + value.find(w)
                t.endchar = start_char + value.find(w) + len(w)
            yield t  # yield each segmented token from the generator


def ChineseAnalyzer():
    return ChineseTokenizer()
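To sanity-check the tokenizer before wiring it into Whoosh, you can run the analyzer over a sample sentence by itself. This assumes the file layout above, i.e. that goods.tokenizer is importable; the sample text is arbitrary.

# Quick manual check of the analyzer defined above.
from goods.tokenizer import ChineseAnalyzer

analyzer = ChineseAnalyzer()
print([token.text for token in analyzer("全文检索框架的配置")])
# Expected: a list of jieba search-mode segments of the sentence.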

Configure the Chinese-aware search backend. Name the file whoosh_cn_backend.py and put it in the same directory as search_indexes.py (here, the goods folder). The file is essentially haystack's bundled whoosh_backend.py; its functional change is in build_schema(), where the default TEXT field is built with the ChineseAnalyzer imported from goods.tokenizer instead of the stock analyzer. Change the app name on line 172 (that import) to match your own app.

# encoding: utf-8?from __future__ import absolute_import, division, print_function, unicode_literals?import jsonimport osimport reimport shutilimport threadingimport warnings?from django.conf import settingsfrom django.core.exceptions import ImproperlyConfiguredfrom django.utils import sixfrom django.utils.datetime_safe import datetimefrom django.utils.encoding import force_text?from haystack.backends import BaseEngine, BaseSearchBackend, BaseSearchQuery, EmptyResults, log_queryfrom haystack.constants import DJANGO_CT, DJANGO_ID, IDfrom haystack.exceptions import MissingDependency, SearchBackendError, SkipDocumentfrom haystack.inputs import Clean, Exact, PythonData, Rawfrom haystack.models import SearchResultfrom haystack.utils import log as loggingfrom haystack.utils import get_identifier, get_model_ctfrom haystack.utils.app_loading import haystack_get_model?try:    import whooshexcept ImportError:    raise MissingDependency(        "The ‘whoosh‘ backend requires the installation of ‘Whoosh‘. Please refer to the documentation.")?# Handle minimum requirement.if not hasattr(whoosh, ‘__version__‘) or whoosh.__version__ < (2, 5, 0):    raise MissingDependency("The ‘whoosh‘ backend requires version 2.5.0 or greater.")?# Bubble up the correct error.from whoosh import indexfrom whoosh.analysis import StemmingAnalyzerfrom whoosh.fields import ID as WHOOSH_IDfrom whoosh.fields import BOOLEAN, DATETIME, IDLIST, KEYWORD, NGRAM, NGRAMWORDS, NUMERIC, Schema, TEXTfrom whoosh.filedb.filestore import FileStorage, RamStoragefrom whoosh.highlight import highlight as whoosh_highlightfrom whoosh.highlight import ContextFragmenter, HtmlFormatterfrom whoosh.qparser import QueryParserfrom whoosh.searching import ResultsPagefrom whoosh.writing import AsyncWriter?DATETIME_REGEX = re.compile(    ‘^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})(\.\d{3,6}Z?)?$‘)LOCALS = threading.local()LOCALS.RAM_STORE = None??class WhooshHtmlFormatter(HtmlFormatter):    """    This is a HtmlFormatter simpler than the whoosh.HtmlFormatter.    We use it to have consistent results across backends. Specifically,    Solr, Xapian and Elasticsearch are using this formatting.    """    template = ‘<%(tag)s>%(t)s</%(tag)s>‘??class WhooshSearchBackend(BaseSearchBackend):    # Word reserved by Whoosh for special use.    RESERVED_WORDS = (        ‘AND‘,        ‘NOT‘,        ‘OR‘,        ‘TO‘,    )?    # Characters reserved by Whoosh for special use.    # The ‘\\‘ must come first, so as not to overwrite the other slash replacements.    RESERVED_CHARACTERS = (        ‘\\‘, ‘+‘, ‘-‘, ‘&&‘, ‘||‘, ‘!‘, ‘(‘, ‘)‘, ‘{‘, ‘}‘,        ‘[‘, ‘]‘, ‘^‘, ‘"‘, ‘~‘, ‘*‘, ‘?‘, ‘:‘, ‘.‘,    )?    def __init__(self, connection_alias, **connection_options):        super(WhooshSearchBackend, self).__init__(connection_alias, **connection_options)        self.setup_complete = False        self.use_file_storage = True        self.post_limit = getattr(connection_options, ‘POST_LIMIT‘, 128 * 1024 * 1024)        self.path = connection_options.get(‘PATH‘)?        if connection_options.get(‘STORAGE‘, ‘file‘) != ‘file‘:            self.use_file_storage = False?        if self.use_file_storage and not self.path:            raise ImproperlyConfigured(                "You must specify a ‘PATH‘ in your settings for connection ‘%s‘." % connection_alias)?        self.log = logging.getLogger(‘haystack‘)?    def setup(self):        """        Defers loading until needed.        
"""        from haystack import connections        new_index = False?        # Make sure the index is there.        if self.use_file_storage and not os.path.exists(self.path):            os.makedirs(self.path)            new_index = True?        if self.use_file_storage and not os.access(self.path, os.W_OK):            raise IOError("The path to your Whoosh index ‘%s‘ is not writable for the current user/group." % self.path)?        if self.use_file_storage:            self.storage = FileStorage(self.path)        else:            global LOCALS?            if getattr(LOCALS, ‘RAM_STORE‘, None) is None:                LOCALS.RAM_STORE = RamStorage()?            self.storage = LOCALS.RAM_STORE?        self.content_field_name, self.schema = self.build_schema(            connections[self.connection_alias].get_unified_index().all_searchfields())        self.parser = QueryParser(self.content_field_name, schema=self.schema)?        if new_index is True:            self.index = self.storage.create_index(self.schema)        else:            try:                self.index = self.storage.open_index(schema=self.schema)            except index.EmptyIndexError:                self.index = self.storage.create_index(self.schema)?        self.setup_complete = True?    def build_schema(self, fields):        schema_fields = {            ID: WHOOSH_ID(stored=True, unique=True),            DJANGO_CT: WHOOSH_ID(stored=True),            DJANGO_ID: WHOOSH_ID(stored=True),        }        # Grab the number of keys that are hard-coded into Haystack.        # We‘ll use this to (possibly) fail slightly more gracefully later.        initial_key_count = len(schema_fields)        content_field_name = ‘‘?        for field_name, field_class in fields.items():            if field_class.is_multivalued:                if field_class.indexed is False:                    schema_fields[field_class.index_fieldname] = IDLIST(stored=True, field_boost=field_class.boost)                else:                    schema_fields[field_class.index_fieldname] = KEYWORD(stored=True, commas=True, scorable=True,                                                                         field_boost=field_class.boost)            elif field_class.field_type in [‘date‘, ‘datetime‘]:                schema_fields[field_class.index_fieldname] = DATETIME(stored=field_class.stored, sortable=True)            elif field_class.field_type == ‘integer‘:                schema_fields[field_class.index_fieldname] = NUMERIC(stored=field_class.stored, numtype=int,                                                                     field_boost=field_class.boost)            elif field_class.field_type == ‘float‘:                schema_fields[field_class.index_fieldname] = NUMERIC(stored=field_class.stored, numtype=float,                                                                     field_boost=field_class.boost)            elif field_class.field_type == ‘boolean‘:                # Field boost isn‘t supported on BOOLEAN as of 1.8.2.                
schema_fields[field_class.index_fieldname] = BOOLEAN(stored=field_class.stored)            elif field_class.field_type == ‘ngram‘:                schema_fields[field_class.index_fieldname] = NGRAM(minsize=3, maxsize=15, stored=field_class.stored,                                                                   field_boost=field_class.boost)            elif field_class.field_type == ‘edge_ngram‘:                schema_fields[field_class.index_fieldname] = NGRAMWORDS(minsize=2, maxsize=15, at=‘start‘,                                                                        stored=field_class.stored,                                                                        field_boost=field_class.boost)            else:                from goods.tokenizer import ChineseAnalyzer                schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=ChineseAnalyzer(),                                                                  field_boost=field_class.boost, sortable=True)            if field_class.document is True:                content_field_name = field_class.index_fieldname                schema_fields[field_class.index_fieldname].spelling = True?        # Fail more gracefully than relying on the backend to die if no fields        # are found.        if len(schema_fields) <= initial_key_count:            raise SearchBackendError(                "No fields were found in any search_indexes. Please correct this before attempting to search.")?        return (content_field_name, Schema(**schema_fields))?    def update(self, index, iterable, commit=True):        if not self.setup_complete:            self.setup()?        self.index = self.index.refresh()        writer = AsyncWriter(self.index)?        for obj in iterable:            try:                doc = index.full_prepare(obj)            except SkipDocument:                self.log.debug(u"Indexing for object `%s` skipped", obj)            else:                # Really make sure it‘s unicode, because Whoosh won‘t have it any                # other way.                for key in doc:                    doc[key] = self._from_python(doc[key])?                # Document boosts aren‘t supported in Whoosh 2.5.0+.                if ‘boost‘ in doc:                    del doc[‘boost‘]?                try:                    writer.update_document(**doc)                except Exception as e:                    if not self.silently_fail:                        raise?                    # We‘ll log the object identifier but won‘t include the actual object                    # to avoid the possibility of that generating encoding errors while                    # processing the log message:                    self.log.error(u"%s while preparing object for update" % e.__class__.__name__,                                   exc_info=True, extra={"data": {"index": index,                                                                  "object": get_identifier(obj)}})?        if len(iterable) > 0:            # For now, commit no matter what, as we run into locking issues otherwise.            writer.commit()?    def remove(self, obj_or_string, commit=True):        if not self.setup_complete:            self.setup()?        self.index = self.index.refresh()        whoosh_id = get_identifier(obj_or_string)?        try:            self.index.delete_by_query(q=self.parser.parse(u‘%s:"%s"‘ % (ID, whoosh_id)))        except Exception as e:            if not self.silently_fail:                raise?            
self.log.error("Failed to remove document ‘%s‘ from Whoosh: %s", whoosh_id, e, exc_info=True)?    def clear(self, models=None, commit=True):        if not self.setup_complete:            self.setup()?        self.index = self.index.refresh()?        if models is not None:            assert isinstance(models, (list, tuple))?        try:            if models is None:                self.delete_index()            else:                models_to_delete = []?                for model in models:                    models_to_delete.append(u"%s:%s" % (DJANGO_CT, get_model_ct(model)))?                self.index.delete_by_query(q=self.parser.parse(u" OR ".join(models_to_delete)))        except Exception as e:            if not self.silently_fail:                raise?            if models is not None:                self.log.error("Failed to clear Whoosh index of models ‘%s‘: %s", ‘,‘.join(models_to_delete),                               e, exc_info=True)            else:                self.log.error("Failed to clear Whoosh index: %s", e, exc_info=True)?    def delete_index(self):        # Per the Whoosh mailing list, if wiping out everything from the index,        # it‘s much more efficient to simply delete the index files.        if self.use_file_storage and os.path.exists(self.path):            shutil.rmtree(self.path)        elif not self.use_file_storage:            self.storage.clean()?        # Recreate everything.        self.setup()?    def optimize(self):        if not self.setup_complete:            self.setup()?        self.index = self.index.refresh()        self.index.optimize()?    def calculate_page(self, start_offset=0, end_offset=None):        # Prevent against Whoosh throwing an error. Requires an end_offset        # greater than 0.        if end_offset is not None and end_offset <= 0:            end_offset = 1?        # Determine the page.        page_num = 0?        if end_offset is None:            end_offset = 1000000?        if start_offset is None:            start_offset = 0?        page_length = end_offset - start_offset?        if page_length and page_length > 0:            page_num = int(start_offset / page_length)?        # Increment because Whoosh uses 1-based page numbers.        page_num += 1        return page_num, page_length?    @log_query    def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,               fields=‘‘, highlight=False, facets=None, date_facets=None, query_facets=None,               narrow_queries=None, spelling_query=None, within=None,               dwithin=None, distance_point=None, models=None,               limit_to_registered_models=None, result_class=None, **kwargs):        if not self.setup_complete:            self.setup()?        # A zero length query should return no results.        if len(query_string) == 0:            return {                ‘results‘: [],                ‘hits‘: 0,            }?        query_string = force_text(query_string)?        # A one-character query (non-wildcard) gets nabbed by a stopwords        # filter and should yield zero results.        if len(query_string) <= 1 and query_string != u‘*‘:            return {                ‘results‘: [],                ‘hits‘: 0,            }?        reverse = False?        if sort_by is not None:            # Determine if we need to reverse the results and if Whoosh can            # handle what it‘s being asked to sort by. Reversing is an            # all-or-nothing action, unfortunately.            sort_by_list = []            reverse_counter = 0? 
           for order_by in sort_by:                if order_by.startswith(‘-‘):                    reverse_counter += 1?            if reverse_counter and reverse_counter != len(sort_by):                raise SearchBackendError("Whoosh requires all order_by fields"                                         " to use the same sort direction")?            for order_by in sort_by:                if order_by.startswith(‘-‘):                    sort_by_list.append(order_by[1:])?                    if len(sort_by_list) == 1:                        reverse = True                else:                    sort_by_list.append(order_by)?                    if len(sort_by_list) == 1:                        reverse = False?            sort_by = sort_by_list?        if facets is not None:            warnings.warn("Whoosh does not handle faceting.", Warning, stacklevel=2)?        if date_facets is not None:            warnings.warn("Whoosh does not handle date faceting.", Warning, stacklevel=2)?        if query_facets is not None:            warnings.warn("Whoosh does not handle query faceting.", Warning, stacklevel=2)?        narrowed_results = None        self.index = self.index.refresh()?        if limit_to_registered_models is None:            limit_to_registered_models = getattr(settings, ‘HAYSTACK_LIMIT_TO_REGISTERED_MODELS‘, True)?        if models and len(models):            model_choices = sorted(get_model_ct(model) for model in models)        elif limit_to_registered_models:            # Using narrow queries, limit the results to only models handled            # with the current routers.            model_choices = self.build_models_list()        else:            model_choices = []?        if len(model_choices) > 0:            if narrow_queries is None:                narrow_queries = set()?            narrow_queries.add(‘ OR ‘.join([‘%s:%s‘ % (DJANGO_CT, rm) for rm in model_choices]))?        narrow_searcher = None?        if narrow_queries is not None:            # Potentially expensive? I don‘t see another way to do it in Whoosh...            narrow_searcher = self.index.searcher()?            for nq in narrow_queries:                recent_narrowed_results = narrow_searcher.search(self.parser.parse(force_text(nq)),                                                                 limit=None)?                if len(recent_narrowed_results) <= 0:                    return {                        ‘results‘: [],                        ‘hits‘: 0,                    }?                if narrowed_results:                    narrowed_results.filter(recent_narrowed_results)                else:                    narrowed_results = recent_narrowed_results?        self.index = self.index.refresh()?        if self.index.doc_count():            searcher = self.index.searcher()            parsed_query = self.parser.parse(query_string)?            # In the event of an invalid/stopworded query, recover gracefully.            if parsed_query is None:                return {                    ‘results‘: [],                    ‘hits‘: 0,                }?            page_num, page_length = self.calculate_page(start_offset, end_offset)?            search_kwargs = {                ‘pagelen‘: page_length,                ‘sortedby‘: sort_by,                ‘reverse‘: reverse,            }?            # Handle the case where the results have been narrowed.            if narrowed_results is not None:                search_kwargs[‘filter‘] = narrowed_results?            
try:                raw_page = searcher.search_page(                    parsed_query,                    page_num,                    **search_kwargs                )            except ValueError:                if not self.silently_fail:                    raise?                return {                    ‘results‘: [],                    ‘hits‘: 0,                    ‘spelling_suggestion‘: None,                }?            # Because as of Whoosh 2.5.1, it will return the wrong page of            # results if you request something too high. :(            if raw_page.pagenum < page_num:                return {                    ‘results‘: [],                    ‘hits‘: 0,                    ‘spelling_suggestion‘: None,                }?            results = self._process_results(raw_page, highlight=highlight, query_string=query_string,                                            spelling_query=spelling_query, result_class=result_class)            searcher.close()?            if hasattr(narrow_searcher, ‘close‘):                narrow_searcher.close()?            return results        else:            if self.include_spelling:                if spelling_query:                    spelling_suggestion = self.create_spelling_suggestion(spelling_query)                else:                    spelling_suggestion = self.create_spelling_suggestion(query_string)            else:                spelling_suggestion = None?            return {                ‘results‘: [],                ‘hits‘: 0,                ‘spelling_suggestion‘: spelling_suggestion,            }?    def more_like_this(self, model_instance, additional_query_string=None,                       start_offset=0, end_offset=None, models=None,                       limit_to_registered_models=None, result_class=None, **kwargs):        if not self.setup_complete:            self.setup()?        field_name = self.content_field_name        narrow_queries = set()        narrowed_results = None        self.index = self.index.refresh()?        if limit_to_registered_models is None:            limit_to_registered_models = getattr(settings, ‘HAYSTACK_LIMIT_TO_REGISTERED_MODELS‘, True)?        if models and len(models):            model_choices = sorted(get_model_ct(model) for model in models)        elif limit_to_registered_models:            # Using narrow queries, limit the results to only models handled            # with the current routers.            model_choices = self.build_models_list()        else:            model_choices = []?        if len(model_choices) > 0:            if narrow_queries is None:                narrow_queries = set()?            narrow_queries.add(‘ OR ‘.join([‘%s:%s‘ % (DJANGO_CT, rm) for rm in model_choices]))?        if additional_query_string and additional_query_string != ‘*‘:            narrow_queries.add(additional_query_string)?        narrow_searcher = None?        if narrow_queries is not None:            # Potentially expensive? I don‘t see another way to do it in Whoosh...            narrow_searcher = self.index.searcher()?            for nq in narrow_queries:                recent_narrowed_results = narrow_searcher.search(self.parser.parse(force_text(nq)),                                                                 limit=None)?                if len(recent_narrowed_results) <= 0:                    return {                        ‘results‘: [],                        ‘hits‘: 0,                    }?                
if narrowed_results:                    narrowed_results.filter(recent_narrowed_results)                else:                    narrowed_results = recent_narrowed_results?        page_num, page_length = self.calculate_page(start_offset, end_offset)?        self.index = self.index.refresh()        raw_results = EmptyResults()?        searcher = None        if self.index.doc_count():            query = "%s:%s" % (ID, get_identifier(model_instance))            searcher = self.index.searcher()            parsed_query = self.parser.parse(query)            results = searcher.search(parsed_query)?            if len(results):                raw_results = results[0].more_like_this(field_name, top=end_offset)?            # Handle the case where the results have been narrowed.            if narrowed_results is not None and hasattr(raw_results, ‘filter‘):                raw_results.filter(narrowed_results)?        try:            raw_page = ResultsPage(raw_results, page_num, page_length)        except ValueError:            if not self.silently_fail:                raise?            return {                ‘results‘: [],                ‘hits‘: 0,                ‘spelling_suggestion‘: None,            }?        # Because as of Whoosh 2.5.1, it will return the wrong page of        # results if you request something too high. :(        if raw_page.pagenum < page_num:            return {                ‘results‘: [],                ‘hits‘: 0,                ‘spelling_suggestion‘: None,            }?        results = self._process_results(raw_page, result_class=result_class)?        if searcher:            searcher.close()?        if hasattr(narrow_searcher, ‘close‘):            narrow_searcher.close()?        return results?    def _process_results(self, raw_page, highlight=False, query_string=‘‘, spelling_query=None, result_class=None):        from haystack import connections        results = []?        # It‘s important to grab the hits first before slicing. Otherwise, this        # can cause pagination failures.        hits = len(raw_page)?        if result_class is None:            result_class = SearchResult?        facets = {}        spelling_suggestion = None        unified_index = connections[self.connection_alias].get_unified_index()        indexed_models = unified_index.get_indexed_models()?        for doc_offset, raw_result in enumerate(raw_page):            score = raw_page.score(doc_offset) or 0            app_label, model_name = raw_result[DJANGO_CT].split(‘.‘)            additional_fields = {}            model = haystack_get_model(app_label, model_name)?            if model and model in indexed_models:                for key, value in raw_result.items():                    index = unified_index.get_index(model)                    string_key = str(key)?                    if string_key in index.fields and hasattr(index.fields[string_key], ‘convert‘):                        # Special-cased due to the nature of KEYWORD fields.                        if index.fields[string_key].is_multivalued:                            if value is None or len(value) is 0:                                additional_fields[string_key] = []                            else:                                additional_fields[string_key] = value.split(‘,‘)                        else:                            additional_fields[string_key] = index.fields[string_key].convert(value)                    else:                        additional_fields[string_key] = self._to_python(value)?                
del (additional_fields[DJANGO_CT])                del (additional_fields[DJANGO_ID])?                if highlight:                    sa = StemmingAnalyzer()                    formatter = WhooshHtmlFormatter(‘em‘)                    terms = [token.text for token in sa(query_string)]?                    whoosh_result = whoosh_highlight(                        additional_fields.get(self.content_field_name),                        terms,                        sa,                        ContextFragmenter(),                        formatter                    )                    additional_fields[‘highlighted‘] = {                        self.content_field_name: [whoosh_result],                    }?                result = result_class(app_label, model_name, raw_result[DJANGO_ID], score, **additional_fields)                results.append(result)            else:                hits -= 1?        if self.include_spelling:            if spelling_query:                spelling_suggestion = self.create_spelling_suggestion(spelling_query)            else:                spelling_suggestion = self.create_spelling_suggestion(query_string)?        return {            ‘results‘: results,            ‘hits‘: hits,            ‘facets‘: facets,            ‘spelling_suggestion‘: spelling_suggestion,        }?    def create_spelling_suggestion(self, query_string):        spelling_suggestion = None        reader = self.index.reader()        corrector = reader.corrector(self.content_field_name)        cleaned_query = force_text(query_string)?        if not query_string:            return spelling_suggestion?        # Clean the string.        for rev_word in self.RESERVED_WORDS:            cleaned_query = cleaned_query.replace(rev_word, ‘‘)?        for rev_char in self.RESERVED_CHARACTERS:            cleaned_query = cleaned_query.replace(rev_char, ‘‘)?        # Break it down.        query_words = cleaned_query.split()        suggested_words = []?        for word in query_words:            suggestions = corrector.suggest(word, limit=1)?            if len(suggestions) > 0:                suggested_words.append(suggestions[0])?        spelling_suggestion = ‘ ‘.join(suggested_words)        return spelling_suggestion?    def _from_python(self, value):        """        Converts Python values to a string for Whoosh.?        Code courtesy of pysolr.        """        if hasattr(value, ‘strftime‘):            if not hasattr(value, ‘hour‘):                value = datetime(value.year, value.month, value.day, 0, 0, 0)        elif isinstance(value, bool):            if value:                value = ‘true‘            else:                value = ‘false‘        elif isinstance(value, (list, tuple)):            value = u‘,‘.join([force_text(v) for v in value])        elif isinstance(value, (six.integer_types, float)):            # Leave it alone.            pass        else:            value = force_text(value)        return value?    def _to_python(self, value):        """        Converts values from Whoosh to native Python values.?        A port of the same method in pysolr, as they deal with data the same way.        """        if value == ‘true‘:            return True        elif value == ‘false‘:            return False?        if value and isinstance(value, six.string_types):            possible_datetime = DATETIME_REGEX.search(value)?            if possible_datetime:                date_values = possible_datetime.groupdict()?                
for dk, dv in date_values.items():                    date_values[dk] = int(dv)?                return datetime(date_values[‘year‘], date_values[‘month‘], date_values[‘day‘], date_values[‘hour‘],                                date_values[‘minute‘], date_values[‘second‘])?        try:            # Attempt to use json to load the values.            converted_value = json.loads(value)?            # Try to handle most built-in types.            if isinstance(converted_value, (list, tuple, set, dict, six.integer_types, float, complex)):                return converted_value        except:            # If it fails (SyntaxError or its ilk) or we don‘t trust it,            # continue on.            pass?        return value??class WhooshSearchQuery(BaseSearchQuery):    def _convert_datetime(self, date):        if hasattr(date, ‘hour‘):            return force_text(date.strftime(‘%Y%m%d%H%M%S‘))        else:            return force_text(date.strftime(‘%Y%m%d000000‘))?    def clean(self, query_fragment):        """        Provides a mechanism for sanitizing user input before presenting the        value to the backend.?        Whoosh 1.X differs here in that you can no longer use a backslash        to escape reserved characters. Instead, the whole word should be        quoted.        """        words = query_fragment.split()        cleaned_words = []?        for word in words:            if word in self.backend.RESERVED_WORDS:                word = word.replace(word, word.lower())?            for char in self.backend.RESERVED_CHARACTERS:                if char in word:                    word = "‘%s‘" % word                    break?            cleaned_words.append(word)?        return ‘ ‘.join(cleaned_words)?    def build_query_fragment(self, field, filter_type, value):        from haystack import connections        query_frag = ‘‘        is_datetime = False?        if not hasattr(value, ‘input_type_name‘):            # Handle when we‘ve got a ``ValuesListQuerySet``...            if hasattr(value, ‘values_list‘):                value = list(value)?            if hasattr(value, ‘strftime‘):                is_datetime = True?            if isinstance(value, six.string_types) and value != ‘ ‘:                # It‘s not an ``InputType``. Assume ``Clean``.                value = Clean(value)            else:                value = PythonData(value)?        # Prepare the query using the InputType.        prepared_value = value.prepare(self)?        if not isinstance(prepared_value, (set, list, tuple)):            # Then convert whatever we get back to what pysolr wants if needed.            prepared_value = self.backend._from_python(prepared_value)?        # ‘content‘ is a special reserved word, much like ‘pk‘ in        # Django‘s ORM layer. It indicates ‘no special field‘.        if field == ‘content‘:            index_fieldname = ‘‘        else:            index_fieldname = u‘%s:‘ % connections[self._using].get_unified_index().get_index_fieldname(field)?        filter_types = {            ‘content‘: ‘%s‘,            ‘contains‘: ‘*%s*‘,            ‘endswith‘: "*%s",            ‘startswith‘: "%s*",            ‘exact‘: ‘%s‘,            ‘gt‘: "{%s to}",            ‘gte‘: "[%s to]",            ‘lt‘: "{to %s}",            ‘lte‘: "[to %s]",            ‘fuzzy‘: u‘%s~‘,        }?        
if value.post_process is False:            query_frag = prepared_value        else:            if filter_type in [‘content‘, ‘contains‘, ‘startswith‘, ‘endswith‘, ‘fuzzy‘]:                if value.input_type_name == ‘exact‘:                    query_frag = prepared_value                else:                    # Iterate over terms & incorportate the converted form of each into the query.                    terms = []?                    if isinstance(prepared_value, six.string_types):                        possible_values = prepared_value.split(‘ ‘)                    else:                        if is_datetime is True:                            prepared_value = self._convert_datetime(prepared_value)?                        possible_values = [prepared_value]?                    for possible_value in possible_values:                        terms.append(filter_types[filter_type] % self.backend._from_python(possible_value))?                    if len(terms) == 1:                        query_frag = terms[0]                    else:                        query_frag = u"(%s)" % " AND ".join(terms)            elif filter_type == ‘in‘:                in_options = []?                for possible_value in prepared_value:                    is_datetime = False?                    if hasattr(possible_value, ‘strftime‘):                        is_datetime = True?                    pv = self.backend._from_python(possible_value)?                    if is_datetime is True:                        pv = self._convert_datetime(pv)?                    if isinstance(pv, six.string_types) and not is_datetime:                        in_options.append(‘"%s"‘ % pv)                    else:                        in_options.append(‘%s‘ % pv)?                query_frag = "(%s)" % " OR ".join(in_options)            elif filter_type == ‘range‘:                start = self.backend._from_python(prepared_value[0])                end = self.backend._from_python(prepared_value[1])?                if hasattr(prepared_value[0], ‘strftime‘):                    start = self._convert_datetime(start)?                if hasattr(prepared_value[1], ‘strftime‘):                    end = self._convert_datetime(end)?                query_frag = u"[%s to %s]" % (start, end)            elif filter_type == ‘exact‘:                if value.input_type_name == ‘exact‘:                    query_frag = prepared_value                else:                    prepared_value = Exact(prepared_value).prepare(self)                    query_frag = filter_types[filter_type] % prepared_value            else:                if is_datetime is True:                    prepared_value = self._convert_datetime(prepared_value)?                query_frag = filter_types[filter_type] % prepared_value?        if len(query_frag) and not isinstance(value, Raw):            if not query_frag.startswith(‘(‘) and not query_frag.endswith(‘)‘):                query_frag = "(%s)" % query_frag?        return u"%s%s" % (index_fieldname, query_frag)??class WhooshEngine(BaseEngine):    backend = WhooshSearchBackend    query = WhooshSearchQuery?

Linux Basics: Core Points for Setting Up a Linux Server

  • Using virtual machines
  • Installing Linux (things to watch out for)
  • Server setup (the key part)
  • Network configuration (local virtual machine)
  • Connecting to a remote server over SSH (PuTTY, Xshell 6)
  • FTP file transfer (FlashFXP, WinSCP)
  • Installing Python (Linux ships with Python 2.7.5)
  • Managing virtual environments (virtualenv)
  • Installing Django
  • Web server (Nginx + uWSGI) and publishing the Django project
  • MySQL database
  • DNS resolution (domain names)
  • Configuring multiple projects in Nginx

Installing a virtual machine

Virtual machine installation [important]: https://blog.csdn.net/qq_39038465/article/details/81478847

Linux directory structure

  • bin: binary executables
  • boot: files used when booting the system
  • dev: device files
  • etc: system configuration files
  • home: root of all user home directories
  • lib: shared libraries and kernel modules needed by programs on the file system
  • mnt: mount point for temporary file systems
  • opt: optional, additionally installed application packages
  • proc: virtual file system mapping the current memory
  • root: home directory of the superuser
  • sbin: binary executables that only root may run
  • tmp: temporary files
  • usr: system applications
  • var: files whose data changes while the system runs

Linux commands

Commands for the IP address and hostname:

  • Check the IP address: ifconfig
  • Restart the network card: service network restart
  • Check the network card status: service network status
  • Change the IP address: vim /etc/sysconfig/network-scripts/ifcfg-ens33

TYPE="Ethernet"           # network type: Ethernet
BOOTPROTO="static"        # switch to a static IP
IPADDR="192.168.8.88"     # IP address
NETMASK="255.255.255.0"   # subnet mask
GATEWAY="192.168.8.1"     # gateway
DNS1="192.168.8.1"        # primary DNS
ONBOOT="yes"              # bring the interface up on boot (default ON)

  • Check the hostname: hostname
  • Change the hostname: vim /etc/hostname

Switching CentOS 7 to the Tsinghua yum mirror

Tsinghua mirror: https://mirror.tuna.tsinghua.edu.cn/help/centos/

Common vim commands

  • dd: delete the line under the cursor
  • u: undo the last operation
  • ndd: delete n lines starting at the cursor (n is a number)
  • yy: copy the line under the cursor
  • nyy: copy n lines (n is a number)
  • p: paste the copied content below the cursor line
  • P: paste the copied content above the cursor line
  • np: paste n times below the cursor line (n is a number)
  • Ctrl+r: redo (reverse the last undo)
  • $: jump to the end of the line
  • 0: jump to the beginning of the line
  • gg: move to the first line of the file
  • G: jump to the last line of the file
  • nG: jump to line n
  • set nu: show line numbers
  • H: move the cursor to the first character of the top line on the screen
  • M: move the cursor to the first character of the middle line on the screen
  • L: move the cursor to the first character of the bottom line on the screen

Installing Python 3 on CentOS 7

# 1. Install the dependencies for Python 3.7
    yum -y groupinstall "Development tools"
    yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel
# 2. Download the Python 3.7 source
    wget https://www.python.org/ftp/python/3.7.0/Python-3.7.0.tar.xz

# 3. Unpack it
    tar -xJvf Python-3.7.0.tar.xz

# 4. cd into the unpacked Python folder
    cd Python-3.7.0

# 5. Run configure, installing under /usr/local/python3
    ./configure --prefix=/usr/local/python3 --enable-shared

# 6. Compile and install
    make && make install

# 7. Set up the environment: create soft links
    ln -s /usr/local/python3/bin/python3 /usr/bin/python3  # soft link for python3
    ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3        # soft link for pip3

# 8. Copy libpython3.7m.so.1.0 from the build directory into /usr/lib64
    cp /root/Python-3.7.0/libpython3.7m.so.1.0 /usr/lib64/libpython3.7m.so.1.0

Installing MySQL on CentOS 7

# 1. Download the MySQL repo package
    wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm

# 2. Install the mysql-community-release-el7-5.noarch.rpm package
    rpm -ivh mysql-community-release-el7-5.noarch.rpm

# 3. Install MySQL
    yum install mysql-server

# 4. Set ownership so the user can use MySQL
    chown -R root:root /var/lib/mysql

# 5. Restart the service
    service mysqld restart

# 6. Log in and reset the password:
    mysql -u root        # enter mysql
        # The following are MySQL statements:
        use mysql;
        update user set password=password('root') where user='root';
        grant all privileges on *.* to 'root'@'%' identified by 'root';  # set the remote login password
        flush privileges;      # reload the current privileges
    # Note: if it does not take effect, exit mysql with Ctrl+C and restart the virtual machine

# 7. Open port 3306:
    # Set up the iptables service
    yum -y install iptables-services

    # To change the firewall configuration, e.g. open port 3306, edit:
    vi /etc/sysconfig/iptables

    # Add the rule
    -A INPUT -p tcp -m state --state NEW -m tcp --dport 3306 -j ACCEPT   # save and exit afterwards

# 8. Configure the firewall:
    systemctl restart iptables.service     # restart the firewall so the change takes effect
    systemctl enable iptables.service      # start the firewall on boot

Connect to MySQL and create a database for Django to use

Use Navicat to connect to the database.

After opening Navicat, click Connection in the upper-right corner and fill in the connection details (host, port, user and password) in the dialog that appears.

Create a new database for Django to connect to.

With the database in place, the next step is to configure it in Django.

Configuring the MySQL database in Django

# 1. In the settings file, change the database configuration to the following:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',  # use MySQL as the database backend
        'NAME': 'library',                     # database name: library
        'USER': 'root',                        # user used to connect: "root"
        'PASSWORD': '123456',                  # the user's password: "123456"
        'HOST': 'www.XXXXXX.cn',               # host IP address or domain name
        'PORT': 3306,                          # database port, 3306 by default
    }
}

# 2. In the project package's __init__.py, import pymysql:
import pymysql
pymysql.install_as_MySQLdb()
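Before running migrations it can save time to confirm that these connection settings actually work. The small check below is not part of the original notes; it simply reuses the example host, user, password and database name from the settings above with pymysql.

# Optional sanity check of the MySQL connection (example credentials from the settings above).
import pymysql

conn = pymysql.connect(host='www.XXXXXX.cn', port=3306,
                       user='root', password='123456', db='library')
with conn.cursor() as cursor:
    cursor.execute('SELECT VERSION()')
    print(cursor.fetchone())  # e.g. ('5.5.x',)
conn.close()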
Installing a virtual environment on CentOS 7

# Install virtualenv
pip3 install virtualenv
# Create a soft link
ln -s /usr/local/python3/bin/virtualenv /usr/bin/virtualenv

Create the directories

# Directory that will hold the virtual environments (the name is arbitrary)
mkdir -p /data/env
# Directory the site is published from (again, the name is arbitrary)
mkdir -p /data/wwwroot

Create and enter the virtual environment

# Go to the env directory
cd /data/env
# Create the virtual environment
virtualenv --python=/usr/bin/python3 py3_django2
# Activate it
cd /data/env/py3_django2/bin
source activate    # deactivate to leave it
# Install django, uwsgi and so on
pip install django
pip install uwsgi  # needed for publishing the django project
# Leave the virtual environment
cd /data/env/py3_django2/bin
deactivate

Create a soft link for uwsgi

# Link uwsgi so it is easy to call
ln -s /usr/local/python3/bin/uwsgi /usr/bin/uwsgi

Create an XML file named after the project, with an .xml extension

<?xml version="1.0" encoding="UTF-8"?>
<uwsgi>
    <socket>127.0.0.1:8000</socket>        <!-- internal port, your choice -->
    <chdir>/data/wwwroot/library/</chdir>  <!-- project path -->
    <module>library.wsgi</module>          <!-- directory name containing wsgi.py -->
    <processes>4</processes>               <!-- number of processes -->
    <daemonize>uwsgi.log</daemonize>       <!-- log file -->
</uwsgi>

Uploading the local project to the CentOS 7 server

# 1. On Windows, cd into the project directory and generate the dependency list
#    (skip this step if there are only a few dependencies)
pip freeze > requirements.txt
# 2. In the settings file:
ALLOWED_HOSTS = ['*']  # allow access from any IP

Installing the dependencies on CentOS 7

# Install the project's dependencies (skip if there are only a few)
pip install -r requirements.txt

Collecting the static files

# 1. Point STATIC_ROOT in the settings file at the directory the static files
#    should be collected into
STATIC_ROOT = '/data/wwwroot/library/static'
# 2. Collect all static files into the directory given by STATIC_ROOT
python3 manage.py collectstatic

Installing Nginx

# 1. Download Nginx with wget
wget http://nginx.org/download/nginx-1.13.7.tar.gz
# 2. Once downloaded, unpack it
tar -zxvf nginx-1.13.7.tar.gz

# 3. Go into the nginx-1.13.7 directory and run:
./configure
make && make install

# 4. Nginx installs to /usr/local/nginx by default. Back up nginx.conf in
#    /usr/local/nginx/conf/ first, just in case.
cd /usr/local/nginx/conf/
cp nginx.conf nginx.conf.bak

# 5. Open nginx.conf, delete its original contents and put in the following:
events {
    worker_connections  1024;
}
http {
    include       mime.types;
    default_type  application/octet-stream;
    sendfile        on;
    server {
        listen 80;
        server_name  www.xxxxxx.cn;  # your own domain; without one, use 127.0.0.1:80
        charset utf-8;
        location / {
            include uwsgi_params;
            uwsgi_pass 127.0.0.1:8000;                       # must match the port in the uwsgi config
            uwsgi_param UWSGI_SCRIPT library.wsgi;           # directory containing wsgi.py, plus ".wsgi"
            uwsgi_param UWSGI_CHDIR /data/wwwroot/library/;  # project path
        }
        location /static/ {
            alias /data/wwwroot/library/static/;  # path to the static files
        }
    }
}

# 6. Pay attention to the commented values: they must match the uwsgi XML config
#    file and the project path. Go into /usr/local/nginx/sbin/ and run ./nginx -t
#    to check the configuration; if it reports no errors, start nginx:
cd /usr/local/nginx/sbin/
./nginx

# No output means it started successfully
# Test it: 127.0.0.1:80
Starting the project

# Go into the Django project
cd /data/wwwroot/library/

# Have uwsgi load the project's config file
uwsgi -x library.xml

# If none of the steps above reported errors:
cd /usr/local/nginx/sbin/

# Reload nginx
./nginx -s reload

# Test from inside the server whether the site is up
curl 127.0.0.1:80   # you should see the site!

# Stop the firewall, otherwise remote access will fail!
systemctl stop firewalld.service

Managing the firewall on Linux

# CentOS 7.0 uses firewalld by default; to use iptables it must be set up separately

# 1. Simply stop the firewall
systemctl stop firewalld.service     # stop firewalld
systemctl disable firewalld.service  # keep firewalld from starting on boot

# 2. Set up the iptables service (firewall)
yum -y install iptables-services  # install the firewall management tools
# To change the firewall configuration, e.g. open port 3306:
vi /etc/sysconfig/iptables  # edit the firewall configuration with vi
# Add the rule
-A INPUT -p tcp -m state --state NEW -m tcp --dport 3306 -j ACCEPT
# Save and exit (Esc, then :wq)
systemctl restart iptables.service  # restart the firewall so the change takes effect
systemctl enable iptables.service   # start the firewall on boot
Handy commands

# Check the Python version
python -V

# See where the python3 command resolves
which python3  # shows the command's location

# cd into /usr/bin and list the python3 binaries
ls -al python3*

# Check the virtual-environment directory and the project publish directory
