Ternary Tree

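  The previous article introduced the Trie. It is simple to implement but wastes space. To support the 26 English letters, every node has to hold 26 pointers, and the null pointers stored in each node's array take up too much memory. Let's look at the Ternary Tree.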

  When you have to store a set of strings, what data structure should you use?

You could use hash tables, which sprinkle the strings throughout an array. Access is fast, but information about relative order is lost. Another option is the use of binary search trees, which store strings in order, and are fairly fast. Or you could use digital search tries, which are lightning fast, but use lots of space.

  In this article, we'll examine ternary search trees, which combine the time efficiency of digital tries with the space efficiency of binary search trees. The resulting structure is faster than hashing for many typical search problems, and supports a broader range of useful operations.

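  The ternary search tree (Ternary Tree) combines the time efficiency of a trie with the space efficiency of a binary search tree.

To keep unused pointers from taking up memory, each trie node is no longer represented as an array but as a "tree within a tree". Every non-null pointer in a trie node gets a node of its own in the ternary search tree.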
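To make the space argument concrete, here is a minimal sketch that compares the per-node size of the two layouts. `TrieNode`, `TstNode`, `ALPHABET` and `printNodeSizes` are names made up for this illustration, and the 26-letter alphabet is an assumption; the exact byte counts depend on the platform.

```c
#include <assert.h>
#include <stdio.h>

#define ALPHABET 26  /* assumption: lowercase English letters only */

/* Hypothetical trie node: one child slot per letter, most of them NULL. */
struct TrieNode {
    struct TrieNode *next[ALPHABET];
    unsigned isEndOfString : 1;
};

/* TST node: the 26-slot array is replaced by a small BST over characters. */
struct TstNode {
    char data;
    unsigned isEndOfString : 1;
    struct TstNode *left, *eq, *right;
};

/* Print the size of one node of each kind (numbers vary by platform). */
void printNodeSizes(void)
{
    /* Sanity check: a TST node should be the smaller of the two. */
    assert(sizeof(struct TstNode) < sizeof(struct TrieNode));
    printf("trie node: %zu bytes\n", sizeof(struct TrieNode));
    printf("TST node:  %zu bytes\n", sizeof(struct TstNode));
}
```

Calling `printNodeSizes()` from `main` shows the gap. A TST may need more nodes than a trie for the same keys, but it only allocates a node where a trie would hold a non-null pointer.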

  Each node has 3 children: smaller (left), equal (middle), larger (right).

  Follow links corresponding to each character in the key.

   - If less, take left link; if greater, take right link.

   - If equal, take the middle link and move to the next key character.

  Search hit. Node where search ends has a non-null value.

  Search miss. Reach a null link or node where search ends has null value.
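The rules above describe a symbol table in which a node carries a value. A minimal sketch of that variant could look like the following; `TstNode`, `tstPut` and `tstGet` are hypothetical names, and it assumes non-empty keys and that a NULL value means "no key ends here". The full program further down uses an end-of-string flag instead of a value.

```c
#include <assert.h>
#include <stdlib.h>

/* Node of a value-carrying TST; val == NULL means no key ends at this node. */
struct TstNode {
    char c;
    const int *val;
    struct TstNode *left, *mid, *right;
};

/* Insert a non-empty key with a non-NULL value; returns the subtree root. */
struct TstNode *tstPut(struct TstNode *x, const char *key, const int *val)
{
    assert(key[0] != '\0');              /* empty keys are not supported */
    if (x == NULL) {
        x = malloc(sizeof *x);
        assert(x != NULL);               /* sketch: abort on allocation failure */
        x->c = *key;
        x->val = NULL;
        x->left = x->mid = x->right = NULL;
    }
    if (*key < x->c)
        x->left = tstPut(x->left, key, val);
    else if (*key > x->c)
        x->right = tstPut(x->right, key, val);
    else if (key[1] != '\0')
        x->mid = tstPut(x->mid, key + 1, val);
    else
        x->val = val;                    /* last character: store the value */
    return x;
}

/* Iterative search: returns the stored value on a hit, NULL on a miss. */
const int *tstGet(const struct TstNode *x, const char *key)
{
    if (key[0] == '\0')
        return NULL;
    while (x != NULL) {
        if (*key < x->c)
            x = x->left;                 /* smaller: take left link */
        else if (*key > x->c)
            x = x->right;                /* larger: take right link */
        else if (key[1] != '\0') {
            x = x->mid;                  /* equal: middle link, next character */
            key++;
        } else
            return x->val;               /* search ends here: hit iff non-NULL */
    }
    return NULL;                         /* reached a null link: miss */
}
```

After inserting "sea" and "shells", `tstGet(root, "she")` ends on an existing node whose value is NULL, so it is a miss, while `tstGet(root, "by")` misses by reaching a null link.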

// C program to demonstrate Ternary Search Tree (TST) insert, traverse
// and search operations
#include <stdio.h>
#include <stdlib.h>
#define MAX 50

// A node of ternary search tree
struct Node
{
    char data;

    // True if this character is last character of one of the words
    unsigned isEndOfString: 1;

    struct Node *left, *eq, *right;
};

// A utility function to create a new ternary search tree node
struct Node* newNode(char data)
{
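    struct Node* temp = (struct Node*) malloc(sizeof( struct Node ));
    // Stop if the allocation failed instead of dereferencing NULL
    if (!temp)
    {
        perror("malloc");
        exit(EXIT_FAILURE);
    }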
    temp->data = data;
    temp->isEndOfString = 0;
    temp->left = temp->eq = temp->right = NULL;
    return temp;
}

// Function to insert a new word in a Ternary Search Tree
void insert(struct Node** root, const char *word)
{
    // Base Case: Tree is empty
    if (!(*root))
        *root = newNode(*word);

    // If current character of word is smaller than root's character,
    // then insert this word in left subtree of root
    if ((*word) < (*root)->data)
        insert(&( (*root)->left ), word);

    // If current character of word is greater than root's character,
    // then insert this word in right subtree of root
    else if ((*word) > (*root)->data)
        insert(&( (*root)->right ), word);

    // If current character of word is same as root's character,
    // then continue with the next character in the middle (eq) subtree
    else
    {
        if (*(word+1))
            insert(&( (*root)->eq ), word+1);

        // the last character of the word
        else
            (*root)->isEndOfString = 1;
    }
}

// A recursive function to traverse Ternary Search Tree
void traverseTSTUtil(struct Node* root, char* buffer, int depth)
{
    if (root)
    {
        // First traverse the left subtree
        traverseTSTUtil(root->left, buffer, depth);

        // Store the character of this node
        buffer[depth] = root->data;
        if (root->isEndOfString)
        {
            buffer[depth+1] = '\0';
            printf( "%s\n", buffer);
        }

        // Traverse the subtree using equal pointer (middle subtree)
        traverseTSTUtil(root->eq, buffer, depth + 1);

        // Finally Traverse the right subtree
        traverseTSTUtil(root->right, buffer, depth);
    }
}

// The main function to traverse a Ternary Search Tree.
// It mainly uses traverseTSTUtil()
void traverseTST(struct Node* root)
{
    char buffer[MAX];
    traverseTSTUtil(root, buffer, 0);
}

// Function to search a given word in TST
int searchTST(struct Node *root, const char *word)
{
    if (!root)
        return 0;

    if (*word < (root)->data)
        return searchTST(root->left, word);

    else if (*word > (root)->data)
        return searchTST(root->right, word);

    else
    {
        if (*(word+1) == '\0')
            return root->isEndOfString;

        return searchTST(root->eq, word+1);
    }
}

// Driver program to test above functions
int main()
{
    struct Node *root = NULL;

    insert(&root, "cat");
    insert(&root, "cats");
    insert(&root, "up");
    insert(&root, "bug");

    printf("Following is traversal of ternary search tree\n");
    traverseTST(root);

    printf("\nFollowing are search results for cats, bu and cat respectively\n");
    searchTST(root, "cats")? printf("Found\n"): printf("Not Found\n");
    searchTST(root, "bu")?   printf("Found\n"): printf("Not Found\n");
    searchTST(root, "cat")?  printf("Found\n"): printf("Not Found\n");

    return 0;
}

Output:

Following is traversal of ternary search tree
bug
cat
cats
up

Following are search results for cats, bu and cat respectively
Found
Not Found
Found

Time Complexity: The time complexity of ternary search tree operations is similar to that of a binary search tree, i.e. insertion, deletion and search take time proportional to the height of the ternary search tree. The space is proportional to the total length of the strings to be stored.
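Because the cost follows the height, the order in which keys are inserted matters. The sketch below reuses the insert logic of the program above (assuming non-empty words) and adds two helpers, `height` and `heightForOrder`, which are names invented for this illustration and not part of the original listing.

```c
#include <assert.h>
#include <stdlib.h>

struct Node {
    char data;
    unsigned isEndOfString : 1;
    struct Node *left, *eq, *right;
};

/* Same insert logic as the program above; word must be non-empty. */
void insert(struct Node **root, const char *word)
{
    if (!*root) {
        *root = malloc(sizeof **root);
        assert(*root != NULL);           /* sketch: abort on allocation failure */
        (*root)->data = *word;
        (*root)->isEndOfString = 0;
        (*root)->left = (*root)->eq = (*root)->right = NULL;
    }
    if (*word < (*root)->data)
        insert(&(*root)->left, word);
    else if (*word > (*root)->data)
        insert(&(*root)->right, word);
    else if (word[1] != '\0')
        insert(&(*root)->eq, word + 1);
    else
        (*root)->isEndOfString = 1;
}

/* Number of nodes on the longest root-to-leaf path, over all three links. */
int height(const struct Node *root)
{
    if (!root)
        return 0;
    int h = height(root->left);
    int e = height(root->eq);
    int r = height(root->right);
    if (e > h) h = e;
    if (r > h) h = r;
    return h + 1;
}

/* Release every node of the tree. */
void freeTST(struct Node *root)
{
    if (!root)
        return;
    freeTST(root->left);
    freeTST(root->eq);
    freeTST(root->right);
    free(root);
}

/* Build a TST from words[0..n) in the given order and return its height. */
int heightForOrder(const char *const words[], size_t n)
{
    struct Node *root = NULL;
    for (size_t i = 0; i < n; i++)
        insert(&root, words[i]);
    int h = height(root);
    freeTST(root);
    return h;
}
```

With the seven one-letter keys a to g, inserting in sorted order builds a chain of right links of height 7, while the order d, b, f, a, c, e, g gives height 3.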

Hashing.

- Need to examine entire key.
- Search hits and misses cost about the same.
- Performance relies on hash function.
- Does not support ordered symbol table operations.

TSTs.

- Works only for string (or digital) keys.
- Only examines just enough key characters.
- Search miss may involve only a few characters.
- Supports ordered symbol table operations (plus extras!).

Red-black BST.

- Performance guarantee: log N key compares.
- Supports ordered symbol table API.

Hash tables.

- Performance guarantee: constant number of probes.
- Requires good hash function for key type.

Tries. R-way, TST.

- Performance guarantee: log N characters accessed.
- Supports character-based operations.
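One example of a character-based operation is listing every key that starts with a given prefix. The following is a sketch under stated assumptions, not part of the original program: `collect` and `keysWithPrefix` are hypothetical helpers, keys and prefixes are non-empty, and words are shorter than `MAX - 1` characters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX 50  /* same buffer size as the program above */

struct Node {
    char data;
    unsigned isEndOfString : 1;
    struct Node *left, *eq, *right;
};

/* Same insert logic as the program above; word must be non-empty. */
void insert(struct Node **root, const char *word)
{
    if (!*root) {
        *root = malloc(sizeof **root);
        assert(*root != NULL);           /* sketch: abort on allocation failure */
        (*root)->data = *word;
        (*root)->isEndOfString = 0;
        (*root)->left = (*root)->eq = (*root)->right = NULL;
    }
    if (*word < (*root)->data)
        insert(&(*root)->left, word);
    else if (*word > (*root)->data)
        insert(&(*root)->right, word);
    else if (word[1] != '\0')
        insert(&(*root)->eq, word + 1);
    else
        (*root)->isEndOfString = 1;
}

/* Print, in sorted order, every key in this subtree; buffer[0..depth)
   holds the characters matched so far. Returns the number printed. */
int collect(const struct Node *root, char *buffer, int depth)
{
    if (!root || depth >= MAX - 1)
        return 0;
    int count = collect(root->left, buffer, depth);
    buffer[depth] = root->data;
    if (root->isEndOfString) {
        buffer[depth + 1] = '\0';
        printf("%s\n", buffer);
        count++;
    }
    count += collect(root->eq, buffer, depth + 1);
    count += collect(root->right, buffer, depth);
    return count;
}

/* Print all stored keys that start with prefix; return how many there are. */
int keysWithPrefix(const struct Node *root, const char *prefix)
{
    char buffer[MAX];
    size_t len = strlen(prefix);
    if (len == 0 || len >= MAX - 1)
        return 0;

    /* Walk down to the node that matches the last prefix character. */
    const struct Node *x = root;
    size_t i = 0;
    while (x) {
        if (prefix[i] < x->data)
            x = x->left;
        else if (prefix[i] > x->data)
            x = x->right;
        else if (i + 1 < len) {
            x = x->eq;
            i++;
        } else
            break;
    }
    if (!x)
        return 0;                        /* no key has this prefix */

    memcpy(buffer, prefix, len);
    int count = 0;
    if (x->isEndOfString) {              /* the prefix itself is a key */
        buffer[len] = '\0';
        printf("%s\n", buffer);
        count = 1;
    }
    return count + collect(x->eq, buffer, (int)len);
}
```

With the keys cat, cats, up, bug and cup stored, `keysWithPrefix(root, "ca")` prints cat and cats, and a prefix such as "x" is rejected after looking at only a couple of nodes.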

Time: 2024-11-05 00:34:45
