BST
The following definition of a BST is taken from Wikipedia:
Binary Search Tree, is a node-based binary tree data structure which has the following properties:
- The left subtree of a node contains only nodes with keys less than the node’s key.
- The right subtree of a node contains only nodes with keys greater than the node’s key.
- The left and right subtree each must also be a binary search tree.
There must be no duplicate nodes.
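The properties above can be checked mechanically. The following is a minimal sketch, not part of the original notes: the names isBST and isBSTInRange are made up for illustration, and the node layout is assumed to be the plain key/left/right struct used throughout this post. Each subtree is checked against an open interval of allowed keys, which also rejects duplicates.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed node layout: the same key/left/right struct used in this post. */
struct node {
    int key;
    struct node *left;
    struct node *right;
};

/* Hypothetical helper: true when every key in the subtree lies strictly
   between lo and hi. The bounds are long long so that INT_MIN and INT_MAX
   remain valid keys. */
static bool isBSTInRange(const struct node *n, long long lo, long long hi)
{
    assert(lo < hi); /* callers always pass a non-empty open interval */
    if (n == NULL)
        return true;
    if (n->key <= lo || n->key >= hi)
        return false; /* out of range, or a duplicate of an ancestor key */
    return isBSTInRange(n->left, lo, n->key) &&
           isBSTInRange(n->right, n->key, hi);
}

/* Hypothetical entry point: checks all three BST properties at once. */
bool isBST(const struct node *root)
{
    return isBSTInRange(root, (long long)INT_MIN - 1, (long long)INT_MAX + 1);
}
```

Comparing a node only with its direct children would not be enough: a key deep in the left subtree could still be larger than an ancestor's key. Passing the bounds down the recursion avoids that mistake.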
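Because the keys of a binary search tree are ordered, searching, finding the minimum, and finding the maximum can all be done quickly. Without this ordering, we would have to compare the given key against every key in the tree.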
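As a small illustration of how the ordering helps, the sketch below finds the minimum and maximum by following only left or only right links. The function names findMin and findMax are assumptions made for this example; the node struct is the plain key/left/right layout assumed throughout.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed node layout: the same key/left/right struct used in this post. */
struct node {
    int key;
    struct node *left;
    struct node *right;
};

/* Hypothetical helper: the smallest key sits at the end of the left spine.
   Returns NULL for an empty tree. */
const struct node *findMin(const struct node *root)
{
    if (root == NULL)
        return NULL;
    while (root->left != NULL)
        root = root->left;
    return root;
}

/* Hypothetical helper: the largest key sits at the end of the right spine.
   Returns NULL for an empty tree. */
const struct node *findMax(const struct node *root)
{
    if (root == NULL)
        return NULL;
    while (root->right != NULL)
        root = root->right;
    return root;
}
```

Both loops visit at most one node per level, so they never look at more nodes than the height of the tree.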
Searching a key
To search for a given key in a Binary Search Tree, we first compare it with the root's key. If they match, we return the root. If the key is greater than the root's key, we recurse into the right subtree of the root. Otherwise we recurse into the left subtree.
// C function to search a given key in a given BST
struct node* search(struct node* root, int key)
{
    // Base cases: root is NULL or key is present at root
    if (root == NULL || root->key == key)
        return root;

    /*
     * If the root's key is not the given key, the key can only be in one
     * of the subtrees. For example, when root->key < key, the node to
     * return is whatever search(root->right, key) reports.
     */

    // Key is greater than root's key
    if (root->key < key)
        return search(root->right, key); // recurse into the right subtree

    // Key is smaller than root's key
    return search(root->left, key);
}
Insertion of a key
A new key is always inserted as a leaf. We search for the key starting from the root until we run into an empty (NULL) child link. The new node is then attached at that position, as a child of the last node visited.
         100                               100
        /   \        Insert 40            /   \
      20     500    --------->          20     500
     /  \                              /  \
    10   30                           10   30
                                             \
                                              40
/* A utility function to insert a new node with given key in BST */
struct node* insert(struct node* node, int key)
{
    /* Base case: if the tree is empty, return a new node */
    if (node == NULL)
        return newNode(key);

    /* Otherwise, recur down the tree */
    if (key < node->key)
        node->left = insert(node->left, key);
    else if (key > node->key)
        node->right = insert(node->right, key);

    /* return the (unchanged) node pointer */
    return node;
}
Time Complexity: The worst-case time complexity of the search and insert operations is O(h), where h is the height of the Binary Search Tree. In the worst case, we may have to travel from the root to the deepest leaf node. The height of a skewed tree may become n, so search and insert may take O(n) time.
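To see the skewed case concretely, the sketch below builds a tree with a plain BST insert and measures its height. It is an illustration only: insertKey, treeHeight, buildTree and freeTree are names invented for this example, and height is counted in nodes (an empty tree has height 0).

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed node layout: the same key/left/right struct used in this post. */
struct node {
    int key;
    struct node *left;
    struct node *right;
};

/* Hypothetical plain (unbalanced) BST insert; duplicate keys are ignored. */
struct node *insertKey(struct node *node, int key)
{
    if (node == NULL) {
        struct node *fresh = malloc(sizeof *fresh);
        if (fresh == NULL)
            abort(); /* sketch-level handling of allocation failure */
        fresh->key = key;
        fresh->left = fresh->right = NULL;
        return fresh;
    }
    if (key < node->key)
        node->left = insertKey(node->left, key);
    else if (key > node->key)
        node->right = insertKey(node->right, key);
    return node;
}

/* Height counted in nodes: empty tree = 0, single node = 1. */
int treeHeight(const struct node *node)
{
    if (node == NULL)
        return 0;
    int l = treeHeight(node->left);
    int r = treeHeight(node->right);
    return 1 + (l > r ? l : r);
}

/* Inserts keys[0..count-1] in order into an initially empty tree. */
struct node *buildTree(const int *keys, int count)
{
    assert(keys != NULL || count == 0);
    struct node *root = NULL;
    for (int i = 0; i < count; i++)
        root = insertKey(root, keys[i]);
    return root;
}

/* Releases every node of the tree. */
void freeTree(struct node *node)
{
    if (node == NULL)
        return;
    freeTree(node->left);
    freeTree(node->right);
    free(node);
}
```

Inserting the keys 1 to 7 in sorted order produces a chain of height 7, while the order 4, 2, 6, 1, 3, 5, 7 produces a tree of height 3. The insertion order alone decides which of the two the plain BST ends up with.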
Deletion of a node
When we delete a node from a BST, the following cases can arise:
1) Node to be deleted is leaf: Simply remove from the tree.
          50                             50
        /    \       delete(20)        /    \
      30      70     --------->      30      70
     /  \    /  \                      \    /  \
    20  40  60  80                     40  60  80
2) Node to be deleted has only one child: Copy the child to the node and delete the child
          50                             50
        /    \       delete(30)        /    \
      30      70     --------->      40      70
        \    /  \                           /  \
        40  60  80                         60  80
3) Node to be deleted has two children: Find inorder successor of the node. Copy contents of the inorder successor to the node and delete the inorder successor. Note that inorder predecessor can also be used.
          50                             60
        /    \       delete(50)        /    \
      40      70     --------->      40      70
             /  \                              \
            60  80                             80
The important thing to note is, inorder successor is needed only when right child is not empty. In this particular case, inorder successor can be obtained by finding the minimum value in right child of the node.
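Case 3 notes that the inorder predecessor can be used instead of the successor. The sketch below shows that variant under stated assumptions: deleteWithPredecessor, maxValueNode and makeNode are hypothetical names for this example, and all nodes are assumed to be heap-allocated so that free is valid. The listing that follows in the original notes uses the successor.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed node layout: the same key/left/right struct used in this post. */
struct node {
    int key;
    struct node *left;
    struct node *right;
};

/* Hypothetical allocation helper for heap-allocated nodes. */
struct node *makeNode(int key)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort(); /* sketch-level handling of allocation failure */
    n->key = key;
    n->left = n->right = NULL;
    return n;
}

/* Rightmost node of a non-empty subtree, i.e. the node with its largest key. */
static struct node *maxValueNode(struct node *node)
{
    assert(node != NULL);
    while (node->right != NULL)
        node = node->right;
    return node;
}

/* Deletes key and returns the new subtree root. For a node with two
   children, the inorder predecessor (largest key in the left subtree)
   replaces the deleted key. */
struct node *deleteWithPredecessor(struct node *root, int key)
{
    if (root == NULL)
        return NULL;
    if (key < root->key) {
        root->left = deleteWithPredecessor(root->left, key);
    } else if (key > root->key) {
        root->right = deleteWithPredecessor(root->right, key);
    } else if (root->left == NULL || root->right == NULL) {
        /* Zero or one child: the child (possibly NULL) takes this place. */
        struct node *child = root->left ? root->left : root->right;
        free(root);
        return child;
    } else {
        /* Two children: copy the predecessor's key, then delete it below. */
        struct node *pred = maxValueNode(root->left);
        root->key = pred->key;
        root->left = deleteWithPredecessor(root->left, pred->key);
    }
    return root;
}
```

Applied to the tree in the case 3 diagram, deleting 50 this way leaves 40 at the root instead of 60. Both results are valid binary search trees.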
/* Given a non-empty binary search tree, return the node with minimum
   key value found in that tree. Note that the entire tree does not
   need to be searched. */
struct node* minValueNode(struct node* node)
{
    struct node* current = node;

    /* loop down to find the leftmost node */
    while (current->left != NULL)
        current = current->left;

    return current;
}

/* Given a binary search tree and a key, this function deletes the key
   and returns the new root */
struct node* deleteNode(struct node* root, int key)
{
    // base case 1: tree is NULL
    if (root == NULL)
        return root;

    // If the key to be deleted is smaller than the root's key,
    // then it lies in the left subtree
    if (key < root->key)
        root->left = deleteNode(root->left, key);

    // If the key to be deleted is greater than the root's key,
    // then it lies in the right subtree
    else if (key > root->key)
        root->right = deleteNode(root->right, key);

    // If the key is the same as the root's key, then this is the node
    // to be deleted
    else
    {
        // base case 2: node with only one child or no child
        if (root->left == NULL)
        {
            struct node *temp = root->right;
            free(root);
            return temp;
        }
        else if (root->right == NULL)
        {
            struct node *temp = root->left;
            free(root);
            return temp;
        }

        // node with two children: get the inorder successor (smallest
        // in the right subtree)
        struct node* temp = minValueNode(root->right);

        // Copy the inorder successor's content to this node
        root->key = temp->key;

        // Delete the inorder successor.
        // The successor has either no child or only a right child, so it
        // is removed by one of the branches above.
        // The deletion has to go through this recursive call because we
        // have no pointer to temp's parent and so cannot set the parent's
        // child link to NULL directly.
        root->right = deleteNode(root->right, temp->key);
    }

    // 1. When the deleted node has no child or only one child, it is
    //    freed and its child (possibly NULL) takes its place, so temp is
    //    returned.
    // 2. When the deleted node has two children, only the key is
    //    replaced; the node's position in the tree does not change, so
    //    the node itself is returned.
    // 3. When the current node is not the node to delete, it is returned
    //    unchanged.
    // Cases 2 and 3 are both covered by the return below.
    return root;

    /*
     *         10
     *        /  \
     *       9    15
     *      /    /  \
     *     6    12   18
     *              /
     *            16
     *
     * Suppose the node to delete is 15. After 15 is replaced by 16, the
     * node that originally held 16 is deleted. Once that node is removed,
     * the recursion returns level by level up to the node that used to
     * hold 15, and then keeps returning upward from there.
     *
     * In short: once the node to delete is found, the recursion starts
     * returning level by level.
     */
}
Time Complexity: The worst-case time complexity of the delete operation is O(h), where h is the height of the Binary Search Tree. In the worst case, we may have to travel from the root to the deepest leaf node. The height of a skewed tree may become n, so delete may take O(n) time.
AVL Tree
AVL tree is a self-balancing Binary Search Tree (BST) where the difference between heights of left and right subtrees cannot be more than one for all nodes.
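The balance condition in this definition can be verified directly. The sketch below is an illustration under assumptions: checkedHeight and isAvlBalanced are invented names, and heights are recomputed from scratch rather than read from a stored height field, so the check works on any binary tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed node layout: key/left/right only; no stored height is needed. */
struct node {
    int key;
    struct node *left;
    struct node *right;
};

/* Hypothetical helper: returns the height of the subtree (empty = 0),
   or -1 as soon as some node's subtrees differ in height by more than one. */
static int checkedHeight(const struct node *n)
{
    if (n == NULL)
        return 0;
    int l = checkedHeight(n->left);
    if (l < 0)
        return -1;
    int r = checkedHeight(n->right);
    if (r < 0)
        return -1;
    if (abs(l - r) > 1)
        return -1;
    return 1 + (l > r ? l : r);
}

/* True when every node satisfies the AVL height condition. This does not
   check key ordering; it only checks balance. */
bool isAvlBalanced(const struct node *root)
{
    return checkedHeight(root) >= 0;
}
```

A check like this is handy in tests: it can be run after every insert or delete to confirm that the rebalancing code keeps the invariant.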
0 Why AVL Trees?
Most of the BST operations (e.g., search, max, min, insert, delete.. etc) take O(h) time where h is the height of the BST. The cost of these operations may become O(n) for a skewed Binary tree. If we make sure that height of the tree remains O(Logn) after every insertion and deletion, then we can guarantee an upper bound of O(Logn) for all these operations. The height of an AVL tree is always O(Logn) where n is the number of nodes in the tree.
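One way to see why the height stays logarithmic is to count the fewest nodes an AVL tree of a given height can contain: the sparsest tree of height h has one root, one sparsest subtree of height h-1 and one of height h-2. The sketch below computes that recurrence. The function names are assumptions for this example, height is counted in nodes (a single node has height 1, matching the height field used later in this post), and heights are capped at 90 so the counts fit in a long long.

```c
#include <assert.h>

/* Hypothetical helper: fewest nodes in any AVL tree of height h, using
   N(0) = 0, N(1) = 1, N(h) = N(h-1) + N(h-2) + 1.
   h is limited to 90 so the result fits in a long long. */
long long minNodesForHeight(int h)
{
    assert(h >= 0 && h <= 90);
    if (h == 0)
        return 0;
    long long prev = 0; /* N(h-2), starting at N(0) */
    long long cur = 1;  /* N(h-1), starting at N(1) */
    for (int i = 2; i <= h; i++) {
        long long next = cur + prev + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* Hypothetical helper: the largest height an AVL tree with n nodes can
   have, i.e. the largest h with N(h) <= n (capped at 90). */
int maxAvlHeight(long long n)
{
    int h = 0;
    while (h < 90 && minNodesForHeight(h + 1) <= n)
        h++;
    return h;
}
```

The minimum node count grows like the Fibonacci numbers, that is, exponentially in h, which is why the height is O(Logn). For example, maxAvlHeight(1000000) evaluates to 28, against a height of 20 for a perfectly balanced tree with the same number of nodes.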
1 Insertion
To make sure that the given tree remains AVL after every insertion, we must augment the standard BST insert operation to perform some re-balancing. Following are two basic operations that can be performed to re-balance a BST without violating the BST property (keys(left) < key(root) < keys(right)).
1) Left Rotation
2) Right Rotation
T1, T2 and T3 are subtrees of the tree rooted with y (on left side)
or x (on right side)
                y                               x
               / \     Right Rotation          /  \
              x   T3   - - - - - - - >        T1   y
             / \       < - - - - - - -            / \
            T1  T2     Left Rotation            T2  T3
Keys in both of the above trees follow the following order
      keys(T1) < key(x) < keys(T2) < key(y) < keys(T3)
So BST property is not violated anywhere.
Steps to follow for insertion
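Insertion works essentially as in a plain binary search tree. The difference is that after the insertion we must walk the path from the inserted node back up to the root and restore balance at each node along the way. Which rotation restores balance depends on the pattern of the height difference.
Note that once the first unbalanced node has been found and rebalanced, the whole tree is a balanced BST again. In the code below, because of the recursion, the calls return level by level after the node is inserted, and each return checks whether the current node is balanced; this continues all the way to the root. Since a balance check takes only constant time, this is not optimized here (one way to optimize it would be to carry a flag variable through the recursive function).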
Let the newly inserted node be w
1) Perform standard BST insert for w.
2) Starting from w, travel up and find the first unbalanced node. Let z be the first unbalanced node, y be the child of z that comes on the path from w to z and x be the grandchild of z that comes on the path from w to z.
3) Re-balance the tree by performing appropriate rotations on the subtree rooted with z. There can be 4 possible cases that need to be handled, as x, y and z can be arranged in 4 ways. Following are the possible 4 arrangements:
a) y is left child of z and x is left child of y (Left Left Case)
b) y is left child of z and x is right child of y (Left Right Case)
c) y is right child of z and x is right child of y (Right Right Case)
d) y is right child of z and x is left child of y (Right Left Case)
Following are the operations to be performed in above mentioned 4 cases. In all of the cases, we only need to re-balance the subtree rooted with z and the complete tree becomes balanced as the height of subtree (After appropriate rotations) rooted with z becomes same as it was before insertion.
a) Left Left Case
T1, T2, T3 and T4 are subtrees.
         z                                      y
        / \                                   /   \
       y   T4      Right Rotate (z)          x      z
      / \          - - - - - - - - ->      /  \    /  \
     x   T3                               T1  T2  T3  T4
    / \
  T1   T2
b) Left Right Case
     z                               z                           x
    / \                            /   \                        /  \
   y   T4  Left Rotate (y)        x    T4  Right Rotate(z)    y      z
  / \      - - - - - - - - ->    /  \      - - - - - - - ->  / \    / \
T1   x                          y    T3                    T1  T2 T3  T4
    / \                        / \
  T2   T3                    T1   T2
c) Right Right Case
  z                                y
 /  \                            /   \
T1   y     Left Rotate(z)       z      x
    /  \   - - - - - - - ->    / \    / \
   T2   x                     T1  T2 T3  T4
       / \
     T3  T4
d) Right Left Case
   z                            z                            x
  / \                          / \                          /  \
T1   y   Right Rotate (y)    T1   x      Left Rotate(z)   z      y
    / \  - - - - - - - - ->     /  \   - - - - - - - ->  / \    / \
   x   T4                      T2   y                  T1  T2  T3  T4
  / \                              /  \
T2   T3                           T3   T4
C implementation
Following is the C implementation for AVL Tree Insertion. The following C implementation uses the recursive BST insert to insert a new node. In the recursive BST insert, after insertion, we get pointers to all ancestors one by one in bottom up manner. So we don’t need parent pointer to travel up. The recursive code itself travels up and visits all the ancestors of the newly inserted node.
1) Perform the normal BST insertion.
2) The current node must be one of the ancestors of the newly inserted node. Update the height of the current node.
3) Get the balance factor (left subtree height – right subtree height) of the current node.
4) If balance factor is greater than 1, then the current node is unbalanced and we are either in Left Left case or left Right case. To check whether it is left left case or not, compare the newly inserted key with the key in left subtree root.
5) If balance factor is less than -1, then the current node is unbalanced and we are either in Right Right case or Right Left case. To check whether it is Right Right case or not, compare the newly inserted key with the key in right subtree root.
#include <stdio.h>
#include <stdlib.h>

// An AVL tree node
struct node
{
    int key;
    struct node *left;
    struct node *right;
    int height;
};

// A utility function to get maximum of two integers
int max(int a, int b)
{
    return (a > b) ? a : b;
}

// A utility function to get height of the tree
int height(struct node *N)
{
    if (N == NULL)
    {
        return 0;
    }
    return N->height;
}

/* Helper function that allocates a new node with the given key and
   NULL left and right pointers. */
struct node *newNode(int key)
{
    struct node *node = (struct node *) malloc(sizeof(struct node));
    node->key = key;
    node->left = NULL;
    node->right = NULL;
    node->height = 1; // new node is initially added at leaf
    return node;
}

/*
 * T1, T2 and T3 are subtrees of the tree rooted with y (on left side)
 * or x (on right side)
 *                y                               x
 *               / \     Right Rotation          /  \
 *              x   T3   - - - - - - - >        T1   y
 *             / \       < - - - - - - -            / \
 *            T1  T2     Left Rotation            T2  T3
 * Keys in both of the above trees follow the following order
 *      keys(T1) < key(x) < keys(T2) < key(y) < keys(T3)
 * So BST property is not violated anywhere.
 */

// A utility function to right rotate subtree rooted with y
// See the diagram given above
struct node *rightRotate(struct node *y)
{
    struct node *x = y->left;
    struct node *T2 = x->right;

    // Perform rotation
    x->right = y;
    y->left = T2;

    // Update heights (y first, because it is now a child of x)
    y->height = max(height(y->left), height(y->right)) + 1;
    x->height = max(height(x->left), height(x->right)) + 1;

    // Return new root
    return x;
}

// A utility function to left rotate subtree rooted with x
// See the diagram given above
struct node *leftRotate(struct node *x)
{
    struct node *y = x->right;
    struct node *T2 = y->left;

    // Perform rotation
    y->left = x;
    x->right = T2;

    // Update heights (x first, because it is now a child of y)
    x->height = max(height(x->left), height(x->right)) + 1;
    y->height = max(height(y->left), height(y->right)) + 1;

    // Return new root
    return y;
}

// Get Balance factor of node N
int getBalance(struct node *N)
{
    if (N == NULL)
    {
        return 0;
    }
    return height(N->left) - height(N->right);
}

struct node *insert(struct node *node, int key)
{
    /* 1. Perform the normal BST insertion */

    // Base case: the tree is empty or the insertion position is found
    if (node == NULL)
    {
        return newNode(key);
    }

    if (key < node->key)
    {
        node->left = insert(node->left, key);
    }
    else if (key > node->key)
    {
        node->right = insert(node->right, key);
    }
    else
    {
        // Duplicate keys are not allowed in a BST: nothing to insert,
        // and nothing below this node has changed
        return node;
    }
    // Before the recursion returns, exactly one of the branches above
    // has been taken

    /* 2. Update height of this ancestor node */
    node->height = max(height(node->left), height(node->right)) + 1;

    /* 3. Get the balance factor of this ancestor node to check whether
          this node became unbalanced */
    int balance = getBalance(node);

    // If this node becomes unbalanced, then there are 4 cases.
    // T1, T2, T3 and T4 in the case diagrams above are subtrees.

    // Left Left Case: Right Rotate (z)
    if (balance > 1 && key < node->left->key)
    {
        return rightRotate(node);
    }

    // Right Right Case: Left Rotate (z)
    if (balance < -1 && key > node->right->key)
    {
        return leftRotate(node);
    }

    // Left Right Case: Left Rotate (y), then Right Rotate (z)
    if (balance > 1 && key > node->left->key)
    {
        node->left = leftRotate(node->left);
        return rightRotate(node);
    }

    // Right Left Case: Right Rotate (y), then Left Rotate (z)
    if (balance < -1 && key < node->right->key)
    {
        node->right = rightRotate(node->right);
        return leftRotate(node);
    }

    /* return the (unchanged) node pointer */
    return node;
}

// A utility function to print preorder traversal of the tree.
void preOrder(struct node *root)
{
    if (root != NULL)
    {
        printf("%d ", root->key);
        preOrder(root->left);
        preOrder(root->right);
    }
}

/* Driver program to test the above functions */
int main()
{
    struct node *root = NULL;

    /* Construct the tree shown in the comment below */
    root = insert(root, 10);
    root = insert(root, 20);
    root = insert(root, 30);
    root = insert(root, 40);
    root = insert(root, 50);
    root = insert(root, 25);

    /* The constructed AVL Tree would be
              30
             /  \
           20    40
          /  \     \
         10  25    50
    */

    printf("Pre order traversal of the constructed AVL tree is \n");
    preOrder(root);

    return 0;
}

/**
 * Output:
 * Pre order traversal of the constructed AVL tree is
 * 30 20 10 25 40 50
 */
Time Complexity: The rotation operations (left and right rotate) take constant time as only few pointers are being changed there. Updating the height and getting the balance factor also take constant time. So the time complexity of AVL insert remains same as BST insert which is O(h) where h is height of the tree. Since AVL tree is balanced, the height is O(Logn). So time complexity of AVL insert is O(Logn).
The AVL tree and other self-balancing search trees like Red Black trees are useful to get all basic operations done in O(Logn) time. AVL trees are more strictly balanced than Red Black trees, but they may cause more rotations during insertion and deletion. So if your application involves frequent insertions and deletions, Red Black trees should be preferred. If insertions and deletions are less frequent and search is the more frequent operation, an AVL tree should be preferred over a Red Black tree. (Because an AVL tree is more strictly balanced, its average search performance is better than that of a Red Black tree.)
2 Deletion
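After a deletion, the balance of the tree must be restored starting from the parent of the deleted node and moving upward, all the way to the root.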
struct node* deleteNode(struct node* root, int key)
{
    // STEP 1: PERFORM STANDARD BST DELETE

    // Base case 1: tree is NULL
    if (root == NULL)
        return root;

    // If the key to be deleted is smaller than the root's key,
    // then it lies in the left subtree
    if (key < root->key)
        root->left = deleteNode(root->left, key);

    // If the key to be deleted is greater than the root's key,
    // then it lies in the right subtree
    else if (key > root->key)
        root->right = deleteNode(root->right, key);

    // If the key is the same as the root's key, then this is the node
    // to be deleted
    else
    {
        // Base case 2: the node to delete is found, and it is a leaf
        // or has only one child
        if ((root->left == NULL) || (root->right == NULL))
        {
            struct node *temp = root->left ? root->left : root->right;

            // No child case
            if (temp == NULL)
            {
                temp = root;
                root = NULL;
            }
            else // One child case
                *root = *temp; // Copy the contents of the non-empty child

            free(temp);
        }
        else
        {
            // node with two children: get the inorder successor (smallest
            // in the right subtree)
            struct node* temp = minValueNode(root->right);

            // Copy the inorder successor's data to this node
            root->key = temp->key;

            // Delete the inorder successor
            root->right = deleteNode(root->right, temp->key);
        }
    }

    // If the tree had only one node then return
    if (root == NULL)
        return root;

    // STEP 2: UPDATE HEIGHT OF THE CURRENT NODE
    root->height = max(height(root->left), height(root->right)) + 1;

    // STEP 3: GET THE BALANCE FACTOR OF THIS NODE (to check whether
    // this node became unbalanced)
    int balance = getBalance(root);

    // If this node becomes unbalanced, then there are 4 cases

    // Left Left Case
    if (balance > 1 && getBalance(root->left) >= 0)
        return rightRotate(root);

    // Left Right Case
    if (balance > 1 && getBalance(root->left) < 0)
    {
        root->left = leftRotate(root->left);
        return rightRotate(root);
    }

    // Right Right Case
    if (balance < -1 && getBalance(root->right) <= 0)
        return leftRotate(root);

    // Right Left Case
    if (balance < -1 && getBalance(root->right) > 0)
    {
        root->right = rightRotate(root->right);
        return leftRotate(root);
    }

    return root;
}
Time Complexity: The rotation operations (left and right rotate) take constant time as only few pointers are being changed there. Updating the height and getting the balance factor also take constant time. So the time complexity of AVL delete remains same as BST delete which is O(h) where h is height of the tree. Since AVL tree is balanced, the height is O(Logn). So time complexity of AVL delete is O(Logn).
Summary
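1. In BST deletion, if the node to delete is a leaf or has only one child, we have found the node that is actually removed: the recursion goes no deeper and returns directly. If the node to delete (call it node1) has both children, we find the next node in key order (call it node2), replace node1's key with node2's key, and then delete node2. node2 is the node that is actually removed; once it is found, the recursion goes no deeper and returns directly. Since node2 is either a leaf or has only a right child, it falls under the first situation. So the real base case here is "the node to delete has been found, and it is a leaf or has only one child." (Failing to find the node to delete is, of course, also a base case.)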
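2. In a plain BST insert, as the recursion returns level by level, a call returns the new node if the current position is where the node was inserted (this is in fact the base case), and otherwise returns the original node.
In delete, as the recursion returns level by level, a call returns the node that takes the current node's place if the current node was actually freed (the base case), and otherwise returns the original node.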
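3. In an AVL tree, after an insert, the recursion returns level by level (returning the inserted node is the base case, and it is the deepest point the recursion reaches). Every call above the base case updates the height of the current node and checks whether it is balanced. If it is not, the node is rebalanced and the root of the rebalanced subtree is returned; otherwise the original node is returned.
In delete, the base case is finding the node that is actually removed (or not finding the key at all). After that, as the recursion returns, every call updates the height of the current node and checks whether it is balanced. If it is not, the node is rebalanced and the root of the rebalanced subtree is returned; otherwise the original node is returned.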
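4. AVL trees give a deeper understanding of base cases. What is a real base case? It is any condition that stops the recursion from going deeper. It can be an explicit return, or it can be implicit: for example, if there are only three branches, two of which recurse further and one of which does not, then that third branch is effectively a base case. A return statement is just one special case.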
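A small example of an implicit base case, in the spirit of the preOrder function shown earlier. The name collectInorder is invented for this sketch, and the caller is assumed to provide an output array large enough for all keys. The function contains no return statement at all: the recursion stops simply because the body of the if is skipped when the node is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed node layout: the same key/left/right struct used in this post. */
struct node {
    int key;
    struct node *left;
    struct node *right;
};

/* Hypothetical helper: appends the keys of the tree to out in sorted
   (inorder) order and advances *count. The caller must make sure out has
   room for every key. There is no explicit base-case return: when n is
   NULL the if-body is skipped, and that skipped branch is the base case. */
void collectInorder(const struct node *n, int *out, size_t *count)
{
    if (n != NULL) {
        collectInorder(n->left, out, count);
        out[(*count)++] = n->key;
        collectInorder(n->right, out, count);
    }
}
```

Rewriting the function with an explicit "if (n == NULL) return;" at the top would behave identically, which is the point: the base case is the condition that stops the descent, not the keyword used to express it.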
5. At every level, the recursion returns the modified node if the node was changed, and the original node otherwise.