POJ 2945 Find the Clones (Trie)

Find the Clones

Time Limit: 5000MS   Memory Limit: 65536K
Total Submissions: 7140   Accepted: 2655

Description

Doubleville, a small town in Texas, was attacked by aliens. They abducted some of the residents and took them to a spaceship orbiting the earth. After some (quite unpleasant) human experiments, the aliens cloned the victims and released multiple copies of them back in Doubleville. So now it might happen that there are 6 identical persons named Hugh F. Bumblebee: the original person and his 5 copies. The Federal Bureau of Unauthorized Cloning (FBUC) has charged you with the task of determining how many copies were made of each person. To help you in your task, FBUC has collected a DNA sample from each person. All copies of the same person have the same DNA sequence, and different people have different sequences (we know that there are no identical twins in the town, so this is not an issue).

Input

The input contains several blocks of test cases. Each case begins with a line containing two integers: the number 1 ≤ n ≤ 20000 of people, and the length 1 ≤ m ≤ 20 of the DNA sequences. The next n lines contain the DNA sequences: each line contains a sequence of m characters, where each character is either 'A', 'C', 'G' or 'T'.

The input is terminated by a block with n = m = 0 .

Output

For each test case, you have to output n lines, each line containing a single integer. The first line contains the number of different people that were not copied. The second line contains the number of people that were copied only once (i.e., there are two identical copies for each such person). The third line contains the number of people that are present in three identical copies, and so on: the i-th line contains the number of persons that are present in i identical copies. For example, if there are 11 samples, one of them is from John Smith, and all the others are from copies of Joe Foobar, then you have to print '1' in the first and the tenth lines, and '0' in all the other lines.

Sample Input

9 6
AAAAAA
ACACAC
GTTTTG
ACACAC
GTTTTG
ACACAC
ACACAC
TCCCCC
TCCCCC
0 0

Sample Output

1
2
0
1
0
0
0
0
0

Hint

Huge input file, 'scanf' recommended to avoid TLE.

Source

Central Europe 2005

Problem link: http://poj.org/problem?id=2945

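Problem summary: there are n gene fragments, each of length m. Output n lines, where line i (1 <= i <= n) is the number of distinct fragments that occur exactly i times.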

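Analysis: sorting would also work; here the solution uses a static trie (nodes come from a preallocated array). See the comments in the code for details.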
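The sorting alternative mentioned above can be sketched as follows. This is only an illustration, not the solution used in this post: the helper name cloneHistogramBySort is made up here, and it takes the sequences as a std::vector<std::string> instead of reading them with scanf. After sorting, identical sequences sit next to each other, so the length of each run is the number of copies of that person.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original solution).
// Returns hist, where hist[i] is the number of distinct sequences that
// occur exactly i times; index 0 is unused.
std::vector<int> cloneHistogramBySort(std::vector<std::string> dna)
{
    std::vector<int> hist(dna.size() + 1, 0);
    std::sort(dna.begin(), dna.end()); // equal sequences become adjacent
    for (std::size_t i = 0; i < dna.size(); )
    {
        std::size_t j = i;
        while (j < dna.size() && dna[j] == dna[i])
            j++;            // extend the run of identical sequences
        hist[j - i]++;      // run length = number of copies
        i = j;              // jump to the start of the next run
    }
    return hist;
}
```

Printing hist[1] through hist[n], one per line, gives the required output. The sort costs O(n log n) string comparisons of length at most m, while the trie below does O(n * m) character steps in total.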

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
int const MAX = 20005; // upper bound on n
int const LEN = 25;    // upper bound on m, with room for the terminating '\0'

int change(char ch)   // map A, C, G, T to 0, 1, 2, 3
{
    if(ch == 'A')
        return 0;
    if(ch == 'C')
        return 1;
    if(ch == 'G')
        return 2;
    return 3;
}

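struct node
{
    node* child[4]; // child pointers; the trie is 4-ary
    int cnt;  // number of times the sequence ending at this node occurs
    bool end; // true if this node is the last character of some sequence (i.e. a leaf)
}Tree[MAX * LEN];

int cnt = 0;  // number of nodes in use, not counting the root
int ans[MAX]; // ans[i] = j means j distinct sequences occur exactly i times
char s[MAX][LEN];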

inline void Init(node *p) // reset the root or a newly allocated node
{
    memset(p -> child, 0, sizeof(p -> child)); // all child pointers start as NULL
    p -> end = false;
    p -> cnt = 0;
}

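void Insert(node *p, char *s)
{
    for(int i = 0; s[i] != '\0'; i++)
    {
        int idx = change(s[i]); // convert the letter to a child index
        if(p -> child[idx] == NULL) // no child: this prefix has not been seen yet, so add a node
        {
            cnt++;   // one more node in use
            p -> child[idx] = Tree + cnt; // take the next free node from the array
            Init(p -> child[idx]);  // reset the new node
        }
        p = p -> child[idx]; // move down to the child
    }
    if(p -> end) // this sequence has been seen before
    {
        p -> cnt++;
        return;
    }
    p -> end = true; // mark the end of a complete sequence
    p -> cnt = 1;  // first occurrence of this sequence
}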

int main()
{
    int n, m;
    // stop on end of input, malformed input, or the terminating "0 0" block
    while(scanf("%d %d", &n, &m) == 2 && (n + m))
    {
        node *root = Tree;
        Init(root);
        cnt = 0; // reuse the node array for each test case
        memset(ans, 0, sizeof(ans));
        for(int i = 0; i < n; i++)
        {
            scanf("%s", s[i]);
            Insert(root, s[i]);
        }
        for(int i = 1; i <= cnt; i++)
            if(Tree[i].end) // a sequence ends at this node
                ans[Tree[i].cnt]++; // one more distinct sequence that occurs cnt times
        for(int i = 1; i <= n; i++)
            printf("%d\n", ans[i]);
    }
    return 0;
}
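To check the trie output on small inputs, it helps to have an independent count to compare against. The sketch below is an assumption-laden illustration rather than part of the solution above: the helper name cloneHistogramByMap is invented here, it uses std::map and C++11 range-based for loops, and it makes no claim about fitting the judge's time limit.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical cross-check helper (not part of the original solution).
// Returns hist, where hist[i] is the number of distinct sequences that
// occur exactly i times; index 0 is unused.
std::vector<int> cloneHistogramByMap(const std::vector<std::string> &dna)
{
    std::map<std::string, int> copies; // sequence -> number of occurrences
    for (const std::string &seq : dna)
        copies[seq]++;

    std::vector<int> hist(dna.size() + 1, 0);
    for (const auto &entry : copies)
        hist[entry.second]++; // one more distinct sequence with this count
    return hist;
}
```

On the sample input the entries hist[1], hist[2], hist[4] are 1, 2, 1 and all others are 0, which matches the sample output.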
Time: 2024-09-29 10:46:46
