Repeated DNA Sequences 【待解决】

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

思路:1.用map来存储字符序列。2.检查序列是否已经存在在map中。如果存在且count=1,就将序列添加到结果中

注意:map<string,int>会造成memory limits exceed,     解决方案1:将A,C,G,T替换成数字,但 map<int,int> 会造成int溢出,所以用map<long long,int>     解决方案2:bit manipulation【待做】


class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
       //check validation
       vector<string> res;
       if(s.empty()) return res;
       //check special case
       int n=s.length();
       if(n<10) return res;

       //general case
       string sbit;
       for(int i=0;i<n;i++){
           if(s[i]==‘A‘) sbit+="0";
           else if(s[i]==‘C‘) sbit+="1";
           else if(s[i]==‘G‘) sbit+="2";
           else  sbit+="3";
       }
       unordered_map<long long,int> map;
       string subbit;
       string subs;
       int subi;
       for(int i=0;i<n-9;i++){
           subbit = sbit.substr(i,10);
           subi = stoll(subbit);
           subs = s.substr(i,10);
           if(map.count(subi) && map[subi]==1){
               res.push_back(subs);
           }
           map[subi]++;
       }
       return res;
    }
};


时间: 2024-10-11 20:41:59

Repeated DNA Sequences 【待解决】的相关文章

[LeetCode]Repeated DNA Sequences

题目:Repeated DNA Sequences 给定包含A.C.G.T四个字符的字符串找出其中十个字符的重复子串. 思路: 首先,string中只有ACGT四个字符,因此可以将string看成是1,3,7,20这三个数字的组合串: 并且可以发现{ACGT}%5={1,3,2,0};于是可以用两个位就能表示上面的四个字符: 同时,一个子序列有10个字符,一共需要20bit,即int型数据类型就能表示一个子序列: 这样可以使用计数排序的思想来统计重复子序列: 这个思路时间复杂度只有O(n),但是

Repeated DNA Sequences

package cn.edu.xidian.sselab.hashtable; import java.util.ArrayList;import java.util.HashSet;import java.util.List;import java.util.Set; /** *  * @author zhiyong wang * title: Repeated DNA Sequences * content: *  All DNA is composed of a series of nuc

【LeetCode】187. Repeated DNA Sequences

Repeated DNA Sequences All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA. Write a function to find all

leetcode 204/187/205 Count Primes/Repeated DNA Sequences/Isomorphic Strings

一:leetcode 204 Count Primes 题目: Description: Count the number of prime numbers less than a non-negative number, n 分析:此题的算法源码可以参看这里,http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes 代码: class Solution { public: int countPrimes(int n) { // 求小于一个数n的素数个

【LeetCode】Repeated DNA Sequences 解题报告

[题目] All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA. Write a function to find all the 10-letter-lon

[LeetCode] 187. Repeated DNA Sequences 解题思路

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA. Write a function to find all the 10-letter-long seq

[leedcode 187] Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA. Write a function to find all the 10-letter-long seq

187. Repeated DNA Sequences

题目: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA. Write a function to find all the 10-letter-long

LeetCode() Repeated DNA Sequences 看的非常的过瘾!

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA. Write a function to find all the 10-letter-long seq