All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
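For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].

Approach:
1. Use a map to store each 10-letter sequence together with the number of times it has been seen.
2. For each sequence, check whether it is already in the map. If it is and its count is 1, add the sequence to the result, so each repeated sequence is reported exactly once.

Note: map<string,int> causes a Memory Limit Exceeded error.
Solution 1: replace A, C, G, T with the digits 0, 1, 2, 3 and use the resulting 10-digit number as the key. map<int,int> would overflow, because the largest key, 3333333333, exceeds the 32-bit int maximum of 2147483647, so use map<long long,int> instead.
Solution 2: bit manipulation [to do]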
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        // check validation
        vector<string> res;
        if (s.empty()) return res;
        // check special case: no 10-letter window fits
        int n = s.length();
        if (n < 10) return res;
        // general case: encode each nucleotide as a digit 0-3
        string sbit;
        for (int i = 0; i < n; i++) {
            if (s[i] == 'A') sbit += "0";
            else if (s[i] == 'C') sbit += "1";
            else if (s[i] == 'G') sbit += "2";
            else sbit += "3";
        }
        unordered_map<long long, int> counts;
        for (int i = 0; i < n - 9; i++) {
            // long long, because a 10-digit key such as 3333333333 does not fit in a 32-bit int
            long long subi = stoll(sbit.substr(i, 10));
            // report a sequence only the second time it is seen
            if (counts.count(subi) && counts[subi] == 1) {
                res.push_back(s.substr(i, 10));
            }
            counts[subi]++;
        }
        return res;
    }
};
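
Solution 2 (bit manipulation) is left as a to-do above. The following is only a minimal sketch of one common way to do it, not the author's implementation. It assumes each nucleotide is packed into 2 bits (A=0, C=1, G=2, anything else treated as T=3), so a 10-letter window fits in 20 bits and can be updated in constant time as the window slides. The names nucleotideCode and findRepeatedDnaSequencesBits are hypothetical and chosen for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not from the original post): 2-bit code for a nucleotide.
// Any character other than A, C, G is treated as T, mirroring the code above.
static int nucleotideCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;
    }
}

// Sketch of the bit-manipulation variant: a rolling 20-bit key per window.
std::vector<std::string> findRepeatedDnaSequencesBits(const std::string& s) {
    std::vector<std::string> res;
    const int len = 10;                              // window length
    const int n = static_cast<int>(s.size());
    if (n < len) return res;

    const unsigned mask = (1u << (2 * len)) - 1;     // keeps the low 20 bits
    std::unordered_map<unsigned, int> counts;
    unsigned key = 0;

    for (int i = 0; i < n; ++i) {
        // Shift in the new nucleotide and drop the one that left the window.
        key = ((key << 2) | nucleotideCode(s[i])) & mask;
        if (i < len - 1) continue;                   // window not yet full
        // Report a sequence only when it is seen for the second time.
        if (++counts[key] == 2) {
            res.push_back(s.substr(i - len + 1, len));
        }
    }
    return res;
}
```

Because the key never exceeds 20 bits, a plain unsigned int is enough here, and no intermediate digit string or stoll call is needed.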
Time: 2024-10-11 20:41:59