Problem link:
https://oj.leetcode.com/problems/repeated-dna-sequences/
Problem statement:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
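Approach:
The general direction is to walk over every substring of length 10, use a hash table to record how many times each distinct substring occurs, and finally output the substrings that occur more than once.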
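As a baseline for the direction just described, here is a minimal sketch that keys the hash table directly on the 10-character substrings. The function name `findRepeatedNaive` is hypothetical and chosen only for illustration; it is not part of the solution given later in this post.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Baseline sketch: count every 10-character window, keyed by the substring itself.
// findRepeatedNaive is an illustrative name, not part of the original solution.
std::vector<std::string> findRepeatedNaive(const std::string& s) {
    const std::size_t kLen = 10;
    std::vector<std::string> res;
    if (s.size() <= kLen) return res;  // at most one window, so nothing can repeat

    std::unordered_map<std::string, int> count;
    for (std::size_t i = 0; i + kLen <= s.size(); ++i) {
        std::string window = s.substr(i, kLen);
        // Report a window the second time it is seen, so each one is listed once.
        if (++count[window] == 2) {
            res.push_back(window);
        }
    }
    return res;
}
```

Every window here costs a substring copy plus a string hash, which is the overhead the integer encoding is meant to avoid.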
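The key question is how to make this faster.
My approach: map A, C, G, T to 1, 2, 3, 4 and convert each substring into a large integer. To keep the arithmetic simple, the leftmost character of the string becomes the lowest decimal digit. That description is a bit vague, so here are two examples: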
AACG = 3211
CGTA = 1432
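The encoding can be sketched as a small helper. The names `charToDigit` and `encode` are hypothetical, and returning 0 for a character outside A, C, G, T is an assumption made only so the function always returns a value.

```cpp
#include <string>

// Map a nucleotide to its decimal digit: A=1, C=2, G=3, T=4.
// Returning 0 for any other character is an assumption of this sketch.
long long charToDigit(char c) {
    switch (c) {
        case 'A': return 1;
        case 'C': return 2;
        case 'G': return 3;
        case 'T': return 4;
        default:  return 0;
    }
}

// Encode a string so that its leftmost character is the lowest decimal digit.
long long encode(const std::string& s) {
    long long value = 0;
    long long power = 1;  // place value of the current character
    for (char c : s) {
        value += charToDigit(c) * power;
        power *= 10;
    }
    return value;
}
```

With this helper, `encode("CGTA")` evaluates to 1432. A 10-character window yields a 10-digit number, which fits comfortably in a `long long`.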
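Then compute the integer value of the first substring and add it to the map. After that, the value of each following substring is (previous value / 10) + (digit of the next character × one billion): the integer division drops the leftmost character, and 10^9 is the place value of the tenth digit.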
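The sliding step can be checked in isolation with a short sketch. The names `digitOf`, `encodeWindow`, and `rollWindow` are hypothetical; `encodeWindow` recomputes a window from scratch only so the rolled value has something to be compared against.

```cpp
#include <cstddef>
#include <string>

// A=1, C=2, G=3, T=4; 0 for anything else (an assumption of this sketch).
long long digitOf(char c) {
    switch (c) {
        case 'A': return 1;
        case 'C': return 2;
        case 'G': return 3;
        case 'T': return 4;
        default:  return 0;
    }
}

// Encode the 10-character window starting at `start` from scratch,
// with the leftmost character as the lowest decimal digit.
long long encodeWindow(const std::string& s, std::size_t start) {
    long long value = 0;
    long long power = 1;
    for (std::size_t i = 0; i < 10; ++i) {
        value += digitOf(s[start + i]) * power;
        power *= 10;
    }
    return value;
}

// Slide the window one character to the right in constant time:
// dividing by 10 drops the leftmost character, and the incoming
// character becomes the highest (10^9) digit.
long long rollWindow(long long current, char incoming) {
    const long long kTop = 1000000000LL;
    return current / 10 + digitOf(incoming) * kTop;
}
```

For `s = "AAAAACCCCCA"`, `rollWindow(encodeWindow(s, 0), s[10])` gives the same value as `encodeWindow(s, 1)`.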
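This way the hash is computed over an integer instead of a string, and moving to the next substring takes a small, constant number of arithmetic operations, which avoids the cost of manipulating strings.
Full code: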
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<long long, int> dict;  // encoded window -> occurrence count
        vector<string> res;
        const long long flag = 1000000000;   // 10^9, place value of the tenth digit
        if (s.size() <= 10) return res;      // at most one window, nothing can repeat
        long long num = generateFirstNum(s);
        dict[num] = 1;
        for (size_t i = 10; i < s.size(); i++) {
            num /= 10;                       // drop the leftmost character
            num += getCharNum(s[i]) * flag;  // new character becomes the highest digit
            dict[num] += 1;                  // operator[] starts a missing key at 0
        }
        for (auto it = dict.begin(); it != dict.end(); it++) {
            if (it->second > 1) {
                generateRes(res, it->first);
            }
        }
        return res;
    }

    // Encode the first 10 characters; the leftmost character is the lowest digit.
    long long generateFirstNum(const string &s) {
        long long res = 0;
        long long power = 1;
        for (int i = 0; i < 10; i++) {
            res += getCharNum(s[i]) * power;
            power *= 10;
        }
        return res;
    }

    long long getCharNum(char s) {
        switch (s) {
            case 'A': return 1;
            case 'C': return 2;
            case 'G': return 3;
            case 'T': return 4;
        }
        return 0;  // not reached for valid DNA input
    }

    char getNumChar(long long s) {
        switch (s) {
            case 1: return 'A';
            case 2: return 'C';
            case 3: return 'G';
            case 4: return 'T';
        }
        return '?';  // not reached for valid encoded values
    }

    // Decode an integer back into its 10-letter sequence, lowest digit first.
    void generateRes(vector<string> &res, long long target) {
        string s;
        while (target > 0) {
            s += getNumChar(target % 10);
            target /= 10;
        }
        res.push_back(s);
    }
};
Time: 2024-10-03 23:00:36