[Swift] LeetCode 187. Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

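Time Limit Exceeded (each `subString` call walks the string from `startIndex`, so the loop is quadratic in the length of `s`)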

class Solution {
    func findRepeatedDnaSequences(_ s: String) -> [String] {
        // A repeat needs at least one full 10-letter window.
        let n = s.count
        if n < 10 { return [] }
        var res = Set<String>()  // sequences seen more than once
        var st = Set<String>()   // sequences seen so far
        for i in 0..<(n - 9)
        {
            let t = s.subString(i, 10)
            if st.contains(t)
            {
                res.insert(t)
            }
            else
            {
                st.insert(t)
            }
        }
        // Convert the Set to [String]
        return Array(res)
    }
}

extension String {
    // Extract a substring given a start index and a character count
    // - begin: index at which to start
    // - count: number of characters to take
    func subString(_ begin: Int, _ count: Int) -> String {
        let start = self.index(self.startIndex, offsetBy: max(0, begin))
        let end = self.index(self.startIndex, offsetBy: min(self.count, begin + count))
        return String(self[start..<end])
    }
}
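
One likely reason the version above times out is that Swift's `String` is not random access: `index(_:offsetBy:)` and `count` each cost time proportional to the string length, so calling `subString` inside the loop makes the whole scan quadratic. The sketch below keeps the same two-set idea but copies the characters into an array once, so each window is cut in constant-length work. The function name `findRepeatedWithArray` is made up for this illustration and does not appear in the original post; whether this variant passes the judge has not been verified here.

```swift
// Hypothetical helper (not from the original post): same two-set approach,
// but indexing into an Array<Character> instead of a String.
func findRepeatedWithArray(_ s: String) -> [String] {
    let chars = Array(s)                 // one O(n) copy; chars[i] is then O(1)
    let n = chars.count
    guard n > 10 else { return [] }      // a repeat needs at least two windows
    var seen = Set<String>()             // windows seen so far
    var repeated = Set<String>()         // windows seen more than once
    for i in 0...(n - 10) {
        let window = String(chars[i..<(i + 10)])
        // insert(_:) reports whether the element was newly added
        if !seen.insert(window).inserted {
            repeated.insert(window)
        }
    }
    return Array(repeated)
}
```

The result comes from a `Set`, so the order of the returned sequences is unspecified; compare as sets when checking against the example.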

Time Limit Exceeded (the rolling code `cur` is computed, but the lookup still goes through `subString`)

class Solution {
    func findRepeatedDnaSequences(_ s: String) -> [String] {
        // A repeat needs at least one full 10-letter window.
        let n = s.count
        if n < 10 { return [] }
        var res = Set<String>()  // sequences seen more than once
        var st = Set<String>()   // sequences seen so far
        // Rolling code: 3 bits per letter (ASCII & 7 maps A, C, G, T to 1, 3, 7, 4).
        var cur: Int = 0
        for i in 0..<9
        {
            cur = cur << 3 | (s[i].ascii & 7)
        }

        for i in 9..<n
        {
            // Keep the low 27 bits (9 letters), then shift in the new letter.
            // Note: cur is maintained but never used as the set key below.
            cur = ((cur & 0x7ffffff) << 3) | (s[i].ascii & 7)
            let t = s.subString(i - 9, 10)
            if st.contains(t)
            {
                res.insert(t)
            }
            else
            {
                st.insert(t)
            }
        }

        // Convert the Set to [String]
        return Array(res)
    }
}

extension String {
    // Integer subscript: returns the character at offset i
    // (walks from startIndex, so each access is O(i))
    subscript (_ i: Int) -> Character {
        get { return self[index(startIndex, offsetBy: i)] }
    }

    // Extract a substring given a start index and a character count
    // - begin: index at which to start
    // - count: number of characters to take
    func subString(_ begin: Int, _ count: Int) -> String {
        let start = self.index(self.startIndex, offsetBy: max(0, begin))
        let end = self.index(self.startIndex, offsetBy: min(self.count, begin + count))
        return String(self[start..<end])
    }
}

// Character extension
extension Character
{
    // Property: integer value of the character's first Unicode scalar
    // (the ASCII code for ASCII characters)
    var ascii: Int {
        get {
            let s = String(self).unicodeScalars
            return Int(s[s.startIndex].value)
        }
    }
}
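
The listing above builds the 30-bit rolling code `cur` but still looks up 10-character strings, and it reads each letter through an O(i) subscript. As a minimal sketch of what the encoding seems intended for, the version below uses the integer code itself as the set key and reads letters from a byte array. It assumes the input contains only the ASCII letters A, C, G and T; the function name `findRepeatedWithRollingCode` is hypothetical and not part of the original post, and its judge result has not been verified here.

```swift
// Hypothetical helper (not from the original post): rolling 3-bit-per-letter
// code used directly as the hash-set key. Assumes s contains only A, C, G, T.
func findRepeatedWithRollingCode(_ s: String) -> [String] {
    let bytes = Array(s.utf8)            // one O(n) copy; bytes[i] is then O(1)
    let n = bytes.count
    guard n > 10 else { return [] }      // a repeat needs at least two windows
    let mask = 0x3FFFFFFF                // low 30 bits = 10 letters x 3 bits
    var code = 0
    var seen = Set<Int>()                // window codes seen so far
    var repeated = Set<String>()         // windows seen more than once
    for i in 0..<n {
        // ASCII & 7 maps A, C, G, T to 1, 3, 7, 4, which are all distinct.
        code = ((code << 3) | Int(bytes[i] & 7)) & mask
        guard i >= 9 else { continue }   // the first full window ends at index 9
        if !seen.insert(code).inserted {
            // Only build the string when a repeat is actually found.
            repeated.insert(String(decoding: bytes[(i - 9)...i], as: UTF8.self))
        }
    }
    return Array(repeated)
}
```

Because each letter gets its own 3-bit slot, two different 10-letter windows cannot share a code, so comparing integers is equivalent to comparing the substrings. A 2-bit encoding would also work and would fit in 20 bits; the 3-bit form is kept here only because it matches the `& 7` trick in the listing above.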

Original article: https://www.cnblogs.com/strengthen/p/10176686.html

Date: 2024-10-10 14:28:43
