Repeated DNA Sequences @leetcode

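Working through LeetCode problems has become the first thing I do when I get to the office in the morning, and discovering the different solutions to each problem is genuinely fun. Take the DNA sequence problem I ran into this morning: at first I had no idea where to start, but after reading a few articles I realized that binary encoding really is the way to go!

Once you know binary, you can go anywhere.

The original problem statement: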

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return: ["AAAAACCCCC", "CCCCCAAAAA"].

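This problem is a classic example of representing a string sequence as a binary number in order to reduce memory consumption.

The problem states that a DNA sequence contains only four nucleotides, written A, C, G, and T, so each of them can be represented by a two-bit binary number: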

A: 00

C: 01

G: 10

T: 11

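A DNA sequence such as ACGT can then be written as 00011011, which is 27 in decimal. For sequences of a fixed length this value is unique, and a 10-letter sequence needs only 20 bits, so it fits in an int. We can therefore use the encoded value as the key and the number of occurrences as the value, and store every key seen so far in a hash table.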
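To make the encoding concrete, here is a minimal sketch of the packing step on its own. The class name DnaEncodingSketch and the methods code and encode are hypothetical helpers introduced only for this illustration; they are not part of the solutions in this post.

```java
// Illustration only: DnaEncodingSketch, code and encode are hypothetical names.
class DnaEncodingSketch {
    // Map each nucleotide to its 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Pack a sequence into an int, 2 bits per letter.
    // A 32-bit int holds at most 16 letters; a 10-letter window uses 20 bits.
    static int encode(String seq) {
        if (seq.length() > 16) {
            throw new IllegalArgumentException("sequence too long for an int");
        }
        int value = 0;
        for (int i = 0; i < seq.length(); i++) {
            value = (value << 2) | code(seq.charAt(i));
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode("ACGT"));                         // 27
        System.out.println(Integer.toBinaryString(encode("ACGT"))); // 11011
        System.out.println(encode("AAAAACCCCC"));                   // 341
    }
}
```

Since there are only 4^10 = 1,048,576 possible 10-letter sequences, every key stored in the hash table is a small non-negative int rather than a 10-character string.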

import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new LinkedList<String>();
        // 2-bit code for each nucleotide.
        HashMap<Character, Integer> tokenValueMap = new HashMap<Character, Integer>();
        tokenValueMap.put('A', 0);
        tokenValueMap.put('C', 1);
        tokenValueMap.put('G', 2);
        tokenValueMap.put('T', 3);
        // Encoded 10-letter window -> times seen so far (capped at 2).
        HashMap<Integer, Integer> sequenceCountMap = new HashMap<Integer, Integer>();
        int length = s.length();
        for (int index = 0; index <= length - 10; index++) {
            // Pack the 10 letters starting at index into 20 bits.
            int value = 0;
            for (int i = 0; i < 10; i++) {
                value <<= 2;
                value += tokenValueMap.get(s.charAt(index + i));
            }
            if (!sequenceCountMap.containsKey(value)) {
                sequenceCountMap.put(value, 1);
            } else if (sequenceCountMap.get(value) == 1) {
                // Second occurrence: report the sequence exactly once.
                sequenceCountMap.put(value, 2);
                result.add(s.substring(index, index + 10));
            }
        }
        return result;
    }
}

The Java code above gets accepted (AC) without any trouble, but now take a look at the following version:

    // Needs java.util.ArrayList, java.util.HashMap, java.util.List and java.util.Map.
    public static List<String> findRepeatedDnaSequences2(String s) {
        List<String> result = new ArrayList<String>();
        // 2-bit code for each nucleotide.
        Map<Character, Integer> tokenValueMap = new HashMap<Character, Integer>();
        tokenValueMap.put('A', 0);
        tokenValueMap.put('C', 1);
        tokenValueMap.put('G', 2);
        tokenValueMap.put('T', 3);
        int length = s.length();
        // Encoded 10-letter window -> times seen so far (capped at 2).
        Map<Integer, Integer> seqMap = new HashMap<Integer, Integer>();
        for (int i=0; i<=length-10; i++) {
            int value = 0;
            for (int j=0; j<10; j++) {
                value <<= 2;
                // Explicitly boxed temporaries, unlike the first version.
                Character c = s.charAt(i+j);
                Integer tokenValue = tokenValueMap.get(c);
                value += tokenValue;
            }

            if (!seqMap.containsKey(value)) {
                seqMap.put(value, 1);
            } else if (seqMap.get(value) == 1) {
                // Second occurrence: report the sequence exactly once.
                result.add(s.substring(i,i+10));
                seqMap.put(value, seqMap.get(value)+1);
            }
        }
        return result;
    }

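This version can fail with Memory Limit Exceeded.

But if you submit it a few more times, you will find that it sometimes gets accepted after all.

As far as I can tell, this depends entirely on whether the JVM happens to collect garbage while the submission is running, because the loop

     for (int j=0; j<10; j++) {
         value <<= 2;
         Character c = s.charAt(i+j);
         Integer tokenValue = tokenValueMap.get(c);
         value += tokenValue;
     }

goes through boxed Character and Integer objects on every iteration. Strictly speaking, Java caches boxed Character values in the ASCII range and boxed Integer values from -128 to 127, so these particular boxes are most likely reused rather than newly allocated. The objects that really pile up are the boxed Integer keys stored in the HashMap, which range up to 2^20 - 1, and both versions create those.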
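If boxing and map overhead are the concern, one way to sidestep them is to keep everything in primitives. The sketch below is not either of the solutions above; it swaps in a rolling 20-bit window and a plain byte array indexed by the encoded value. The class name RollingDnaSketch and its methods are hypothetical, and it assumes the input contains only A, C, G, and T.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: RollingDnaSketch, findRepeated and code are hypothetical names.
class RollingDnaSketch {
    static final int WINDOW = 10;
    // Mask that keeps the low 20 bits, i.e. the last 10 letters.
    static final int MASK = (1 << (2 * WINDOW)) - 1;

    // 2-bit code per nucleotide: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        // One byte per possible window (4^10 entries, about 1 MiB):
        // 0 = never seen, 1 = seen once, 2 = already reported.
        byte[] seen = new byte[1 << (2 * WINDOW)];
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter and drop the letter that left the window.
            value = ((value << 2) | code(s.charAt(i))) & MASK;
            if (i < WINDOW - 1) {
                continue; // the first full window ends at index WINDOW - 1
            }
            if (seen[value] == 0) {
                seen[value] = 1;
            } else if (seen[value] == 1) {
                seen[value] = 2;
                result.add(s.substring(i - WINDOW + 1, i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

The rolling update also removes the inner 10-step loop, so each character is encoded once instead of ten times. The trade-off is a fixed allocation of roughly 1 MiB regardless of input size.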

Time: 2024-10-05 05:06:08
