首先上题目:
A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively. You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?
The DNA sequence is given as a non-empty string S =S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive).
For example, consider string S = CAGCCTA and arrays P, Q such that:
P[0] = 2 Q[0] = 4 P[1] = 5 Q[1] = 5 P[2] = 0 Q[2] = 6
The answers to these M = 3 queries are as follows:
- The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2.
- The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4.
- The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide Awhose impact factor is 1, so the answer is 1.
Assume that the following declarations are given:
struct Results {
int * A;
int M;
};
Write a function:
struct Results solution(char *S, int P[], int Q[], int M);
that, given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.
The sequence should be returned as:
- a Results structure (in C), or
- a vector of integers (in C++), or
- a Results record (in Pascal), or
- an array of integers (in any other programming language).
For example, given the string S = CAGCCTA and arrays P, Q such that:
P[0] = 2 Q[0] = 4 P[1] = 5 Q[1] = 5 P[2] = 0 Q[2] = 6
the function should return the values [2, 4, 1], as explained above.
Assume that:
- N is an integer within the range [1..100,000];
- M is an integer within the range [1..50,000];
- each element of arrays P, Q is an integer within the range [0..N − 1];
- P[K] ≤ Q[K], where 0 ≤ K < M;
- string S consists only of upper-case English letters A, C, G, T.
Complexity:
- expected worst-case time complexity is O(N+M);
- expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.
Copyright 2009–2015 by Codility Limited. All Rights Reserved. Unauthorized copying, publication or disclosure prohibited.
2.题目分析
这个题目看上去蛮简单,就是在一个序列中的任意一段slice中,找出出现过的最小值。
不过要求时间复杂度为O(N),便费了一些周折。
1. 首先是将str转换为int 数组,为后面的处理加速,免去每次都要遍历。
先用strlen函数获得str的长度,再通过temp 指针遍历这个str,将A转为1,C转为2,G转为3,T转为4。
获得了这个数组之后就比较好操作了~
2. 获得任何段的最小值看似很简单,遍历就可以了,不过这样时间复杂度就会变为O(N*M)。
如何减少时间复杂度呢?答案是用空间换时间。
通过prefix的想法,我先统计出了在每一个位置之前包括这个位置,1所出现的次数,2所出现的次数,3所出现的次数,4所出现的次数。
分别存入数组A,B,C,D;
每次查询时,只要判断A[end of slice]-A[Begin of Slice]是否大于零,即可判断是否1出现过。这一步的时间复杂度就从O(N)降为了O(1);
注意,需要检查str[begin of slice]是否为1,因为当这个位置是1时,是不会表现出来的。
其余查询依然,通过增加空间复杂度,降低了时间的复杂度。
找出最小的出现值即可。
3.代码如下:
1 // you can write to stdout for debugging purposes, e.g. 2 // printf("this is a debug message\n"); 3 4 struct Results solution(char *S, int P[], int Q[], int M) { 5 struct Results result; 6 // write your code in C99 7 int len = strlen(S); 8 // printf("len is %d",len); 9 10 11 int arr[len]; 12 int i; 13 char* temp = S; 14 for(i=0;i<len;i++) 15 { 16 if(*temp==‘C‘) 17 { 18 arr[i]=2; 19 } 20 else if(*temp==‘A‘) 21 { 22 arr[i]=1; 23 } 24 else if(*temp==‘G‘) 25 { 26 arr[i]=3; 27 } 28 else if(*temp==‘T‘) 29 { 30 arr[i]=4; 31 } 32 temp++; 33 } 34 35 int A[len]; 36 int B[len]; 37 int C[len]; 38 int D[len]; 39 40 if(arr[0]==1) 41 { 42 A[0]=1; 43 B[0]=0; 44 C[0]=0; 45 D[0]=0; 46 } 47 48 if(arr[0]==2) 49 { 50 A[0]=0; 51 B[0]=1; 52 C[0]=0; 53 D[0]=0; 54 } 55 56 57 if(arr[0]==3) 58 { 59 A[0]=0; 60 B[0]=0; 61 C[0]=1; 62 D[0]=0; 63 } 64 65 if(arr[0]==4) 66 { 67 A[0]=0; 68 B[0]=0; 69 C[0]=0; 70 D[0]=1; 71 } 72 73 // printf("%d %d %d %d \n",A[0],B[0],C[0],D[0]); 74 for(i=1;i<len;i++) 75 { 76 // printf("%d\n",arr[i]); 77 if(arr[i]==1) 78 { 79 A[i]=A[i-1]+1; 80 B[i]=B[i-1]; 81 C[i]=C[i-1]; 82 D[i]=D[i-1]; 83 } 84 85 if(arr[i]==2) 86 { 87 A[i]=A[i-1]; 88 B[i]=B[i-1]+1; 89 C[i]=C[i-1]; 90 D[i]=D[i-1]; 91 } 92 93 94 if(arr[i]==3) 95 { 96 A[i]=A[i-1]; 97 B[i]=B[i-1]; 98 C[i]=C[i-1]+1; 99 D[i]=D[i-1]; 100 } 101 102 if(arr[i]==4) 103 { 104 A[i]=A[i-1]; 105 B[i]=B[i-1]; 106 C[i]=C[i-1]; 107 D[i]=D[i-1]+1; 108 } 109 110 // printf("%d %d %d %d \n",A[i],B[i],C[i],D[i]); 111 } 112 result.A = malloc(sizeof(int)*M); 113 result.M = M; 114 115 for(i=0;i<M;i++) 116 { 117 int tempP = P[i]; 118 int tempQ = Q[i]; 119 if(arr[tempP]==1 || (A[tempQ]-A[tempP]) > 0) 120 { 121 result.A[i]=1; 122 } 123 else if(arr[tempP]==2 || (B[tempQ]-B[tempP]) > 0) 124 { 125 result.A[i]=2; 126 } 127 else if(arr[tempP]==3 || (C[tempQ]-C[tempP]) > 0) 128 { 129 result.A[i]=3; 130 } 131 else 132 { 133 result.A[i]=4; 134 } 135 } 136 137 138 return result; 139 }