[BZOJ4892][TJOI2017]DNA(后缀数组)

题目描述

加里敦大学的生物研究所，发现了决定人喜不喜欢吃藕的基因序列S,有这个序列的碱基序列就会表现出喜欢吃藕的性状，但是研究人员发现对碱基序列S,任意修改其中不超过3个碱基，依然能够表现出吃藕的性状。现在研究人员想知道这个基因在DNA链S0上的位置。所以你需要统计在一个表现出吃藕性状的人的DNA序列S0上，有多少个连续子串可能是该基因，即有多少个S0的连续子串修改小于等于三个字母能够变成S。

输入输出格式

输入格式：

第一行有一个数T,表示有几组数据每组数据第一行一个长度不超过10^5的碱基序列S0

每组数据第二行一个长度不超过10^5的吃藕基因序列S

输出格式：

共T行，第i行表示第i组数据中，在S0中有多少个与S等长的连续子串可能是表现吃藕性状的碱基序列

输入输出样例

输入样例#1：复制
1
ATCGCCCTA
CTTCA
输出样例#1：复制
2
说明

对于20%的数据，S0,S的长度不超过10^4

对于20%的数据，S0,S的长度不超过10^5，0<T<=10

两个串连起来中间插个特殊字符然后后缀数组求四次LCP即可。

启示：永远不要低估SA模板的默写难度。

 1 #include<cstdio>
 2 #include<cstring>
 3 #include<algorithm>
 4 #define rep(i,l,r) for (int i=l; i<=r; i++)
 5 typedef long long ll;
 6 using namespace std;
 7
 8 const int N=200100;
 9 int n,m,T,ans,x[N],y[N],sa[N],log[N],c[N],rk[N],h[N],st[N][20];
10 char s[N],s1[N],S[N];
11
12 int Cmp(int a,int b,int l){ return y[a]==y[b] && y[a+l]==y[b+l]; }
13
14 void build(int m){
15    memset(y,0,sizeof(y));
16    rep(i,0,m) c[i]=0;
17    rep(i,1,n) c[x[i]]++;
18    rep(i,1,m) c[i]+=c[i-1];
19    for (int i=n; i; i--) sa[c[x[i]]--]=i;
20    for (int k=1,p=0; p<n; k<<=1,m=p){
21       p=0;
22       rep(i,n-k+1,n) y[++p]=i;
23       rep(i,1,n) if (sa[i]>k) y[++p]=sa[i]-k;
24       rep(i,0,m) c[i]=0;
25       rep(i,1,n) c[x[y[i]]]++;
26       rep(i,1,m) c[i]+=c[i-1];
27       for (int i=n; i; i--) sa[c[x[y[i]]]--]=y[i];
28       rep(i,1,n) y[i]=x[i]; p=1; x[sa[1]]=1;
29       rep(i,2,n) x[sa[i]]=Cmp(sa[i-1],sa[i],k) ? p : ++p;
30    }
31 }
32
33 void get(){
34    int k=0;
35    rep(i,1,n) rk[sa[i]]=i;
36    rep(i,1,n){
37       for (int j=sa[rk[i]-1]; i+k<=n && j+k<=n && S[i+k]==S[j+k]; k++);
38       h[rk[i]]=k; if (k) k--;
39    }
40 }
41
42 void rmq(){
43    rep(i,1,n) st[i][0]=h[i];
44    rep(i,1,log[n])
45       rep(j,1,n-(1<<i)+1) st[j][i]=min(st[j][i-1],st[j+(1<<(i-1))][i-1]);
46 }
47
48 int ask(int l,int r){
49    if (l>r) swap(l,r);
50    l++; int t=log[r-l+1];
51    return min(st[l][t],st[r-(1<<t)+1][t]);
52 }
53
54 int main(){
55     log[1]=0; rep(i,2,N-100) log[i]=log[i>>1]+1;
56     for (scanf("%d",&T); T--; ){
57         scanf("%s",s+1); scanf("%s",s1+1);
58         int n0=strlen(s+1),n1=strlen(s1+1);
59         rep(i,1,n0) S[i]=s[i]; S[strlen(s+1)+1]=‘$‘;
60         rep(i,1,n1) S[strlen(s+1)+i+1]=s1[i];
61         n=n0+n1+1;
62         rep(i,1,n) x[i]=(int)S[i];
63         build(300); get(); rmq(); ans=0;
64         rep(i,1,n0-n1+1){
65             int a=i,b=n0+2,f=0;
66             rep(j,1,4){
67                 int t=ask(rk[a],rk[b]);
68                 if (b+t>n) { f=1; break; }
69                 if (i+t>n0) break;
70                 a+=t+1; b+=t+1;
71             }
72             if (f) ans++;
73         }
74         printf("%d\n",ans);
75     }
76     return 0;
77 }

原文地址：https://www.cnblogs.com/HocRiser/p/8457695.html

时间： 2024-08-01 12:32:32

[BZOJ4892][TJOI2017]DNA(后缀数组)的相关文章

[TJOI2017]DNA --- 后缀数组

[TJOI2017]DNA 题目描述加里敦大学的生物研究所,发现了决定人喜不喜欢吃藕的基因序列S, 有这个序列的碱基序列就会表现出喜欢吃藕的性状,但是研究人员发现对碱基序列S,任意修改其中不超过3个碱基,依然能够表现出吃藕的性状. 现在研究人员想知道这个基因在DNA链\(S_{0}\)上的位置. 所以你需要统计在一个表现出吃藕性状的人的DNA序列\(S_{0}\)上,有多少个子串可能是该基因, 即有多少个\(S_{0}\)的子串修改小于等于三个字母能够变成S. 输入输出格式输入格式: 第一行

[TJOI2017]DNA——后缀数组求LCP

题目大意: 给定一个文本串和一个模式串,求文本串中有多少个连续的子串和模式串相差不超过三个字符. 思路: 算是一道后缀数组的模板题. 直接做lcp,然后遇到匹配不上的就跳,跳的次数不能超过三次. 具体地,将两个字符串连在一起,中间加一个分隔符,然后求出height,用rmq维护height数组的区间最小值即可. /*======================================= * Author : ylsoi * Time : 2019.2.5 * Problem : luog

bzoj4892 [TJOI2017]DNA

\(\verb|bzoj4892 [TJOI2017]DNA|\) 给定一个匹配串和一个模式串,求模式串有多少个连续子串能够修改不超过 \(3\) 个字符变成匹配串 \(len\leq10^5\) hash 枚举子串左端点,hash 求 lcp 枚举断点,接着跳过断点,记作一次修改,最多修改 \(3\) 次.特判修改 \(3\) 次后剩余部分是否相等. 时间复杂度 \(O(n\log n)\) 代码 #include <bits/stdc++.h> using namespace std; t

你可以在 Python 中轻易创建一个后缀数组

我想告诉你一个关于后缀数组的故事.在一段时间里,我正在西雅图的一家公司面试,当时好奇的是如何最有效地创建一个用于可执行二进制文件的diff.我的研究给我带来了后缀数组和后缀树.后缀数组只是,将字符串的所有后缀排序,储存到有序列表中.后缀树是类似的,但是比列表更像BSTree.这些算法相当简单,一旦你进行了排序操作,它们就具有很快的性能.他们解决的问题是,找到两个字符串之间最长的公共子串(或者在这种情况下是字节列表). 你可以在 Python 中轻易创建一个后缀数组: >>> magic

后缀数组（多个字符串的最长公共子串）—— POJ 3294

对应POJ 题目:点击打开链接 Life Forms Time Limit:6666MS Memory Limit:0KB 64bit IO Format:%lld & %llu Submit Status Description Problem C: Life Forms You may have wondered why most extraterrestrial life forms resemble humans, differing by superficial tra

POJ3294:Life Forms(后缀数组)

Description You may have wondered why most extraterrestrial life forms resemble humans, differing by superficial traits such as height, colour, wrinkles, ears, eyebrows and the like. A few bear no human resemblance; these typically have geometric or

Blue Jeans - poj 3080(后缀数组)

大致题意: 给出n个长度为60的DNA基因(A腺嘌呤 G鸟嘌呤 T胸腺嘧啶 C胞嘧啶)序列,求出他们的最长公共子序列使用后缀数组解决 1 #include<stdio.h> 2 #include<string.h> 3 char str[6200],res[6200]; 4 int num[6200],loc[6200]; 5 int sa[6200],rank[6200],height[6200]; 6 int wa[6200],wb[6200],wv[6200],wd[620

wenbao与后缀数组

倍增算法(da) 1 #define maxn 1000001 2 int wa[maxn],wb[maxn],wv[maxn],wss[maxn]; 3 4 int cmp(int *r, int a, int b, int l) { 5 return r[a] == r[b] && r[a+l] == r[b+l]; 6 } 7 8 void da(int *r, int *sa, int n, int m) { 9 int i, j, p, *x = wa, *y = wb, *t;

SPOJ 705 Distinct Substrings（后缀数组）

[题目链接] http://www.spoj.com/problems/SUBST1/ [题目大意] 给出一个串,求出不相同的子串的个数. [题解] 对原串做一遍后缀数组,按照后缀的名次进行遍历, 每个后缀对答案的贡献为n-sa[i]+1-h[i], 因为排名相邻的后缀一定是公共前缀最长的, 那么就可以有效地通过LCP去除重复计算的子串. [代码] #include <cstdio> #include <cstring> #include <algorithm> usi