Before using bedtools, you need to be familiar with the BED file format.
A full BED record has twelve columns:
1, chrom, the chromosome name (self-explanatory)
2, start, 0-based
3, end, the manual describes it as 1-based, inclusive. It is easier to think of it as 0-based, exclusive, so that length = end - start
4, name, the name of the genome feature
5, score, rarely matters in practice (UCSC defines it as 0 to 1000)
6, strand, + or -
7, thickStart, the starting position at which the feature is drawn thickly
8, thickEnd, the ending position at which the feature is drawn thickly
9, itemRgb, an RGB value
10, blockCount, the number of blocks (exons) in the BED line
11, blockSizes, a comma-separated list of the block sizes
12, blockStarts, a comma-separated list of block starts, relative to start
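To make the coordinate convention and the block columns concrete, here is a minimal Python sketch. The helper name `parse_bed12` and the example record are made up for illustration and are not part of bedtools. It assumes a tab-separated line with exactly twelve columns.

```python
def parse_bed12(line):
    """Parse one tab-separated BED12 line into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    start, end = int(fields[1]), int(fields[2])
    # Trailing commas are common in blockSizes/blockStarts, so skip empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    # blockStarts are offsets from start; convert to absolute half-open intervals.
    blocks = [(start + off, start + off + size) for off, size in zip(offsets, sizes)]
    return {
        "chrom": fields[0],
        "start": start,
        "end": end,
        "name": fields[3],
        "strand": fields[5],
        "length": end - start,  # 0-based start, exclusive end
        "blocks": blocks,
    }


# Invented example record: one feature with two blocks (exons).
example = "\t".join(
    ["chr1", "1000", "5000", "geneA", "0", "+",
     "1000", "5000", "0", "2", "100,200,", "0,3800,"]
)
record = parse_bed12(example)
print(record["length"])  # 4000
print(record["blocks"])  # [(1000, 1100), (4800, 5000)]
```

Note that the last block ends exactly at end (1000 + 3800 + 200 = 5000), which is what you would expect from a well-formed BED12 line.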
Not every column is useful to bedtools.
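BED3 means the first three columns: chrom, start, end
BED4 means the first four columns: chrom, start, end, name
BED5 means the first five columns: chrom, start, end, name, score
BED6 means the first six columns: chrom, start, end, name, score, strand
BED12 means all twelve columns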
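The naming above follows directly from the column count, which can be sketched in a few lines of Python. The function `bed_flavor` is a hypothetical helper, not a bedtools command, and it assumes tab-separated lines.

```python
def bed_flavor(line):
    """Name the BED variant of one tab-separated line by its column count."""
    n_columns = len(line.rstrip("\n").split("\t"))
    names = {3: "BED3", 4: "BED4", 5: "BED5", 6: "BED6", 12: "BED12"}
    # Any other column count is reported as-is instead of being guessed.
    return names.get(n_columns, f"other ({n_columns} columns)")


print(bed_flavor("chr1\t100\t200"))                 # BED3
print(bed_flavor("chr1\t100\t200\tfeat1\t0\t+"))    # BED6
```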
Besides the variants above, bedtools also defines the BEDPE format:
We have defined a new file format (BEDPE) in order to concisely describe disjoint genome features, such as structural variations or paired-end sequence alignments. We chose to define a new format because the existing “blocked” BED format (a.k.a. BED12) does not allow inter-chromosomal feature definitions. In addition, BED12 only has one strand field, which is insufficient for paired-end sequence alignments, especially when studying structural variation.
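As a rough illustration of why two chrom fields and two strand fields help, here is a minimal Python sketch. It assumes the BEDPE column order given in the bedtools documentation (chrom1, start1, end1, chrom2, start2, end2, then optionally name, score, strand1, strand2). The helper `parse_bedpe` and the example record are invented for illustration.

```python
def parse_bedpe(line):
    """Parse one tab-separated BEDPE line into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    record = {
        "chrom1": fields[0], "start1": int(fields[1]), "end1": int(fields[2]),
        "chrom2": fields[3], "start2": int(fields[4]), "end2": int(fields[5]),
    }
    # Columns 7 to 10 are optional; keep whichever ones are present.
    optional = ["name", "score", "strand1", "strand2"]
    for key, value in zip(optional, fields[6:10]):
        record[key] = value
    # Two chrom fields allow the two ends to sit on different chromosomes.
    record["interchromosomal"] = record["chrom1"] != record["chrom2"]
    return record


# Invented example: a read pair whose ends map to chr1 and chr5.
pair = parse_bedpe("chr1\t100\t200\tchr5\t5000\t5100\tpair1\t30\t+\t-")
print(pair["interchromosomal"])          # True
print(pair["strand1"], pair["strand2"])  # + -
```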
by freemao
FAFU.