HTseq-count

HTSeq:一个用于处理高通量数据(High-throughout sequencing)的python包。
HTSeq包有很多功能类,熟悉python脚本的可以自行编写数据处理脚本。
另外,HTSeq也提供了两个脚本文件能够直接处理数据:htseq-qa(检测数据质量)和htseq-count(reads计数)。

用法:htseq-count [options] <alignment_file> <gff_file>

<alignment_file> :

contains the aligned reads in the SAM format.

Make sure to use a splicing-aware aligner such as TopHat.

To read from standard input, use - as <alignment_file>.

{options}

-f <format>--format=<format>

Format of the input data. Possible values are sam (for text SAM files) and bam (for binary BAM files). Default is sam.

-r <order>--order=<order>

For paired-end data, the alignment have to be sorted either by read name or by alignment position. If your data is not sorted, use the samtools sort function of samtools to sort it. Use this option, with name or pos for <order> to indicate how the input data has been sorted. The default is name.

If name is indicated, htseq-count expects all the alignments for the reads of a given read pair to appear in adjacent records in the input data. For pos, this is not expected; rather, read alignments whose mate alignment have not yet been seen are kept in a buffer in memory until the mate is found. While, strictly speaking, the latter will also work with unsorted data, sorting ensures that most alignment mates appear close to each other in the data and hence the buffer is much less likely to overflow.

-s <yes/no/reverse>--stranded=<yes/no/reverse>

whether the data is from a strand-specific assay (default: yes)

For stranded=no, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.

If your RNA-Seq data has not been made with a strand-specific protocol, this causes half of the reads to be lost. Hence, make sure to set the option --stranded=no unless you have strand-specific data!

-a <minaqual>--a=<minaqual>

skip all reads with alignment quality lower than the given minimum value (default: 10 — Note: the default used to be 0 until version 0.5.4.)

-t <feature type>--type=<feature type>

feature type (3rd column in GFF file) to be used, all features of other type are ignored (default, suitable for RNA-Seq analysis using an Ensembl GTF file: exon)

-i <id attribute>--idattr=<id attribute>

GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identity the counts in the output table. The default, suitable for RNA-Seq analysis using an Ensembl GTF file, is gene_id.

-m <mode>--mode=<mode>

Mode to handle reads overlapping more than one feature. Possible values for <mode> are unionintersection-strict and intersection-nonempty(default: union)

-o <samout>--samout=<samout>

write out all SAM alignment records into an output SAM file called <samout>, annotating each line with its assignment to a feature or a special counter (as an optional field with tag ‘XF’)

-q--quiet

suppress progress report and warnings

时间: 2024-12-30 04:54:35

HTseq-count的相关文章

nodejs api 中文文档

文档首页 英文版文档 本作品采用知识共享署名-非商业性使用 3.0 未本地化版本许可协议进行许可. Node.js v0.10.18 手册 & 文档 索引 | 在单一页面中浏览 | JSON格式 目录 关于本文档 稳定度 JSON 输出 概述 全局对象 global process console 类: Buffer require() require.resolve() require.cache require.extensions __filename __dirname module e

c#数组的count()和length的区别

C# 数组中 Length 表示数组项的个数,是个属性. 而 Count() 也是表示项的个数,是个方法,它的值和 Length 一样.但实际上严格地说 Count() 不是数组的内容,而是 IEnumerable 的内容.这也是为什么 C# 2.0 时数组不能用 Count(),而 3.0 后就可以用 Count() 的原因. 对于数组,据说用 Length 快于 Count(). 所以一般情况:数组我用 Length,IEnumerable(比如 List)我用 Count().

[LeetCode]Count Primes

题目:Count Primes 统计1-n的素数的个数. 思路1: 通常的思想就是遍历(0,n)范围内的所有数,对每个数i再遍历(0,sqrt(i)),每个除一遍来判断是否为素数,这样时间复杂度为O(n*sqrt(n)). 具体实现不在贴代码,过程很简单,两重循环就可以解决.但是效率很差,n较大时甚至会花几分钟才能跑完. 思路2: 用埃拉特斯特尼筛法的方法来求素数,时间复杂度可以达到O(nloglogn). 首先开一个大小为n的数组prime[],从2开始循环,找到一个质数后开始筛选出所有非素数

LeetCode OJ:Count Primes(质数计数)

Count the number of prime numbers less than a non-negative number, n. 计算小于n的质数的个数,当然就要用到大名鼎鼎的筛法了,代码如下,写的有点乱不好意思. 1 class Solution { 2 public: 3 int countPrimes(int n) { 4 vector<int> vtor(n + 1, 0); 5 vector<int> ret; 6 for (int i = 0; i <=

mysql count distinct 统计结果去重

mysql的sql语句中,count这个关键词能统计表中的数量,如 有一个tableA表,表中数据如下: id name age 1 tony 18 2 jacky 19 3 jojo 18 SELECT COUNT(age) FROM tableA 以上这条语句能查出table表中有多少条数据.查询结果是3 而COUNT这个关键词与 DISTINCT一同使用时,可以将统计的数据中某字段不重复的数量. 如: SELECT COUNT(DISTINCT age) from tableA 以上语句的

linq to sql (Group By/Having/Count/Sum/Min/Max/Avg操作符) (转帖)

http://wenku.baidu.com/link?url=2RsCun4Mum1SLbh-LHYZpTmGFMiEukrWAoJGKGpkiHKHeafJcx2y-HVttNMb1BqJpNdwaOpCflaajFY6k36IoCH_D82bk2ccu468uzDRXvG 基于LINQ+to+Entity数据访问技术的应用研究 Group By/Having操作符 适用场景:分组数据,为我们查找数据缩小范围. 说明:分配并返回对传入参数进行分组操作后的可枚举对象.分组:延迟 1.简单形式:

[LeetCode] 222. Count Complete Tree Nodes Java

题目: Given a complete binary tree, count the number of nodes. Definition of a complete binary tree from Wikipedia:In a complete binary tree every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as

[LeetCode]Count of Range Sum

题目:Count of Range Sum Given an integer array nums, return the number of range sums that lie in [lower, upper] inclusive.Range sum S(i, j) is defined as the sum of the elements in nums between indices i and j (i ≤ j), inclusive. Note:A naive algorithm

MySQL优化之COUNT(*)效率(部分转载与个人亲测)

说到MySQL的COUNT(*)的效率,发现越说越说不清楚,干脆写下来,分享给大家. COUNT(*)与COUNT(COL)网上搜索了下,发现各种说法都有:比如认为COUNT(COL)比COUNT(*)快的:认为COUNT(*)比COUNT(COL)快的:还有朋友很搞笑的说到这个其实是看人品的. 在不加WHERE限制条件的情况下,COUNT(*)与COUNT(COL)基本可以认为是等价的:但是在有WHERE限制条件的情况下,COUNT(*)会比COUNT(COL)快非常多: 具体的数据参考如下:

38. Count and Say序列 Count and Say

The count-and-say sequence is the sequence of integers beginning as follows:1, 11, 21, 1211, 111221, ... 1 is read off as "one 1" or 11.11 is read off as "two 1s" or 21.21 is read off as "one 2, then one 1" or 1211. Given an