Bioinformatics datasets

##Genomic sequence variation

###1000 Genomes Project
http://www.1000genomes.org/
Data collection and a catalog of human variation

###dbSNP
http://www.ncbi.nlm.nih.gov/projects/SNP/
A catalog of SNPs and short indels

###dbVar and Database of Genomic Variants
http://www.ncbi.nlm.nih.gov/dbvar/
http://dgv.tcag.ca/dgv/app/home?ref=GRCh37/hg19
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=dgvPlus (browser track)
A catalog of structural variants

###Online Mendelian Inheritance in Man http://www.omim.org/about
OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM focuses on the relationship between phenotype and genotype, and its entries contain copious links to other genetics resources.

###The Exome Aggregation Consortium (ExAC) http://exac.broadinstitute.org/
ExAC is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set provided on this website spans 61,486 unrelated individuals sequenced as part of various disease-specific and population genetic studies. We have removed individuals affected by severe pediatric disease, so this data set should serve as a useful reference set of allele frequencies for severe disease studies. All of the raw data from these projects have been reprocessed through the same pipeline, and jointly variant-called to increase consistency across projects.
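
As a rough illustration of how a reference set of allele frequencies like this is typically consumed, the sketch below computes an allele frequency from the `AC` (allele count) and `AN` (allele number) keys that the VCF specification reserves for the INFO column. The INFO string is a made-up value, not a real ExAC record, and the function names are hypothetical. It assumes a record with a single alternate allele.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'AC=3;AN=100' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value  # flag-style keys (no '=') map to an empty string
    return fields


def allele_frequency(info: str) -> float:
    """Return AC / AN, assuming the record has exactly one alternate allele."""
    fields = parse_info(info)
    ac = int(fields["AC"])
    an = int(fields["AN"])
    if an == 0:
        raise ValueError("AN is zero; no called alleles at this site")
    return ac / an


# Hypothetical INFO value for illustration only (not taken from ExAC).
example_info = "AC=3;AN=120000;DB"
print(allele_frequency(example_info))
```

Multi-allelic records carry comma-separated `AC` values and would need to be split per allele before this calculation applies.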

##Molecular function

###Encyclopedia Of DNA Elements (ENCODE) Project
http://encodeproject.org/
Links to ENCODE2 uniformly processed histone mark data: https://sites.google.com/site/anshulkundaje/projects/encodehistonemods
Links to other ENCODE2 uniformly processed data: http://genome.ucsc.edu/ENCODE/downloads.html
Data collection, integrative analysis, and a comprehensive catalog of
all sequence-based functional elements

###Roadmap Epigenomics Project (NIH Common Fund)
http://compbio.mit.edu/roadmap (Uniformly processed data)
http://www.roadmapepigenomics.org/
https://commonfund.nih.gov/epigenomics/
Data collection, integrative analysis and a resource of human epigenomic data

###International Human Epigenome Consortium (IHEC)
http://www.ihec-epigenomes.org/
Data collection and reference maps of human epigenomes for key
cellular states relevant to health and diseases

###BLUEPRINT Epigenome http://www.blueprint-epigenome.eu/
http://www.nature.com/nbt/journal/v30/n3/full/nbt.2153.html
Data collection on the epigenome of blood cells

###Human BodyMap
Viewable with Ensembl (http://www.ensembl.org/index.html) or the Integrative Genomics Viewer (http://www.broadinstitute.org/igv/)
Gene expression database from Illumina, based on RNA-seq data

###Cancer Cell Line Encyclopedia (CCLE) http://www.broadinstitute.org/ccle/home
Array-based expression data, CNV, mutations, and perturbations over a huge collection of cell lines

###FANTOM5 Project http://fantom.gsc.riken.jp/
http://fantom.gsc.riken.jp/5/sstar/Data_source
Large collection of CAGE-based expression data across multiple species (time series and perturbations)

###ArrayExpress http://www.ebi.ac.uk/arrayexpress/
Database of gene expression experiments

###Gene Expression Atlas http://www.ebi.ac.uk/gxa/
Database supporting queries of condition-specific gene expression on
a curated subset of the ArrayExpress Archive.

###GNF Gene Expression Atlas Viewable at BioGPS (http://biogps.org/#goto=welcome)
GNF (Genomics Institute of the Novartis Research Foundation) human and mouse gene expression array data.

###The Human Protein Atlas http://www.proteinatlas.org/
Protein expression profiles based on immunohistochemistry for a large number of human tissues, cancers and cell lines, subcellular localization, transcript expression levels

###UniProt http://www.uniprot.org/
A comprehensive, freely accessible database of protein sequence and
functional information

###InterPro http://www.ebi.ac.uk/interpro/
An integrated database of protein classification, functional domains,
and annotation (including GO terms).

###Protein Capture Reagents Initiative http://commonfund.nih.gov/proteincapture/
Resource generation: renewable, monoclonal antibodies and other reagents that target the full range of proteins

###Knockout Mouse Program (KOMP) http://www.nih.gov/science/models/mouse/knockout/index.html
Resource generation: create knockout strains for all mouse genes

###The Connectivity Map (CMAP) http://www.broadinstitute.org/cmap/
The Connectivity Map (also known as cmap) is a collection of genome-wide transcriptional expression data from cultured human cells treated with bioactive small molecules and simple pattern-matching algorithms that together enable the discovery of functional connections between drugs, genes and diseases through the transitory feature of common gene-expression changes. You can learn more about cmap from our papers in Science and Nature Reviews Cancer.

###Library of Integrated Network-based Cellular Signatures (LINCS) https://commonfund.nih.gov/LINCS/
Data collection and analysis of molecular signatures that describe how
different types of cells respond to a variety of perturbing agents

###Genomics of Drug Sensitivity in Cancer http://www.cancerrxgene.org/
Mutation, CNV, Affymetrix expression, and drug sensitivity in ~300 cancer cell lines
Papers:
http://nar.oxfordjournals.org/content/41/D1/D955.long
http://www.nature.com/nature/journal/v483/n7391/full/nature11005.html

###The Drug Gene Interaction database (DGIdb)
http://dgidb.genome.wustl.edu/

###Molecular Libraries Program (MLP) https://commonfund.nih.gov/molecularlibraries/index.aspx
Access to the large-scale screening capacity necessary to identify small molecules that can be optimized as chemical probes to study the functions of genes, cells, and biochemical pathways in health and disease

###Allen Brain Atlas http://www.brain-map.org/
Data collection and an online public resource integrating extensive gene expression and neuroanatomical data for human and mouse, including variation of mouse gene expression by strain.

###BrainCloud http://braincloud.jhmi.edu/
BrainCloud is a freely-available, biologist-friendly, stand-alone application for exploring the temporal dynamics and genetic control of transcription in the human prefrontal cortex across the lifespan. BrainCloud was developed through collaboration between the Lieber Institute and NIMH

###The Human Connectome Project http://www.humanconnectomeproject.org/
Data collection and integration to create a complete map of the structural and functional neural connections, within and across individuals

###Geuvadis RNA sequencing project of 1000 Genomes samples http://www.geuvadis.org/web/geuvadis
mRNA and small RNA sequencing on 465 lymphoblastoid cell line (LCL) samples from 5 populations of the 1000 Genomes Project: the CEPH (CEU), Finns (FIN), British (GBR), Toscani (TSI) and Yoruba (YRI).

###The Achilles Project http://www.broadinstitute.org/achilles
Project Achilles is a systematic effort aimed at identifying and cataloging genetic vulnerabilities across hundreds of genomically characterized cancer cell lines. The project uses a genome-wide shRNA library to silence individual genes and identify those genes that affect cell survival. Large-scale functional screening of cancer cell lines provides a complementary approach to those studies that aim to characterize the molecular alterations (mutations, copy number alterations, etc.) of primary tumors, such as The Cancer Genome Atlas. The overall goal of the project is to link cancer genetic dependencies to their molecular characteristics in order to identify molecular targets and guide therapeutic development.

##Phenotypes and disease

###Human Ageing Genomic Resources
http://genomics.senescence.info/

###The Cancer Genome Atlas (TCGA) http://cancergenome.nih.gov/
Data collection and a data repository, including cancer genome sequence data

###International Cancer Genome Consortium (ICGC) http://www.icgc.org/
Data collection and a data repository for a comprehensive description of genomic, transcriptomic and epigenomic changes of cancer

###Genotype-Tissue Expression (GTEx) Project https://commonfund.nih.gov/GTEx/
Data collection, data repository, and sample bank for human gene expression and regulation in multiple tissues, compared to genetic variation

###Knockout Mouse Phenotyping Program (KOMP2) https://commonfund.nih.gov/KOMP2/
Data collection for standardized phenotyping of a genome-wide collection of mouse knockouts

###Database of Genotypes and Phenotypes (dbGaP) http://www.ncbi.nlm.nih.gov/gap
Data repository for results from studies investigating the interaction of genotype and phenotype

###NHGRI Catalog of Published GWAS http://www.genome.gov/gwastudies/
Public catalog of published Genome-Wide Association Studies

###Clinical Genomic Database http://research.nhgri.nih.gov/CGD/
A manually curated database of conditions with known genetic causes, focusing on medically significant genetic data with available interventions.

###NHGRI's Breast Cancer Information Core http://research.nhgri.nih.gov/bic/
Breast Cancer Mutation database

###ClinVar http://www.ncbi.nlm.nih.gov/clinvar/
ClinVar is designed to provide a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar collects reports of variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. The alleles described in submissions are mapped to reference sequences, and reported according to the HGVS standard. ClinVar then presents the data for interactive users as well as those wishing to use ClinVar in daily workflows and other local applications. ClinVar works in collaboration with interested organizations to meet the needs of the medical genetics community as efficiently and effectively as possible.
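
To make the HGVS naming mentioned above more concrete, here is a minimal sketch that parses only the simplest HGVS form, a single-nucleotide substitution such as `NM_000000.1:c.100A>G`. The accession in the example is a placeholder rather than a real ClinVar record, the names `Substitution` and `parse_substitution` are hypothetical, and the regular expression deliberately ignores the rest of the nomenclature (deletions, insertions, intronic offsets, protein-level names).

```python
import re
from typing import NamedTuple, Optional


class Substitution(NamedTuple):
    reference: str   # versioned reference sequence accession
    coord_type: str  # 'c' coding, 'g' genomic, 'm' mitochondrial, 'n' non-coding
    position: int
    ref_base: str
    alt_base: str


# Covers only simple substitutions; this is an illustrative subset of HGVS.
_PATTERN = re.compile(
    r"^(?P<reference>[A-Z]{2}_\d+\.\d+)"
    r":(?P<coord_type>[cgmn])\."
    r"(?P<position>\d+)"
    r"(?P<ref_base>[ACGT])>(?P<alt_base>[ACGT])$"
)


def parse_substitution(name: str) -> Optional[Substitution]:
    """Return the parsed substitution, or None if the name is outside the subset."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    return Substitution(
        reference=match["reference"],
        coord_type=match["coord_type"],
        position=int(match["position"]),
        ref_base=match["ref_base"],
        alt_base=match["alt_base"],
    )


# Placeholder variant name for illustration only.
print(parse_substitution("NM_000000.1:c.100A>G"))
```

For real work, a dedicated HGVS parsing library is a safer choice than a hand-written pattern like this one.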

###Human Gene Mutation Database (HGMD) http://www.hgmd.cf.ac.uk/ac/
The Human Gene Mutation Database (HGMD®) represents an attempt to collate known (published) gene lesions responsible for human inherited disease.

###NHLBI Exome Sequencing Project (ESP) Exome Variant Server
http://evs.gs.washington.edu/EVS/
The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein coding regions of the human genome across diverse, richly-phenotyped populations and to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders.

###Genetics Home Reference http://ghr.nlm.nih.gov/
Genetics Home Reference is the National Library of Medicine's web site for consumer information about genetic conditions and the genes or chromosomes related to those conditions.

###GeneReviews http://www.ncbi.nlm.nih.gov/books/NBK1116/
GeneReviews are expert-authored, peer-reviewed disease descriptions presented in a standardized format and focused on clinically relevant and medically actionable information on the diagnosis, management, and genetic counseling of patients and families with specific inherited conditions.

###Global Alzheimer's Association Interactive Network (GAAIN) http://www.gaain.org/
The Global Alzheimer’s Association Interactive Network (GAAIN) is a collaborative project that will provide researchers around the globe with access to a vast repository of Alzheimer’s disease research data and the sophisticated analytical tools and computational power needed to work with that data. Our goal is to transform the way scientists work together to answer key questions related to understanding the causes, diagnosis, treatment and prevention of Alzheimer’s and other neurodegenerative diseases.
In 2013, WGS data were obtained for the largest cohort of Alzheimer's patients (800 individuals).

###The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium http://web.chargeconsortium.com/
The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium was formed to facilitate genome-wide association study meta-analyses and replication opportunities among multiple large and well-phenotyped longitudinal cohort studies. The consortium also has DNA methylation data alongside WGS and exome sequencing data.

###The NIMH Center for Collaborative Genomic Studies on Mental Disorders (includes the Psychiatric Genomics Consortium, https://pgc.unc.edu/)
https://www.nimhgenetics.org/
The NIMH Center, now known as the NIMH Repository and Genomics Resource (NIMH-RGR), plays a key role in facilitating psychiatric genetic research by providing a collection of over 150,000 well-characterized, high-quality patient and control samples from a wide range of mental disorders.

##Data integration

###UCSC Genome Bioinformatics http://genome.ucsc.edu/
Genome databases displayed through a genome browser for vertebrates, other eukaryotes, and prokaryotes, including sequence conservation, transcript maps and expression, functional annotation, genetic variation, and human disease information

###Ensembl http://www.ensembl.org/index.html
Genome databases displayed through a genome browser for vertebrates and other eukaryotic species, including sequence conservation, transcript maps and expression, functional annotation, genetic variation, and human disease information

###Reactome http://www.reactome.org/ReactomeGWT/entrypoint.html
Pathway database: open-source, open access, manually curated and peer-reviewed

###Molecular Signatures Database (MSigDB) http://www.broadinstitute.org/gsea/msigdb/index.jsp
MSigDB is a collection of annotated gene sets for use with Gene Set Enrichment Analysis (GSEA) software
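
Gene sets for GSEA are commonly distributed in the tab-delimited GMT format, where each line holds a set name, a description, and then the member genes. The sketch below reads that layout into a dictionary of sets. The set and gene names are placeholders rather than real MSigDB content, and `read_gmt` is a hypothetical helper name.

```python
import io
from typing import Dict, Iterable, Set


def read_gmt(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse GMT-formatted lines into a mapping of set name to member genes."""
    gene_sets: Dict[str, Set[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and sets with no genes
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


# Placeholder gene sets for illustration only (not real MSigDB entries).
example = io.StringIO(
    "SET_A\tplaceholder description\tGENE1\tGENE2\tGENE3\n"
    "SET_B\tplaceholder description\tGENE2\tGENE4\n"
)
gene_sets = read_gmt(example)

# Genes shared by the two sets.
print(sorted(gene_sets["SET_A"] & gene_sets["SET_B"]))
```

Passing an open file handle instead of the `io.StringIO` object works the same way, since the function only iterates over lines.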

###KEGG: Kyoto Encyclopedia of Genes and Genomes http://www.genome.jp/kegg/
Database of pathways, diseases, drugs

###BIOCARTA http://www.biocarta.com/
Pathway analysis resource

###Genomatix http://www.genomatix.de/
Proprietary genome annotation and pathway analysis software

###GOLD: Genomes Online Database http://www.genomesonline.org/cgi-bin/GOLD/index.cgi
Information regarding genome and metagenome sequencing projects, and their associated metadata, around the world

###ImmPort: Immunology Database and Analysis Portal https://immport.niaid.nih.gov
The ImmPort system provides advanced information technology support in the production, analysis, archiving, and exchange of scientific data for the diverse community of life science researchers supported by NIAID/DAIT. It serves as a long-term, sustainable archive of data generated by investigators funded through the NIAID/DAIT. The core component of the ImmPort system is an extensive data warehouse containing an integration of experimental data supplied by NIAID/DAIT-funded investigators and genomic, proteomic, and other data relevant to the research of these programs extracted from a variety of public databases. The ImmPort system also provides data analysis tools and an immunology-focused ontology.

##Model organism databases (selected examples)

###Mouse Genome Informatics http://www.informatics.jax.org/
Includes genotypes with phenotype annotations, human diseases with one or more mouse models, expression assays and images, pathways, and refSNPs

###Rat Genome Database (RGD) http://rgd.mcw.edu/
Repository of rat genetic and genomic data, as well as mapping, strain, and physiological information

###FlyBase http://flybase.org/
A Database of Drosophila Genes & Genomes

###WormBase http://www.wormbase.org/
The genetics, genomics and biology of C. elegans and related nematodes

###The Zebrafish Model Organism Database (ZFIN) http://zfin.org/
Integrated zebrafish genetic, genomic and developmental information

###XenBase http://www.xenbase.org/common/
Xenopus laevis and Xenopus tropicalis biology and genomics resource

###Saccharomyces Genome Database (SGD) http://www.yeastgenome.org/
Integrated biological information for budding yeast, along with search and analysis tools

###Others

1000 Genomes

American Gut (Microbiome Project)

Broad Cancer Cell Line Encyclopedia (CCLE)

Cell Image Library

Collaborative Research in Computational Neuroscience (CRCNS)

Complete Genomics Public Data

EBI ArrayExpress

EBI Protein Data Bank in Europe

ENCODE project

Ensembl Genomes

Gene Expression Omnibus (GEO)

Gene Ontology (GO)

Genotype-Tissue Expression (GTEx)

Global Biotic Interactions (GloBI)

Harvard Medical School (HMS) LINCS Project

Human Genome Diversity Project

Human Microbiome Project (HMP)

ICOS PSP Benchmark

International HapMap Project

Journal of Cell Biology DataViewer

MIT Cancer Genomics Data

NCBI Proteins

NCBI Taxonomy

NeuroData

NIH Microarray data

OpenSNP genotypes data

Pathguide - Protein-Protein Interactions Catalog

Protein Data Bank

Psychiatric Genomics Consortium

PubChem Project

PubGene (now Coremine Medical)

Sanger Catalogue of Somatic Mutations in Cancer (COSMIC)

Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC)

Sequence Read Archive (SRA)

Stanford Microarray Data

Stowers Institute Original Data Repository

Systems Science of Biological Dynamics (SSBD) Database

Temple University Hospital EEG Database

The Cancer Genome Atlas (TCGA), available via Broad GDAC

The Catalogue of Life

The Exome Aggregation Consortium (ExAC)

The Personal Genome Project (PGP)

UCSC Public Data

Universal Protein Resource (UniProt)

UniGene
