如何写一个数据库How do you build a database?(转载)

转载自:http://www.reddit.com/r/Database/comments/27u6dy/how_do_you_build_a_database/ciggal8

Its a great question, and deserves a long answer.

Most database servers are built in C, and store data using B-tree type constructs. In the old days there was a product called C-Isam (c library for an indexed sequential access method) which is a low level library to help C programmers write data in B-tree format. So you need to know about btrees and understand what these are.

Most databases store data separate to indexes. Lets assume a record (or row) is 800 bytes long and you write 5 rows of data to a file. If the row contains columns such as first name, last name, address etc. and you want to search for a specific record by last name, you can open the file and sequentially search through each record but this is very slow. Instead you open an index file which just contains the lastname and the position of the record in the data file. Then when you have the position you open the data file, lseek to that position and read the data. Because index data is very small it is much quicker to search through index files. Also as the index files are stored in btrees in it very quick to effectively do a quicksearch (divide and conquer) to find the record you are looking for.

So you understand for one "table" you will have a data file with the data and one (or many) index files. The first index file could be for lastname, the next could be to search by SS number etc. When the user defines their query to get some data, they decide which index file to search through. If you can find any info on C-ISAM (there used to be an open source version (or cheap commercial) called D-ISAM) you will understand this concept quite well.

Once you have stored data and have index files, using an ISAM type approach allows you to GET a record based on a value, or PUT a new record. However modern database servers all support SQL, so you need an SQL parser that translates the SQL statement into a sequence of related GETs. SQL may join 2 tables so an optimizer is also needed to decide which table to read first (normally based on number of rows in each table and indexes available) and how to relate it to the next table. SQL can INSERT data so you need to parse that into PUT statements but it can also combine multiple INSERTS into transactions so you need a transaction manager to control this, and you will need transaction logs to store wip/completed transactions.

It is possible you will need some backup/restore commands to backup your data files and index files and maybe also your transaction log files, and if you really want to go for it you could write some replication tools to read your transaction log and replicate the transactions to a backup database on a different server. Note if you want your client programs (for example an SQL UI like phpmyadmin) to reside on separate machine than your database server you will need to write a connection manager that sends the SQL requests over TCP/IP to your server, then authenticate it using some credentials, parse the request, run your GETS and send back the data to the client.

So these database servers can be a lot of work, especially for one person. But you can create simple versions of these tools one at a time. Start with how to store data and indexes, and how to retrieve data using an ISAM type interface.

There are books out there - look for older books on mysql and msql, look for anything on google re btrees and isam, look for open source C libraries that already do isam. Get a good understanding on file IO on a linux machine using C. Many commercial databases now dont even use the filesystem for their data files because of cacheing issues - they write directly to raw disk. You want to just write to files initially.

I hope this helps a little bit.

廖雪峰的解释:转载自http://www.ruanyifeng.com/blog/2014/07/database_implementation.html

如何写一个数据库How do you build a database?(转载),布布扣,bubuko.com

时间: 2024-10-11 21:01:32

如何写一个数据库How do you build a database?(转载)的相关文章

跟我一起用Symfony写一个博客网站;

第一步: composer create-project symfony/framework-standard-edition 你的项目名: 创建完这个原型,我执行php bin/console server:run,可以跑起来: 那么此刻你需要连接数据库了:我的数据库是PostgreSql 写一个数据库创建脚本例如我的 create user myblog with password 'myblog' ; ALTER USER myblog WITH PASSWORD 'myblog'; cr

用 Python 写一个 NoSQL 数据库

本文译自 What is a NoSQL Database? Learn By Writing One In Python. 完整的示例代码已经放到了 GitHub 上, 请 点击这里, 这仅是一个极简的 demo, 旨在动手了解概念. 如果对译文有任何的意见或建议,欢迎 提 issue 讨论, 批评指正. 后续如有更新,可见 博客 . NoSQL 这个词在近些年正变得随处可见. 但是到底 "NoSQL" 指的是什么? 它是如何并且为什么这么有用? 在本文, 我们将会通过纯 Pytho

如何写一个能在gulp build pipe中任意更改src内容的函数

gulp在前端自动化构建中非常好用,有非常丰富的可以直接拿来使用的plugin,完成我们日常构建工作. 但是万事没有十全十美能够完全满足自己的需求,这时我们就要自己动手写一个小的函数,用于在gulp stream pipeline中执行我们想要的动作,比如我有一个需求在build后将gulp-inject插入的assets url修改为laravel的一个helper以便识别不同的运行环境:如果是staging环境则不要上cdn方便调试,如果是生产环境则将url修改为cdn的url,实现网站快速

用 Python 写一个 NoSQL 数据库Python

NoSQL 这个词在近些年正变得随处可见. 但是到底 "NoSQL" 指的是什么? 它是如何并且为什么这么有用? 在本文, 我们将会通过纯 Python (我比较喜欢叫它, "轻结构化的伪代码") 写一个 NoSQL 数据库来回答这些问题. OldSQL 很多情况下, SQL 已经成为 "数据库" (database) 的一个同义词. 实际上, SQL 是 Strctured Query Language 的首字母缩写, 而并非指数据库技术本身.

mysql5.7基础 切换到现有的一个数据库 use database命令 不用写 分号

镇场文:       学儒家经世致用,行佛家普度众生,修道家全生保真,悟易理象数通变.以科技光耀善法,成就一良心博客.______________________________________________________________________________________________________ Operating System:UbuntuKylin 16.04 LTS 64bitmysql: Ver 14.14 Distrib 5.7.17, for Linux (

独自handle一个数据库大程有感

这学期数据库课程,最后的大程是写一个MiniSQL的数据库实现,要求很简单,建删表,建删单值索引,支持主键和unique定义,支持最简单的select,只要支持3个类型:int,float,char(0~255).最开始,考虑到数据库的运行时确定类型的特点,选择了运行时强大的C#,还能顺便集成进Linq.但是一周后发现C#操作对象二进制结构的能力几乎为0,在写BufferManager的时候也发现完全不能自由的控制对象生命周期,并且IDisposable的实现也过于迷,与强大运行时和linq相比

一个菜鸟正在用SSH写一个论坛(1)

嗯..搞定了注册和登录,说明我的SSH整合已经没有问题了,那么我就继续折腾了. 我的目的是用SSH框架写一个论坛(当然是功能最简单的那种),搞定了整合之后我打算先做出一些基本的功能,于是我就先简单的设计了一下数据库. 1 /*==============================================================*/ 2 /* DBMS name: MySQL 5.0 */ 3 /* Created on: 2015/9/17 17:21:07 */ 4 /*

linux设备驱动第三篇:写一个简单的字符设备驱动

在linux设备驱动第一篇:设备驱动程序简介中简单介绍了字符驱动,本篇简单介绍如何写一个简单的字符设备驱动.本篇借鉴LDD中的源码,实现一个与硬件设备无关的字符设备驱动,仅仅操作从内核中分配的一些内存. 下面就开始学习如何写一个简单的字符设备驱动.首先我们来分解一下字符设备驱动都有那些结构或者方法组成,也就是说实现一个可以使用的字符设备驱动我们必须做些什么工作. 1.主设备号和次设备号 对于字符设备的访问是通过文件系统中的设备名称进行的.他们通常位于/dev目录下.如下: [plain] vie

为LoadRunner写一个lr_save_float函数

LoadRunner中有lr_save_int() 和lr_save_string() 函数,但是没有保存浮点数到变量的lr_save_float函数.<lr_save_float() function for LoadRunner>这篇文章介绍了如何写一个这样的函数: http://ptfrontline.wordpress.com/2010/01/27/lr_save_float-function-for-loadrunner/ void lr_save_float(const float