[转载] ORMs under the hood

原文: http://www.vertabelo.com/blog/technical-articles/orms-under-the-hood

It often happens that if something is loved, it is also hated with the same power. The idea of object relational mapping fits into this concept perfectly. You will definitely come across many opposite points and fierce discussions as well as unwavering advocates and haters. So if you have ever asked whether to use ORM or not, the answer “it depends” will not be enough.

What is an object-relational mapping (ORM)?

Object-relational mapping refers to synchronization between two different representations of data. From one side, there is a relational database, where the data is represented in terms of tuples, grouped into relations, while in application we manipulate objects.

Most contemporary applications are built in object oriented technologies like Java or C# and use mostly relational databases to store data. Since it is definitely more convenient when data retrieved from the database can be accessed in the form of objects, there are also many implementations of ORM for many languages and frameworks (current list is quite huge ).

There is a big controversy connected to this topic which forces us to ask: What exactly are the benefits of an ORM and what are its detriments? Wading through different blog posts and discussions I will start from the opinion of Megan Bowra-Dean (@megahbite):

They’re great 99% of the time in MVC frameworks but that other 1% causes a lot of headaches if you don’t know SQL.

It’s said that ORM allows a developer to completely separate themselves from the database and almost forget about it (apart from situations when database changes need to be performed). Indeed a large number of developers are not very passionate about SQL and database concepts, and they live with the illusion that ORM will handle the job for them. Consequently the database sinks into oblivion till some problem appears. Neglecting the database while using an ORM is a big mistake and causes many problems.

Why SQL is so important while using an ORM?

Generally, there are two approaches that are followed. The first one is “Domain model centric” in which the code drives database design, while the second one, “Relational model centric”, places database design as the first step followed by the code. Both approaches have their pros and cons, but the second one is earning more advocates, mostly due to the database-first approach and requirements that are impossible to avoid like knowing about the principles of relational database and SQL.

So, what are the most common dangers associated with using ORM and not knowing SQL?

Danger 1: Not knowing database data types. This usually involves using too-general data types like ‘text’ type in PostgreSQL for most of the columns. This type is introduced to store strings of unlimited length what is identified with CLOB type. Multiple columns with this type cause an enormous performance challenge.

Danger 2: Not knowing constraints. This approach is quite often seen in many database designs. As joins become the most vulnerable case, in this approach they are overused which is extremely inefficient.

Danger 3: Not knowing basic database concepts : for example, not knowing about indexes. Without these smart structures, retrieving data becomes a performance nightmare. A system with data stored in a database with no indexes is a sign of imminent death for the application.

These three points are some of the many things the beginner developer need to know. A strong enthusiast of SQL and the creator of jOOQ (SQL library for Java) – Lukas Eder (@lukaseder), confirms a painful truth that many developers don’t know SQL:

We’re just not exposed to SQL nowadays. But consider this: We developers (or our customers) are paying millions of dollars every year to Oracle, Microsoft, IBM, SAP for their excellent RDBMS, only to ignore 90% of their database features and to perform a little bit of CRUD and a little bit of ACID with ORMs like Hibernate. We have forgotten about why those RDBMS were so expensive in the first place.
Source: “NoSQL? No, SQL! – How to Calculate Running Totals”

Use only SQL or stay with ORM?

The thesis of the article “What ORMs have taught me: just learn SQL” by Geoff Wozniak:

Attribute creep and excessive use of foreign keys shows me that in order to use ORMs effectively, you still need to know SQL. My contention with ORMs is that, if you need to know SQL, just use SQL since it prevents the need to know how non-SQL gets translated to SQL.

This article caused an avalanche of replies and contradictory discussions. Most of them were focusing on abandoning ORM and instead using only SQL or staying with it and suffering from learning both the ORM and SQL. It still appears to be an unresolved problem.

One of the first-class benefits of ORM is its ease-of-use and the fact that it requires fewer lines of code. It definitely makes development quicker but only in case you’re familiar with a particular ORM. For others, ORM is a waste of time spent resolving persistence related issues. Not knowing it imposes some learning time and cumbersome debugging as a consequence of lack of experience. Indeed, with an ORM, you have no control over what is happening “under the hood.” Sometimes there are many configurations that are difficult to manage efficiently, many weird behaviors and in many cases problems with generating an optimized SQL query with an ORM. And here a common, popular story gets repeated: an ORM presents nice development initially and then there are problems when tracking ORM related bugs and inefficiencies. The statement of fellow ORM developer Gavin King – the creator of Hibernate rings true.

At this point I could come to the conclusion that ORM is not a universal tool and that all the haters are justified, because ORMs were designed this way on purpose, like the concept of ‘leaky abstraction.’ In his comments on Reddit, Gavin King explains this concept as:

the leakiness of the ORM abstraction is a feature, not a bug. It’s meant to be that way, because both object model and the relational model are valid ways of looking at the data, and both are useful

systems like Hibernate are intentionally designed as leaky abstractions so that it’s possible to easily mix in native SQL where necessary

Not only Gavin, but many others that have problems with ORMs say that it is connected to not enough knowledge about SQL, relational databases and the relational model instead of the way that ORM works with SQL. The commonly cited example of a developer with “a single object” with 600 attributes and 14 joins perfectly proves this phenomenon.

How does ORM work?

The main postulate that characterizes ORM is that it encapsulates database interaction inside an object. One part of the object keeps the data and the second one knows how to deal with the data and transfers it to the relational database (this functionality is implemented inside the ORM engine).

ORMs design patterns

According to Martin Fowler (@martinfowler), there are two patterns that were adopted in different object relational mappers.

The first one is Active Record and it is “An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.” It means that a record on a table in a relational database is represented as an object.

Source: http://www.martinfowler.com/eaaCatalog/activeRecord.html

The second one is a Data Mapper.

According to the official definition, it is “A layer of Mappers that moves data between objects and a database while keeping them independent of each other and the mapper itself.” That means that there is a special layer that separates in-memory objects from the database and its main responsibility is to transfer data between those two layers.

Source: http://martinfowler.com/eaaCatalog/dataMapper.html

There are a lot discussions about the supremacy of one over the other. Generally speaking, the Data Mapper pattern lacks the efficiencies when it comes to CRUD operations. In comparison to the Active Record pattern, the representation of an object is not necessarily a record in the database. The strongest feature of the Data Mapper design is the one API for various data stores (relational and NO SQL databases: external data stores accessed through REST, file based data storage or schema free databases).

Popular ORMs

Here is the list of some popular ORMs for the three leading programming languages: Python, Java and PHP.

ORMs using Active Record pattern:

Python   Java   PHP
   

ORMs using Data Mapper pattern:

Python   Java   PHP
   

In brief: advantages and disadvantages of using ORM

Let’s gather together the most important points for and against ORM.

FOR:

  • let the developer think in terms of objects rather than tables
  • no need to write SQL code
  • many advanced features like lazy loading
  • database independent: No need to write code specific to a particular database
  • reduces code and allows developers to focus on their business logic, rather than complex database queries
  • various ORMs provide a rich query interface

AGAINST:

  • complex (because they handle a bidirectional mapping). Their complexity implies a grueling learning curve – they have a special query language which developers have to learn
  • provides only a leaky abstraction over a relational data store
  • usually systems using an ORM perform badly (due to naive interactions with the underlying database)
  • ORM, by adding a layer of abstraction, speeds up the development but adds overhead to the application

Depending on your problem domain, the cons may outweigh the pros and so you might just want to stick with straight SQL. It’s important to know when to use and when not to use ORM.

SAY YES TO ORM:

  • it is intended for OLTP applications

SAY NO TO ORM:

  • not intended for batch processing
  • not recommended where there is a need to perform huge analysis

ORMs by example

Considering the below simple model, let’s see how some example usage looks in top ORMs in Python, Java and PHP:

↑ Click on a logo to preview the model in Vertabelo | Download the model as a png file

The ddl for this example looks as follows:

-- tables

-- Table: client

CREATE TABLE client (

    id int  NOT NULL,

    full_name varchar(255)  NOT NULL,

    email varchar(255)  NOT NULL,

    CONSTRAINT client_pk PRIMARY KEY (id)

);

-- Table: purchase

CREATE TABLE purchase (

    id int  NOT NULL,

    purchase_no char(12)  NOT NULL,

    client_id int  NOT NULL,

    CONSTRAINT purchase_pk PRIMARY KEY (id)

);

-- foreign keys

-- Reference:  client_purchase (table: purchase)

ALTER TABLE purchase ADD CONSTRAINT client_purchase

    FOREIGN KEY (client_id)

    REFERENCES client (id)

    NOT DEFERRABLE

    INITIALLY IMMEDIATE

;

Python

Let’s start from one of the most-used Python ORM – SQLAlchemy. This framework provides an ORM with an adopted Data Mapper pattern. It contains two parts. The first one is ORM, while the second one is the Core. The Core is a fully featured SQL abstraction toolkit with SQL Expression Language that enable you to write SQL statements via generative Python expressions. Advanced ORM features like unit of work (confines all the database manipulation code to a specific database session that controls the life cycles of every object in that session. Session here is a beefed up version of database transaction), in-memory collections, eager loading of collections via joins and secondary subselects, and other optimizations allow SQLAlchemy’s ORM to emit efficient queries, select rows from not only tables, but also joins and other select statements and compose it to various Python structures.

In order to map tables to Python classes we have to define subclasses of declarative_base(). If you are using Vertabelo as a database modeling tool, you can provide some automation and generate it using open-source script hosted on Github, that takes as an argument the model structure in Vertabelo XML (you can generate it by clicking on XML button in the application). Follow full tutorial here or you can simply write the classes by hand.

The model looks as follows:

import sqlalchemy

from sqlalchemy import create_engine

from sqlalchemy.ext.declarative import declarative_base

from sqlalchemy import Column, Integer, BigInteger,String, ForeignKey, Unicode, Binary, LargeBinary, Time, DateTime, Date, Text, Boolean

from sqlalchemy.orm import relationship, backref, deferred

from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Purchase (Base):

    __tablename__ = "purchase"

    id = Column(‘id‘, Integer, primary_key = True)

    purchase_no = Column(‘purchase_no‘, Unicode)

    client_id = Column(‘client_id‘, Integer, ForeignKey(‘client.id‘))

    client = relationship(‘Client‘, foreign_keys=client_id)

class Client (Base):

    __tablename__ = "client"

    id = Column(‘id‘, Integer, primary_key = True)

    full_name = Column(‘full_name‘, Unicode)

    email = Column(‘email‘, Unicode)

To carry out a small investigation of this ORM we will create tables in SQLite database from the generated Python classes and try some Python expressions to perform some insert.

Run the interactive Python console and create a database engine for our database session.

>>> from sqlalchemy import create_engine

>>> engine = create_engine(‘sqlite:///‘)

Import classes from generated output.py file

>>> from output import *

Create a database session and database tables for classes defined as Client and Purchase. The lines provided below will create a session object which will be bound to the sqlite database. All of the database manipulation code and objects will be attached to a database session.

>>> from sqlalchemy.orm import sessionmaker

>>> session = sessionmaker()

>>> session.configure(bind=engine)

>>> Base.metadata.create_all(engine)

Any changes performed on a database are done in the context of a session. Multiple changes can be grouped in a session via session.add(object). To register all the changes in a database, issue a statementdb.session.commit().

Inserting data can be performed in three steps.

Create a session object

  1. Create a Python object of Client and Purchase classes
  2. Add it to the session
  3. Commit the session

>>> s = session()

>>> c = Client(full_name=’client1’, email=’[email protected]’)

>>> p = Purchase(purchase_no=u’abcd123’, client_id=c.id)

>>> s.add(c)

>>> s.add(p)

>>> s.commit()

To retrieve the database objects, we call query() and filter() methods from the databasesession object.

>>> c = s.query(Client).filter(Client.full_name == ‘client1‘).one()

>>>  c.full_name

 u‘client1‘

>>> c.email

 u‘[email protected]‘

>>> p = s.query(Purchase).filter(Purchase.client_id == c.id).one()

>>> p.purchase_no

 u‘abcd123‘

Python frameworks to use with SQLAlchemy:

Java

jOOQ – is a database-mapping tool that implements the Active Record pattern, and is commonly considered a “happy medium between ORMs and JDBC”, because it embraces SQL rather than trying to hide it. jOOQ focuses on a relational model centric approach and assumes that database design should drive Java code. One of the features is possibility to construct queries that has similar structure to SQL.

Via jOOQ you can generate Java classes that model tables, records, sequences, stored procedures and many more. There are three ways to generate the jOOQ classes. You can generate jOOQ classse from the existing database or using Vertabelo-jOOQ integration. Having already prepared the database structure in Vertabeloyou can export it as a Vertabelo XML file and add as a property in the codegen configuration file (Full tutorialhere) or you can download the model directly from the cloud by providing the Vertabelo API token (model identifier) to the codegen file (full tutorial here). More information can be found here.

For example, an insert for our client-purchase example can be visualized in the code as follows:

// Create a new record

ClientRecord client = create.newRecord(CLIENT);

// Insert the record: INSERT INTO CLIENT VALUES (‘John’, ‘[email protected]’)

client.setFullName(‘John’);

client.setEmail(‘john@gmail.com’);

client.store();

Or we can loop over the clients returned from SELECT statements and delete those whose full_name is ‘John’.

for (PurchaseRecord purchase : create .selectFrom(PURCHASE)

   .where(PURCHASE.CLIENT_ID.in(

       select(CLIENT.ID)

       .from(CLIENT)

       .where(CLIENT.FULL_NAME.equal("John"))

)) {

purchase.delete();

}

PHP

Propel – is a PHP ORM with implemented Active Record pattern. It describes the model structure (tables, columns, relationships) in an XML file called the schema and generates classes based on the schema definition of tables.

The schema.xml can be generated by using a vertabeloPropel script. For more details follow this tutorial.

After generation, the structure of the database schema looks like the following:

<?xml version="1.0"?>

<database name="my_db" defaultIdMethod="native">

  <table name="purchase">

    <column name="id" type="INTEGER" sqlType="int" required="true" primaryKey="true"/>

    <column name="purchase_no" type="CHAR" size="12" sqlType="char(12)" required="true"/>

    <column name="client_id" type="INTEGER" sqlType="int" required="true"/>

    <foreign-key foreignTable="client" name="client_purchase" onDelete="none" onUpdate="none">

      <reference foreign="id" local="client_id"/>

    </foreign-key>

  </table>

  <table name="client">

    <column name="id" type="INTEGER" sqlType="int" required="true" primaryKey="true"/>

    <column name="full_name" type="VARCHAR" size="255" sqlType="varchar(255)" required="true"/>

    <column name="email" type="VARCHAR" size="255" sqlType="varchar(255)" required="true"/>

  </table>

</database>

Having described the structure, we can generate the SQL code, by issuing:

$ propel sql:build

and then generate model classes, by running:

$ propel model:build

Adding new data in Propel can be performed in few steps

  1. Create a Propel object
  2. Define a value for each column by using setXXX() method
  3. Call save() method.

$client = new Client();

$client->setFullName(‘client1‘);

$client->setEmail(‘[email protected]‘);

$client->save();

To read a row from a database, we use generated Query object and generated findPK() method. Reading object properties is done via calling the getter on the column.

$firstClient = ClientQuery::create()->findPK(1);

echo $firstClient->getFullName();

where line $firstClient = ClientQuery::create()->findPK(1); issues a simple SQL query.

SELECT client.id, client.full_name, client.email

FROM client

WHERE client.id = 1

LIMIT 1;

Yes, the problem of Object Relational Mapping is definitely a hard one. Most of the discussion still keeps revolving around the thoughts: “I would like to have something that lacks typical Hibernate or another ORM defects, but I don’t have a viable replacement. I am not planning to write all the code in pure SQL/JDBC, therefore I will stick to what I know.” The problem of Object Relational Mapping is still unresolved and discussions about it still become a battlefield. How do you handle the ORM problem? When do you say YES or NO to using ORM?

时间: 2024-07-29 22:24:33

[转载] ORMs under the hood的相关文章

【转载】LINUX 和 WINDOWS 内核的区别

LINUX 和 WINDOWS 内核的区别 [声明:欢迎转载,转载请注明出自CU ACCESSORY http://linux.chinaunix.net/bbs/thread-1153868-1-1.html] 关于LINUX和WINDOWS的口水站已经很多了.本文企图从技术角度来比较下2个主流操作系统的异同.偏重于内核部分. 一.动机: 我最早是 WINDOWS 阵营的.在WINDOWS下写过2年多的驱动程序.后来由于学习需要,转投LINUX,一晃也快2年了.期间经历了很多曲折,也学到了很多

python 常用模块(转载)

转载地址:http://codeweblog.com/python-%e5%b8%b8%e7%94%a8%e6%a8%a1%e5%9d%97/ adodb:我们领导推荐的数据库连接组件bsddb3:BerkeleyDB的连接组件Cheetah-1.0:我比较喜欢这个版本的cheetahcherrypy:一个WEB frameworkctypes:用来调用动态链接库DBUtils:数据库连接池django:一个WEB frameworkdocutils:用来写文档的dpkt:数据包的解包和组包My

LightOj 1112 Curious Robin Hood(线段树||树状数组)

Curious Robin Hood Robin Hood likes to loot rich people since he helps the poor people with this money. Instead of keeping all the money together he does another trick. He keeps n sacks where he keeps this money. The sacks are numbered from 0 to n-1.

Lightoj 1112 - Curious Robin Hood 【单点修改 + 单点、 区间查询】【树状数组 水题】

1112 - Curious Robin Hood PDF (English) Statistics Forum Time Limit: 1 second(s) Memory Limit: 64 MB Robin Hood likes to loot rich people since he helps the poor people with this money. Instead of keeping all the money together he does another trick.

【转载】C++拷贝构造函数(深拷贝,浅拷贝)

对于普通类型的对象来说,它们之间的复制是很简单的,例如:int a=88;int b=a; 而类对象与普通对象不同,类对象内部结构一般较为复杂,存在各种成员变量.下面看一个类对象拷贝的简单例子. #include <iostream>using namespace std;class CExample {private:     int a;public:     CExample(int b)     { a=b;}     void Show ()     {        cout<

门控时钟-理论分析 ---- 转载

转载自:http://www.chipsbank.com/news_detail/newsId=123.html 门控的基本要求: 1. 所需要的沿(对于正沿触发的寄存器是正沿,对于负沿触发的寄存器是负沿)不增加,不减少: 1. 不会产生毛刺: 1. 使用后功耗要能够降低: 1. 最好面积还会减小. 1. 上升沿触发的门控时钟的结构研究:应用与上升沿触发的寄存器的门控. 1. 直接与门结构: 1. 高电平使能Latch + 与门结构: 1. 低电平使能Latch + 与门结构: 1. 波形研究:

浅谈Java中的equals和==(转载)

在初学Java时,可能会经常碰到下面的代码: 1 String str1 = new String("hello"); 2 String str2 = new String("hello"); 3 4 System.out.println(str1==str2); 5 System.out.println(str1.equals(str2)); 为什么第4行和第5行的输出结果不一样?==和equals方法之间的区别是什么?如果在初学Java的时候这个问题不弄清楚,就

JVM学习(2)——技术文章里常说的堆,栈,堆栈到底是什么,从os的角度总结--转载http://www.cnblogs.com/kubixuesheng/p/5202561.html

转载自---http://www.cnblogs.com/kubixuesheng/p/5202561.html 俗话说,自己写的代码,6个月后也是别人的代码--复习!复习!复习!涉及到的知识点总结如下: 堆栈是栈 JVM栈和本地方法栈划分 Java中的堆,栈和c/c++中的堆,栈 数据结构层面的堆,栈 os层面的堆,栈 JVM的堆,栈和os如何对应 为啥方法的调用需要栈 属于月经问题了,正好碰上有人问我这类比较基础的知识,无奈我自觉回答不是有效果,现在深入浅出的总结下: 前一篇文章总结了:JV

GitHub超详细图文攻略 - Git客户端下载安装 GitHub提交修改源码工作流程 Git分支 标签 过滤 Git版本工作流(转载)

最近听同事说他都在使用GitHub,GitHub是程序员的社区,在里面可以学到很多书上学不到的东西,所以最近在准备入手这方面的知识去尝试学习,正好碰到这么详细完整的文章,就转载了,希望对自己和大家有帮助. GitHub操作总结 : 总结看不明白就看下面的详细讲解. GitHub操作流程 : 第一次提交 : 方案一 : 本地创建项目根目录, 然后与远程GitHub关联, 之后的操作一样; -- 初始化Git仓库 :git init ; -- 提交改变到缓存 :git commit -m 'desc