[转载] ORMs under the hood / 憋错料

原文: http://www.vertabelo.com/blog/technical-articles/orms-under-the-hood

It often happens that if something is loved, it is also hated with the same power. The idea of object relational mapping fits into this concept perfectly. You will definitely come across many opposite points and fierce discussions as well as unwavering advocates and haters. So if you have ever asked whether to use ORM or not, the answer “it depends” will not be enough.

What is an object-relational mapping (ORM)?

Object-relational mapping refers to synchronization between two different representations of data. From one side, there is a relational database, where the data is represented in terms of tuples, grouped into relations, while in application we manipulate objects.

Most contemporary applications are built in object oriented technologies like Java or C# and use mostly relational databases to store data. Since it is definitely more convenient when data retrieved from the database can be accessed in the form of objects, there are also many implementations of ORM for many languages and frameworks (current list is quite huge ).

There is a big controversy connected to this topic which forces us to ask: What exactly are the benefits of an ORM and what are its detriments? Wading through different blog posts and discussions I will start from the opinion of Megan Bowra-Dean (@megahbite):

They’re great 99% of the time in MVC frameworks but that other 1% causes a lot of headaches if you don’t know SQL.

It’s said that ORM allows a developer to completely separate themselves from the database and almost forget about it (apart from situations when database changes need to be performed). Indeed a large number of developers are not very passionate about SQL and database concepts, and they live with the illusion that ORM will handle the job for them. Consequently the database sinks into oblivion till some problem appears. Neglecting the database while using an ORM is a big mistake and causes many problems.

Why SQL is so important while using an ORM?

Generally, there are two approaches that are followed. The first one is “Domain model centric” in which the code drives database design, while the second one, “Relational model centric”, places database design as the first step followed by the code. Both approaches have their pros and cons, but the second one is earning more advocates, mostly due to the database-first approach and requirements that are impossible to avoid like knowing about the principles of relational database and SQL.

So, what are the most common dangers associated with using ORM and not knowing SQL?

Danger 1: Not knowing database data types. This usually involves using too-general data types like ‘text’ type in PostgreSQL for most of the columns. This type is introduced to store strings of unlimited length what is identified with CLOB type. Multiple columns with this type cause an enormous performance challenge.

Danger 2: Not knowing constraints. This approach is quite often seen in many database designs. As joins become the most vulnerable case, in this approach they are overused which is extremely inefficient.

Danger 3: Not knowing basic database concepts : for example, not knowing about indexes. Without these smart structures, retrieving data becomes a performance nightmare. A system with data stored in a database with no indexes is a sign of imminent death for the application.

These three points are some of the many things the beginner developer need to know. A strong enthusiast of SQL and the creator of jOOQ (SQL library for Java) – Lukas Eder (@lukaseder), confirms a painful truth that many developers don’t know SQL:

We’re just not exposed to SQL nowadays. But consider this: We developers (or our customers) are paying millions of dollars every year to Oracle, Microsoft, IBM, SAP for their excellent RDBMS, only to ignore 90% of their database features and to perform a little bit of CRUD and a little bit of ACID with ORMs like Hibernate. We have forgotten about why those RDBMS were so expensive in the first place.
Source: “NoSQL? No, SQL! – How to Calculate Running Totals”

Use only SQL or stay with ORM?

The thesis of the article “What ORMs have taught me: just learn SQL” by Geoff Wozniak:

Attribute creep and excessive use of foreign keys shows me that in order to use ORMs effectively, you still need to know SQL. My contention with ORMs is that, if you need to know SQL, just use SQL since it prevents the need to know how non-SQL gets translated to SQL.

This article caused an avalanche of replies and contradictory discussions. Most of them were focusing on abandoning ORM and instead using only SQL or staying with it and suffering from learning both the ORM and SQL. It still appears to be an unresolved problem.

One of the first-class benefits of ORM is its ease-of-use and the fact that it requires fewer lines of code. It definitely makes development quicker but only in case you’re familiar with a particular ORM. For others, ORM is a waste of time spent resolving persistence related issues. Not knowing it imposes some learning time and cumbersome debugging as a consequence of lack of experience. Indeed, with an ORM, you have no control over what is happening “under the hood.” Sometimes there are many configurations that are difficult to manage efficiently, many weird behaviors and in many cases problems with generating an optimized SQL query with an ORM. And here a common, popular story gets repeated: an ORM presents nice development initially and then there are problems when tracking ORM related bugs and inefficiencies. The statement of fellow ORM developer Gavin King – the creator of Hibernate rings true.

At this point I could come to the conclusion that ORM is not a universal tool and that all the haters are justified, because ORMs were designed this way on purpose, like the concept of ‘leaky abstraction.’ In his comments on Reddit, Gavin King explains this concept as:

the leakiness of the ORM abstraction is a feature, not a bug. It’s meant to be that way, because both object model and the relational model are valid ways of looking at the data, and both are useful

systems like Hibernate are intentionally designed as leaky abstractions so that it’s possible to easily mix in native SQL where necessary

Not only Gavin, but many others that have problems with ORMs say that it is connected to not enough knowledge about SQL, relational databases and the relational model instead of the way that ORM works with SQL. The commonly cited example of a developer with “a single object” with 600 attributes and 14 joins perfectly proves this phenomenon.

How does ORM work?

The main postulate that characterizes ORM is that it encapsulates database interaction inside an object. One part of the object keeps the data and the second one knows how to deal with the data and transfers it to the relational database (this functionality is implemented inside the ORM engine).

ORMs design patterns

According to Martin Fowler (@martinfowler), there are two patterns that were adopted in different object relational mappers.

The first one is Active Record and it is “An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.” It means that a record on a table in a relational database is represented as an object.

Source: http://www.martinfowler.com/eaaCatalog/activeRecord.html

The second one is a Data Mapper.

According to the official definition, it is “A layer of Mappers that moves data between objects and a database while keeping them independent of each other and the mapper itself.” That means that there is a special layer that separates in-memory objects from the database and its main responsibility is to transfer data between those two layers.

Source: http://martinfowler.com/eaaCatalog/dataMapper.html

There are a lot discussions about the supremacy of one over the other. Generally speaking, the Data Mapper pattern lacks the efficiencies when it comes to CRUD operations. In comparison to the Active Record pattern, the representation of an object is not necessarily a record in the database. The strongest feature of the Data Mapper design is the one API for various data stores (relational and NO SQL databases: external data stores accessed through REST, file based data storage or schema free databases).

Popular ORMs

Here is the list of some popular ORMs for the three leading programming languages: Python, Java and PHP.

ORMs using Active Record pattern:

Python		Java		PHP
Django ORM Storm peewee Pony ORM SQLObject		jOOQ MyBatis TopLink OpenJPA Torque		Propel Eloquent CakePHP PHPixie Yii

ORMs using Data Mapper pattern:

Python		Java		PHP
SQLAlchemy		Hibernate		Doctrine

In brief: advantages and disadvantages of using ORM

Let’s gather together the most important points for and against ORM.

FOR:

let the developer think in terms of objects rather than tables
no need to write SQL code
many advanced features like lazy loading
database independent: No need to write code specific to a particular database
reduces code and allows developers to focus on their business logic, rather than complex database queries
various ORMs provide a rich query interface

AGAINST:

complex (because they handle a bidirectional mapping). Their complexity implies a grueling learning curve – they have a special query language which developers have to learn
provides only a leaky abstraction over a relational data store
usually systems using an ORM perform badly (due to naive interactions with the underlying database)
ORM, by adding a layer of abstraction, speeds up the development but adds overhead to the application

Depending on your problem domain, the cons may outweigh the pros and so you might just want to stick with straight SQL. It’s important to know when to use and when not to use ORM.

SAY YES TO ORM:

it is intended for OLTP applications

SAY NO TO ORM:

not intended for batch processing
not recommended where there is a need to perform huge analysis

ORMs by example

Considering the below simple model, let’s see how some example usage looks in top ORMs in Python, Java and PHP:

↑ Click on a logo to preview the model in Vertabelo | Download the model as a png file

The ddl for this example looks as follows:

-- tables

-- Table: client

CREATE TABLE client (

id int NOT NULL,

full_name varchar(255) NOT NULL,

email varchar(255) NOT NULL,

CONSTRAINT client_pk PRIMARY KEY (id)

);

-- Table: purchase

CREATE TABLE purchase (

id int NOT NULL,

purchase_no char(12) NOT NULL,

client_id int NOT NULL,

CONSTRAINT purchase_pk PRIMARY KEY (id)

);

-- foreign keys

-- Reference: client_purchase (table: purchase)

ALTER TABLE purchase ADD CONSTRAINT client_purchase

FOREIGN KEY (client_id)

REFERENCES client (id)

NOT DEFERRABLE

INITIALLY IMMEDIATE

;

Python

Let’s start from one of the most-used Python ORM – SQLAlchemy. This framework provides an ORM with an adopted Data Mapper pattern. It contains two parts. The first one is ORM, while the second one is the Core. The Core is a fully featured SQL abstraction toolkit with SQL Expression Language that enable you to write SQL statements via generative Python expressions. Advanced ORM features like unit of work (confines all the database manipulation code to a specific database session that controls the life cycles of every object in that session. Session here is a beefed up version of database transaction), in-memory collections, eager loading of collections via joins and secondary subselects, and other optimizations allow SQLAlchemy’s ORM to emit efficient queries, select rows from not only tables, but also joins and other select statements and compose it to various Python structures.

In order to map tables to Python classes we have to define subclasses of declarative_base(). If you are using Vertabelo as a database modeling tool, you can provide some automation and generate it using open-source script hosted on Github, that takes as an argument the model structure in Vertabelo XML (you can generate it by clicking on XML button in the application). Follow full tutorial here or you can simply write the classes by hand.

The model looks as follows:

import sqlalchemy

from sqlalchemy import create_engine

from sqlalchemy.ext.declarative import declarative_base

from sqlalchemy import Column, Integer, BigInteger,String, ForeignKey, Unicode, Binary, LargeBinary, Time, DateTime, Date, Text, Boolean

from sqlalchemy.orm import relationship, backref, deferred

from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Purchase (Base):

__tablename__ = "purchase"

id = Column(‘id‘, Integer, primary_key = True)

purchase_no = Column(‘purchase_no‘, Unicode)

client_id = Column(‘client_id‘, Integer, ForeignKey(‘client.id‘))

client = relationship(‘Client‘, foreign_keys=client_id)

class Client (Base):

__tablename__ = "client"

id = Column(‘id‘, Integer, primary_key = True)

full_name = Column(‘full_name‘, Unicode)

email = Column(‘email‘, Unicode)

To carry out a small investigation of this ORM we will create tables in SQLite database from the generated Python classes and try some Python expressions to perform some insert.

Run the interactive Python console and create a database engine for our database session.

>>> from sqlalchemy import create_engine

>>> engine = create_engine(‘sqlite:///‘)

Import classes from generated output.py file

>>> from output import *

Create a database session and database tables for classes defined as Client and Purchase. The lines provided below will create a session object which will be bound to the sqlite database. All of the database manipulation code and objects will be attached to a database session.

>>> from sqlalchemy.orm import sessionmaker

>>> session = sessionmaker()

>>> session.configure(bind=engine)

>>> Base.metadata.create_all(engine)

Any changes performed on a database are done in the context of a session. Multiple changes can be grouped in a session via session.add(object). To register all the changes in a database, issue a statementdb.session.commit().

Inserting data can be performed in three steps.

Create a session object

Create a Python object of Client and Purchase classes
Add it to the session
Commit the session

>>> s = session()

>>> c = Client(full_name=’client1’, email=’[email protected]’)

>>> p = Purchase(purchase_no=u’abcd123’, client_id=c.id)

>>> s.add(c)

>>> s.add(p)

>>> s.commit()

To retrieve the database objects, we call query() and filter() methods from the databasesession object.

>>> c = s.query(Client).filter(Client.full_name == ‘client1‘).one()

>>> c.full_name

u‘client1‘

>>> c.email

u‘[email protected]‘

>>> p = s.query(Purchase).filter(Purchase.client_id == c.id).one()

>>> p.purchase_no

u‘abcd123‘

Python frameworks to use with SQLAlchemy:

Java

jOOQ – is a database-mapping tool that implements the Active Record pattern, and is commonly considered a “happy medium between ORMs and JDBC”, because it embraces SQL rather than trying to hide it. jOOQ focuses on a relational model centric approach and assumes that database design should drive Java code. One of the features is possibility to construct queries that has similar structure to SQL.

Via jOOQ you can generate Java classes that model tables, records, sequences, stored procedures and many more. There are three ways to generate the jOOQ classes. You can generate jOOQ classse from the existing database or using Vertabelo-jOOQ integration. Having already prepared the database structure in Vertabeloyou can export it as a Vertabelo XML file and add as a property in the codegen configuration file (Full tutorialhere) or you can download the model directly from the cloud by providing the Vertabelo API token (model identifier) to the codegen file (full tutorial here). More information can be found here.

For example, an insert for our client-purchase example can be visualized in the code as follows:

// Create a new record

ClientRecord client = create.newRecord(CLIENT);

// Insert the record: INSERT INTO CLIENT VALUES (‘John’, ‘[email protected]’)

client.setFullName(‘John’);

client.setEmail(‘john@gmail.com’);

client.store();

Or we can loop over the clients returned from SELECT statements and delete those whose full_name is ‘John’.

for (PurchaseRecord purchase : create .selectFrom(PURCHASE)

.where(PURCHASE.CLIENT_ID.in(

select(CLIENT.ID)

.from(CLIENT)

.where(CLIENT.FULL_NAME.equal("John"))

)) {

purchase.delete();

}

PHP

Propel – is a PHP ORM with implemented Active Record pattern. It describes the model structure (tables, columns, relationships) in an XML file called the schema and generates classes based on the schema definition of tables.

The schema.xml can be generated by using a vertabeloPropel script. For more details follow this tutorial.

After generation, the structure of the database schema looks like the following:

<?xml version="1.0"?>

<database name="my_db" defaultIdMethod="native">

<table name="purchase">

<column name="id" type="INTEGER" sqlType="int" required="true" primaryKey="true"/>

<column name="purchase_no" type="CHAR" size="12" sqlType="char(12)" required="true"/>

<column name="client_id" type="INTEGER" sqlType="int" required="true"/>

<foreign-key foreignTable="client" name="client_purchase" onDelete="none" onUpdate="none">

<reference foreign="id" local="client_id"/>

</foreign-key>

</table>

<table name="client">

<column name="id" type="INTEGER" sqlType="int" required="true" primaryKey="true"/>

<column name="full_name" type="VARCHAR" size="255" sqlType="varchar(255)" required="true"/>

<column name="email" type="VARCHAR" size="255" sqlType="varchar(255)" required="true"/>

</table>

</database>

Having described the structure, we can generate the SQL code, by issuing:

$ propel sql:build

and then generate model classes, by running:

$ propel model:build

Adding new data in Propel can be performed in few steps

Create a Propel object
Define a value for each column by using setXXX() method
Call save() method.

$client = new Client();

$client->setFullName(‘client1‘);

$client->setEmail(‘[email protected]‘);

$client->save();

To read a row from a database, we use generated Query object and generated findPK() method. Reading object properties is done via calling the getter on the column.

$firstClient = ClientQuery::create()->findPK(1);

echo $firstClient->getFullName();

where line $firstClient = ClientQuery::create()->findPK(1); issues a simple SQL query.

SELECT client.id, client.full_name, client.email

FROM client

WHERE client.id = 1

LIMIT 1;

Yes, the problem of Object Relational Mapping is definitely a hard one. Most of the discussion still keeps revolving around the thoughts: “I would like to have something that lacks typical Hibernate or another ORM defects, but I don’t have a viable replacement. I am not planning to write all the code in pure SQL/JDBC, therefore I will stick to what I know.” The problem of Object Relational Mapping is still unresolved and discussions about it still become a battlefield. How do you handle the ORM problem? When do you say YES or NO to using ORM?

时间： 2024-07-29 22:24:33

[转载] ORMs under the hood

What is an object-relational mapping (ORM)?

Why SQL is so important while using an ORM?

Use only SQL or stay with ORM?

How does ORM work?

ORMs design patterns

Popular ORMs

In brief: advantages and disadvantages of using ORM

FOR:

AGAINST:

ORMs by example

Python

Java

PHP

[转载] ORMs under the hood的相关文章

【转载】LINUX 和 WINDOWS 内核的区别

python 常用模块（转载）

LightOj 1112 Curious Robin Hood（线段树||树状数组）

Lightoj 1112 - Curious Robin Hood 【单点修改 + 单点、区间查询】【树状数组水题】

【转载】C++拷贝构造函数(深拷贝，浅拷贝)

门控时钟-理论分析 ---- 转载

浅谈Java中的equals和==(转载)

JVM学习（2）——技术文章里常说的堆，栈，堆栈到底是什么，从os的角度总结--转载http://www.cnblogs.com/kubixuesheng/p/5202561.html

GitHub超详细图文攻略 - Git客户端下载安装 GitHub提交修改源码工作流程 Git分支标签过滤 Git版本工作流（转载）