Yet Another 10 Common Mistakes Java Developers Make When Writing SQL (You Won’t BELIEVE the Last One)

(Sorry for that click-bait heading. Couldn’t resist ;-) )

We’re on a mission. To teach you SQL. But mostly, we want to teach you how
to appreciate SQL. You’ll love it!

Getting SQL right or wrong shouldn’t be about that You’re-Doing-It-Wrong™
attitude that can be encountered often when evangelists promote their object of
evangelism. Getting SQL right should be about the fun you’ll have once you do
get it right. The things you start appreciating when you notice that you can
easily replace 2000 lines of slow, hard-to-maintain, and ugly imperative (or
object-oriented) code with 300 lines of lean functional code (e.g. using Java
8), or even better, with 50 lines of SQL.

We’re glad to see that our blogging friends have started appreciating SQL,
and most specifically, window functions after reading our posts. For instance,
take

Vlad Mihalcea’s Time to Break Free from the SQL-92 Mindset

Petri Kainulainen’s revelations that led to him starting his jOOQ tutorial series (among other reasons)

Eugen Paraschiv (from Baeldung)’s cracking up about Es-Queue-El

So, after our previous, very popular posts:

10 Common Mistakes Java Developers Make when Writing SQL

10 More Common Mistakes Java Developers Make when Writing SQL

… we’ll bring you:

Yet Another 10 Common Mistakes Java Developers Make When Writing SQL

And of course, this doesn’t apply to Java developers alone, but it’s written
from the perspective of a Java (and SQL) developer. So here we go (again):

1. Not Using Window Functions

After all that we’ve been preaching, this must be our number 1 mistake in
this series. Window
functions are probably the coolest SQL feature of them all. They’re so
incredibly useful, they should be the number one reason for anyone to switch to
a better database, e.g. PostgreSQL:

Mind bending talk by @lukaseder about @JavaOOQ at tonight’s @jsugtu. My new
resolution: Install PostgreSQL and study SQL standard at once.

Peter Kofler (@codecopkofler), April 7, 2014

If free and/or Open Source is important to you, you have absolutely no better
choice than using PostgreSQL (and you’ll even get to
use the free jOOQ Open Source Edition,
if you’re a Java developer).

And if you’re lucky enough to work in an environment with Oracle or SQL
Server (or DB2, Sybase) licenses, you get even more out of your new favourite
tool.

We won’t repeat all the window function goodness in this section, as we’ve
blogged about them often enough:

Probably the Coolest SQL Feature: Window Functions

NoSQL? No, SQL! – How to Calculate Running Totals

How can I do This? – With SQL of Course!

CUME_DIST(), a Lesser-Known SQL Gem

Popular ORMs Don’t do SQL

SQL Trick: row_number() is to SELECT what dense_rank() is to SELECT DISTINCT

ORM vs. SQL, compared to C vs. ASM

The Cure:

Remove MySQL. Take a decent database. And start playing with window
functions. You’ll never go back, guaranteed.

2. Not declaring NOT NULL constraints

This one was already part of a previous list where we claimed that you should
add as much metadata as possible to your schema, because your database will be
able to leverage that metadata for optimisations. For instance, if your
database knows that a foreign key value in BOOK.AUTHOR_ID must also be
contained exactly once in AUTHOR.ID, then a whole set of optimisations can be
achieved in complex queries.

Now let’s have another look at NOT NULL constraints. If you’re using Oracle,
NULL values will not be part of your index (more precisely, a row whose indexed
columns are all NULL gets no entry in a regular B-tree index). This doesn’t
matter if you’re expressing an IN constraint, for instance:

SELECT * FROM table
WHERE value IN (
  SELECT nullable_column FROM ...
)

But what happens with a NOT IN constraint?

SELECT * FROM table
WHERE value NOT IN (
  SELECT nullable_column FROM ...
)

Due to SQL’s slightly unintuitive way of handling NULL, there is a slight
risk of the second query unexpectedly not returning any results at all, namely
if there is at least one NULL value as a result from the subquery. The reason:
value NOT IN (a, NULL) is equivalent to value <> a AND value <> NULL, and a
comparison with NULL is never TRUE. This is true for all databases that get SQL
right.
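To see why a single NULL empties the result, it can help to spell out SQL’s three-valued logic by hand. The following Java sketch is only an emulation and not how any database implements the predicate. The class NotInDemo and its methods are made up for this illustration, and a Java null stands in for both SQL NULL and the truth value UNKNOWN:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that emulates SQL's three-valued logic.
// A Java null plays the role of SQL NULL (as a value) and of UNKNOWN (as a truth value).
class NotInDemo {

    // Emulates "value = other": UNKNOWN as soon as either side is NULL
    static Boolean sqlEquals(Integer value, Integer other) {
        if (value == null || other == null)
            return null;
        return value.equals(other);
    }

    // Emulates "value NOT IN (list)", i.e. "value <> e1 AND value <> e2 AND ..."
    static Boolean notIn(Integer value, List<Integer> list) {
        boolean unknown = false;
        for (Integer element : list) {
            Boolean eq = sqlEquals(value, element);
            if (eq == null)
                unknown = true;   // UNKNOWN: keep looking for a definite match
            else if (eq)
                return false;     // A match: NOT IN is definitely FALSE
        }
        // No match was found. With a NULL in the list, the result is UNKNOWN, not TRUE
        return unknown ? null : true;
    }

    // A WHERE clause keeps a row only if its predicate is TRUE (not FALSE, not UNKNOWN)
    static boolean whereKeeps(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    public static void main(String[] args) {
        System.out.println(whereKeeps(notIn(1, Arrays.asList(2, 3))));    // true
        System.out.println(whereKeeps(notIn(1, Arrays.asList(2, null)))); // false
    }
}
```

Whatever value you pass in, the second call can never keep the row: the NULL element turns every “no match” into UNKNOWN.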

But because the index on nullable_column doesn’t contain any NULL values,
Oracle has to look up the complete content in the table, resulting in a FULL
TABLE SCAN. Now that is unexpected! Details about this can be seen in this
article.

The Cure:

Carefully review all your nullable, yet indexed columns, and check if you
really cannot add a NOT NULL constraint to those
columns.

The Tool:

If you’re using Oracle, use this query to detect all nullable, yet indexed
columns:

SELECT
  i.table_name,
  i.index_name,
  LISTAGG(
    LPAD(i.column_position, 2) || ': ' ||
    RPAD(i.column_name, 30) || ' ' ||
    DECODE(t.nullable, 'Y', '(NULL)', '(NOT NULL)'),
    ', '
  ) WITHIN GROUP (ORDER BY i.column_position)
    AS "NULLABLE columns in indexes"
FROM user_ind_columns i
JOIN user_tab_cols t
ON (t.table_name, t.column_name) =
  ((i.table_name, i.column_name))
WHERE EXISTS (
  SELECT 1
  FROM user_tab_cols t
  WHERE (t.table_name, t.column_name, t.nullable) =
    ((i.table_name, i.column_name, 'Y'))
)
GROUP BY i.table_name, i.index_name
ORDER BY i.index_name ASC;

Example output:

TABLE_NAME | INDEX_NAME   | NULLABLE columns in indexes
-----------+--------------+----------------------------
PERSON     | I_PERSON_DOB | 1: DATE_OF_BIRTH (NULL)

And then, fix it!

(Accidental criticism of Maven is irrelevant here ;-) )

If you’re curious about more details, see also these posts:

The Index You’ve Added is Useless. Why?

NULL in SQL. Explaining its Behaviour

Indexing NULL in the Oracle Database

3. Using PL/SQL Package State

Now, this is a boring one if you’re not using Oracle, but if you are (and
you’re a Java developer), be very wary of PL/SQL package state. Are you really
doing what you think you’re doing?

Yes, PL/SQL has package-state, e.g.

CREATE OR REPLACE PACKAGE pkg IS
  -- Package state here!
  n NUMBER := 1;

  FUNCTION next_n RETURN NUMBER;
END pkg;

CREATE OR REPLACE PACKAGE BODY pkg IS
  FUNCTION next_n RETURN NUMBER
  IS
  BEGIN
    n := n + 1;
    RETURN n;
  END next_n;
END pkg;

Wonderful, so you’ve created yourself an in-memory counter that generates a
new number every time you call pkg.next_n. But who owns that
counter? Yes, the session. Each session has its own initialised “package
instance”.

But no, it’s probably not the session you might have thought of.

We Java developers connect to databases through connection pools. When we
obtain a JDBC Connection from such a pool, we recycle that connection from a
previous “session”, e.g. a previous HTTP Request (not HTTP Session!). But that’s
not the same. The database session (probably) outlives the HTTP Request and will
be inherited by the next request, possibly from an entirely different user. Now,
imagine you had a credit card number in that package…?
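The effect can be reproduced without any database at all. The following is a minimal sketch in plain Java, where FakeSession stands in for a database session holding the state of pkg, and FakePool for a connection pool. Both classes are hypothetical and made up for this illustration; a real pool and a real Oracle session are of course more involved:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-ins: FakeSession mimics a database session holding
// package state, FakePool mimics a connection pool that recycles sessions.
class PackageStateDemo {

    static class FakeSession {
        int n = 1;                // Like "n NUMBER := 1" in pkg

        int nextN() {             // Like pkg.next_n
            n = n + 1;
            return n;
        }
    }

    static class FakePool {
        private final Deque<FakeSession> idle = new ArrayDeque<>();

        FakeSession borrow() {
            // Reuse an idle session if there is one, state included
            FakeSession session = idle.poll();
            return session != null ? session : new FakeSession();
        }

        void giveBack(FakeSession session) {
            idle.push(session);
        }
    }

    // Simulates one HTTP request that calls pkg.next_n once
    static int handleRequest(FakePool pool) {
        FakeSession session = pool.borrow();
        try {
            return session.nextN();
        }
        finally {
            pool.giveBack(session);
        }
    }

    public static void main(String[] args) {
        FakePool pool = new FakePool();
        System.out.println(handleRequest(pool)); // 2 (request of user A)
        System.out.println(handleRequest(pool)); // 3 (request of user B sees A's state)
    }
}
```

The second request never initialised anything, yet it continues counting where the first one stopped, because the recycled session still carries the first request’s state.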

Not The Cure:

Nope. Don’t just jump to using SERIALLY_REUSABLE packages:

CREATE OR REPLACE PACKAGE pkg IS
  PRAGMA SERIALLY_REUSABLE;
  n NUMBER := 1;

  FUNCTION next_n RETURN NUMBER;
END pkg;

Because:

You cannot even use that package from SQL, now (see ORA-06534).

Mixing this PRAGMA with regular package state from
other packages just makes things a lot more complex.

So, don’t.

Not The Cure:

I know. PL/SQL can be a beast. It often seems like such a quirky language.
But face it. Many things run much much faster when written in PL/SQL, so don’t
give up, just yet. Dropping PL/SQL is not the solution either.

The Cure:

At all costs, try to avoid package state in PL/SQL. Think of package state as
you would of static variables in Java. While they might be useful for caches
(and constants, of course) every now and then, you might not actually access
the state that you wanted. Think about load-balancers, suddenly transferring
you to another JVM. Think about class loaders, that might have loaded the same
class twice, for some reason.

Instead, pass state as arguments through procedures and functions. This will
avoid side-effects and make your code much cleaner and more predictable.

Or, obviously, persist state to some table.

4. Running the same query all the time

Master data is boring. You probably wrote some utility to get the latest
version of your master data (e.g. language, locale, translations, tenant, system
settings), and you can query it every time, once it is available.

At all costs, don’t do that. You don’t have to cache many things in your
application, as modern databases have grown to be extremely fast when it comes
to caching:

Table / column content

Index content

Query / materialized view results

Procedure results (if they’re deterministic)

Cursors

Execution plans

So, for your average query, there’s virtually no need for an ORM second-level
cache, at least from a performance perspective (ORM caches mainly fulfil other
purposes, of course).

But when you query master data, i.e. data that never changes, then, network
latency, traffic and many other factors will impair your database
experience.

The Cure:

Please do take 10 minutes, download
Guava, and use its excellent
and easy to set up cache, that ships with various built-in invalidation
strategies. Choose time-based invalidation (i.e. polling), choose Oracle
AQ or Streams, or PostgreSQL’s NOTIFY for
event-based invalidation, or just make your cache permanent, if it doesn’t
matter. But don’t issue an identical master data query all
the time.
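If you cannot add Guava right away, the idea behind time-based invalidation fits into a few lines of plain Java. The following is a minimal sketch using only the JDK, and it is not Guava’s actual API. The class name MasterDataCache, its loader function and the time to live are all assumptions made up for this illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical, stripped-down stand-in for a library cache such as Guava's.
// Entries expire a fixed time after they were loaded (time-based invalidation).
class MasterDataCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long loadedAt;

        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // Runs the actual master data query
    private final long ttlMillis;

    MasterDataCache(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    V get(K key) {
        long now = System.currentTimeMillis();

        // Reload only if the entry is missing or older than the time to live
        Entry<V> entry = entries.compute(key, (k, old) ->
            old == null || now - old.loadedAt >= ttlMillis
                ? new Entry<V>(loader.apply(k), now)
                : old);

        return entry.value;
    }

    // For event-based invalidation, e.g. triggered by a database notification
    void invalidate(K key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        int[] queries = { 0 };
        MasterDataCache<String, String> cache = new MasterDataCache<>(
            code -> { queries[0]++; return "Language " + code; }, // Pretend this hits the database
            60_000);

        cache.get("en");
        cache.get("en");
        System.out.println(queries[0]); // 1
    }
}
```

A production-grade cache also bounds its size and reports statistics, which is exactly why a library is the better choice once you have those 10 minutes.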

… This obviously brings us to

5. Not knowing about the N+1 problem

You had a choice. At the beginning of your software product, you had to
choose between:

An ORM (e.g. Hibernate, EclipseLink)

SQL (e.g. through JDBC, MyBatis, or jOOQ)

Both

So, obviously, you chose an ORM, because otherwise you wouldn’t be suffering
from “N+1”. What does “N+1” mean?

The accepted answer on this
Stack Overflow question explains it nicely. Essentially, you’re running:

SELECT * FROM book

-- And then, for each book:
SELECT * FROM author WHERE id = ?

Of course, you could go and tweak your hundreds of annotations to correctly
prefetch or eager fetch each book’s associated author information to produce
something along the lines of:

SELECT *
FROM book
JOIN author
  ON book.author_id = author.id

But that would be an awful lot of work, and you’ll risk eager-fetching too
many things that you didn’t want, resulting in another performance issue.
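To make the difference between the two approaches tangible, here is a small simulation that merely counts round trips. It is a sketch under loud assumptions: there is no ORM and no JDBC driver involved, the “database” is a pair of in-memory collections, and all names (NPlusOneDemo, selectBooks, selectAuthor, and so on) are invented for this example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory "database" that only counts round trips.
// No real ORM or JDBC driver is involved.
class NPlusOneDemo {

    static final Map<Integer, String> AUTHORS = new HashMap<>();
    static final List<int[]> BOOKS = Arrays.asList(   // { book id, author id }
        new int[] { 1, 10 }, new int[] { 2, 10 }, new int[] { 3, 20 });

    static {
        AUTHORS.put(10, "Author A");
        AUTHORS.put(20, "Author B");
    }

    static int roundTrips;

    // SELECT * FROM book
    static List<int[]> selectBooks() {
        roundTrips++;
        return BOOKS;
    }

    // SELECT * FROM author WHERE id = ?
    static String selectAuthor(int id) {
        roundTrips++;
        return AUTHORS.get(id);
    }

    // SELECT * FROM book JOIN author ON book.author_id = author.id
    static List<String> selectBooksWithAuthors() {
        roundTrips++;
        List<String> result = new ArrayList<>();
        for (int[] book : BOOKS)
            result.add(book[0] + ": " + AUTHORS.get(book[1]));
        return result;
    }

    // What lazy loading does behind the scenes: 1 query + N queries
    static List<String> lazy() {
        List<String> result = new ArrayList<>();
        for (int[] book : selectBooks())
            result.add(book[0] + ": " + selectAuthor(book[1]));
        return result;
    }

    public static void main(String[] args) {
        roundTrips = 0;
        lazy();
        System.out.println(roundTrips); // 4, i.e. N + 1 for N = 3 books

        roundTrips = 0;
        selectBooksWithAuthors();
        System.out.println(roundTrips); // 1
    }
}
```

Both variants produce the same list. Only the number of round trips differs, and that number grows with every additional book in the lazy variant.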

Maybe, you could upgrade to JPA 2.1 and use the new @NamedEntityGraph to
express beautiful annotation trees like this one:

@NamedEntityGraph(
    name = "post",
    attributeNodes = {
        @NamedAttributeNode("title"),
        @NamedAttributeNode(
            value = "comments",
            subgraph = "comments"
        )
    },
    subgraphs = {
        @NamedSubgraph(
            name = "comments",
            attributeNodes = {
                @NamedAttributeNode("content")
            }
        )
    }
)

The example was taken from this blog post by Hantsy Bai. Hantsy then goes on
to explain that you can use the above beauty through the following statement:

// postGraph is the entity graph declared above, e.g. obtained
// through em.getEntityGraph("post")
em.createQuery("select p from Post p where p.id=:id", Post.class)
  .setHint("javax.persistence.fetchgraph", postGraph)
  .setParameter("id", this.id)
  .getResultList()
  .get(0);

Let us all appreciate the above application of JEE standards with all due
respect, and then consider…

The Cure:

You just listen to the wise words at the beginning of this article and
replace thousands of lines of tedious Java / Annotatiomania™ code
with a couple of lines of SQL. Because that will also likely help you prevent
another issue that we haven’t even touched yet, namely selecting too many
columns as you can see in these posts:

Our previous listing of common mistakes

Myth: SELECT * is bad

Since you’re already using an ORM, this might just mean resorting to native
SQL – or maybe you manage to express your query with JPQL. Of course, we agree
with Alessio Harri in believing that you should use jOOQ together with JPA:

Loved the type safety of @JavaOOQ today. OpenJPA is the workhorse and
@JavaOOQ is the artist :) #80/20

Alessio Harri (@alessioh), May 23, 2014

The Takeaway:

While the above will certainly help you work around some real world issues
that you may have with your favourite ORM, you could also take it one step
further and think about it this way. After all these years of pain and suffering
from the object-relational
impedance mismatch, the JPA 2.1 expert group is now trying to tweak their
way out of this annotation madness by adding more declarative, annotation-based
fetch graph hints to JPQL queries, that no one can debug, let alone
maintain.

The alternative is simple and straight-forward SQL. And with Java 8, we’ll
add functional transformation through the Streams API. That’s
hard to beat.
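What such a functional transformation might look like is sketched below. The example assumes that a flat “book JOIN author” result has already been fetched into a list of rows; the Row type and the sample data are hypothetical, and in a real application the rows would come from JDBC, jOOQ, or a native query:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical row type for the result of "book JOIN author".
// In a real application, the rows would come from JDBC, jOOQ, or a native query.
class StreamsDemo {

    static class Row {
        final String author;
        final String title;

        Row(String author, String title) {
            this.author = author;
            this.title = title;
        }
    }

    // Turns the flat join result into "author -> titles", keeping row order
    static Map<String, List<String>> titlesByAuthor(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            row -> row.author,
            LinkedHashMap::new,
            Collectors.mapping(row -> row.title, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("Author A", "Book 1"),
            new Row("Author A", "Book 2"),
            new Row("Author B", "Book 3"));

        System.out.println(titlesByAuthor(rows));
        // {Author A=[Book 1, Book 2], Author B=[Book 3]}
    }
}
```

One query, one pass over the result, and the nested structure that an entity graph would otherwise have to describe through annotations.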

But obviously, your views and experiences on that subject may differ from
ours, so let’s head on to a more objective discussion about…

6. Not using Common Table Expressions

While common table expressions obviously offer readability improvements, they
may also offer performance improvements. Consider the following query that I
have recently encountered in a customer’s PL/SQL package (not the actual
query):

SELECT round (
  (SELECT amount FROM payments WHERE id = :p_id)
  *
  (
    SELECT e.bid
    FROM   currencies c, exchange_rates e
    WHERE  c.id     =
      (SELECT cur_id FROM payments WHERE id = :p_id)
    AND    e.cur_id =
      (SELECT cur_id FROM payments WHERE id = :p_id)
    AND    e.org_id =
      (SELECT org_id FROM payments WHERE id = :p_id)
  ) / (
    SELECT c.factor
    FROM   currencies c, exchange_rates e
    WHERE  c.id     =
      (SELECT cur_id FROM payments WHERE id = :p_id)
    AND    e.cur_id =
      (SELECT cur_id FROM payments WHERE id = :p_id)
    AND    e.org_id =
      (SELECT org_id FROM payments WHERE id = :p_id)
  ), 0
)
INTO amount
FROM dual;

So what does this do? This essentially converts a payment’s amount from one
currency into another. Let’s not delve into the business logic too much, let’s
head straight to the technical problem. The above query results in the following
execution plan (on Oracle):

------------------------------------------------
| Operation                   | Name           |
------------------------------------------------
| SELECT STATEMENT            |                |
| TABLE ACCESS BY INDEX ROWID | PAYMENTS       |
| INDEX UNIQUE SCAN           | PAYM_PK        |
| NESTED LOOPS                |                |
| INDEX UNIQUE SCAN           | CURR_PK        |
| TABLE ACCESS BY INDEX ROWID | PAYMENTS       |
| INDEX UNIQUE SCAN           | PAYM_PK        |
| TABLE ACCESS BY INDEX ROWID | EXCHANGE_RATES |
| INDEX UNIQUE SCAN           | EXCH_PK        |
| TABLE ACCESS BY INDEX ROWID | PAYMENTS       |
| INDEX UNIQUE SCAN           | PAYM_PK        |
| TABLE ACCESS BY INDEX ROWID | PAYMENTS       |
| INDEX UNIQUE SCAN           | PAYM_PK        |
| NESTED LOOPS                |                |
| TABLE ACCESS BY INDEX ROWID | CURRENCIES     |
| INDEX UNIQUE SCAN           | CURR_PK        |
| TABLE ACCESS BY INDEX ROWID | PAYMENTS       |
| INDEX UNIQUE SCAN           | PAYM_PK        |
| INDEX UNIQUE SCAN           | EXCH_PK        |
| TABLE ACCESS BY INDEX ROWID | PAYMENTS       |
| INDEX UNIQUE SCAN           | PAYM_PK        |
| TABLE ACCESS BY INDEX ROWID | PAYMENTS       |
| INDEX UNIQUE SCAN           | PAYM_PK        |
| FAST DUAL                   |                |
------------------------------------------------

The actual execution time is negligible in this case, but as you can see, the
same objects are accessed again and again within the query. This is a violation
of Common Mistake #4: Running the same query all the time.

The whole thing would be so much easier to read, maintain, and for Oracle to
execute, if we had used a common table expression. From the original source
code, observe the following thing:

-- We're always accessing a single payment:
FROM payments WHERE id = :p_id

-- Joining currencies and exchange_rates twice:
FROM currencies c, exchange_rates e

So, let’s factor out the payment first:

-- "payment" contains only a single payment
-- But it contains all the columns that we'll need
-- afterwards
WITH payment AS (
  SELECT cur_id, org_id, amount
  FROM   payments
  WHERE  id = :p_id
)
SELECT round(p.amount * e.bid / c.factor, 0)

-- Then, we simply don't need to repeat the
-- currencies / exchange_rates joins twice
FROM payment p
JOIN currencies c     ON  p.cur_id = c.id
JOIN exchange_rates e ON  e.cur_id = p.cur_id
                      AND e.org_id = p.org_id

Note that we’ve also replaced table lists with ANSI JOINs as suggested in
our previous list.

You wouldn’t believe it’s the same query, would you? And what about the
execution plan? Here it is!

------------------------------------------------
| Operation                   | Name           |
------------------------------------------------
| SELECT STATEMENT            |                |
| NESTED LOOPS                |                |
| FAST DUAL                   |                |
| TABLE ACCESS BY INDEX ROWID | PAYMENTS       |
| INDEX UNIQUE SCAN           | PAYM_PK        |
| TABLE ACCESS BY INDEX ROWID | EXCHANGE_RATES |
| INDEX UNIQUE SCAN           | EXCH_PK        |
| TABLE ACCESS BY INDEX ROWID | CURRENCIES     |
| INDEX UNIQUE SCAN           | CURR_PK        |
------------------------------------------------

No doubt that this is much much better.

The Cure:

If you’re lucky enough and you’re using one of those databases that supports
window functions, chances are incredibly high (100%) that you also have common
table expression support. This is another reason for you to migrate from MySQL
to PostgreSQL, or appreciate the fact that you can work on an awesome commercial
database.

Common table expressions are like local variables in SQL. In every large
statement, you should consider using them, as soon as you feel that you’ve
written something before.

The Takeaway:

Some databases (e.g. PostgreSQL, or SQL Server) also support common table
expressions for DML statements. In other words, you can write:

WITH ...
UPDATE ...

This makes DML incredibly more powerful.

7. Not using row value expressions for UPDATEs

We’ve advertised the use of row value expressions in our previous listing.
They’re very readable and intuitive, and often also promote using certain
indexes, e.g. in PostgreSQL.

But few people know that they can also be used in
an UPDATE statement, in most databases. Check out the
following query, which I again found in a customer’s PL/SQL package (simplified
again, of course):

UPDATE u
SET n = (SELECT n + 1    FROM t WHERE u.n = t.n),
    s = (SELECT 'x' || s FROM t WHERE u.n = t.n),
    x = 3;

So this query takes a subquery as a data source for updating two columns, and
the third column is updated “regularly”. How does it perform? Moderately:

-----------------------------
| Operation          | Name |
-----------------------------
| UPDATE STATEMENT   |      |
|  UPDATE            | U    |
|   TABLE ACCESS FULL| U    |
|   TABLE ACCESS FULL| T    |
|   TABLE ACCESS FULL| T    |
-----------------------------

Let’s ignore the full table scans, as this query is a constructed example. The
actual query could leverage indexes. But T is accessed twice, i.e. in both
subqueries. Oracle didn’t seem to be able to apply scalar subquery caching in
this case.

To the rescue: row value expressions. Let’s simply rephrase
our UPDATE to this:

UPDATE u
SET (n, s) = ((
      SELECT n + 1, 'x' || s FROM t WHERE u.n = t.n
    )),
    x = 3;

Let’s ignore the funny, Oracle-specific double-parentheses syntax for the
right hand side of such a row value expression assignment, but let’s appreciate
the fact that we can easily assign a new value to the tuple (n, s) in one go!
Note, we could have also written this instead, assigning x as well:

UPDATE u
SET (n, s, x) = ((
      SELECT n + 1, 'x' || s, 3
      FROM t WHERE u.n = t.n
    ));

As you will have expected, the execution plan has also improved,
and T is accessed only once:

-----------------------------
| Operation          | Name |
-----------------------------
| UPDATE STATEMENT   |      |
|  UPDATE            | U    |
|   TABLE ACCESS FULL| U    |
|   TABLE ACCESS FULL| T    |
-----------------------------

The Cure:

Use row value expressions. Wherever you can. They make your SQL code
incredibly more expressive, and chances are, they make it faster, as well.

Note that the above is supported by jOOQ’s
UPDATE statement. This is the moment we would like to make you aware of this
cheap, in-article advertisement:

;-)

8. Using MySQL when you could use PostgreSQL

To some, this may appear to be a bit of a hipster discussion. But let’s
consider the facts:

MySQL claims to be the “most popular Open Source database”.

PostgreSQL claims to be the “most advanced Open Source database”.

Let’s consider a bit of history. MySQL has always been very easy to install,
maintain, and it has had a great and active community. This has led to MySQL
still being the RDBMS of choice with virtually every web hoster on this planet.
Those hosters also host PHP, which was equally easy to install, and
maintain.

BUT!

We Java developers tend to have an opinion about PHP, right? It’s summarised
by this image here:

[Image: The PHP Hammer]

Well, it works, but how does it work?

The same can be said about MySQL. MySQL has always worked
somehow, but while commercial databases like Oracle have made tremendous
progress both in terms of query optimisation and feature scope, MySQL has hardly
moved in the last decade.

Many people choose MySQL primarily because of its price (USD $ 0.00). But
often, the same people have found MySQL to be slow and quickly concluded that
SQL is slow per se – without evaluating the options. This is also why all NoSQL
stores compare themselves with MySQL, not with Oracle, the database that has
been winning the Transaction Processing
Performance Council’s (TPC) benchmarks almost forever. Some
examples:

Comparing Cassandra, MongoDB, MySQL

Switching from MySQL to Cassandra. Pros / Cons

MySQL to Cassandra migrations

When to use MongoDB rather than MySQL

While the last article bluntly adds “(and other RDBMS)”, it doesn’t go into
any detail whatsoever about what those “other RDBMS” do wrong. It really only
compares MongoDB with MySQL.

The Cure:

We say: Stop complaining about SQL, when in fact, you’re really complaining
about MySQL. There are at least four very popular databases out there that are
incredibly good, and millions of times better than MySQL. These are:

Oracle Database

SQL Server

PostgreSQL

MS Access

(just kidding about the last one, of course)

The Takeaway:

Don’t fall for aggressive NoSQL marketing. 10gen is an extremely well-funded
company, even if MongoDB continues to disappoint, technically.

The same is true for Datastax.

Both companies are solving a problem that few people have. They’re selling us
niche products as commodity, making us think that our real commodity databases
(the RDBMS) no longer fulfil our needs. They are well-funded and have big
marketing teams to throw around blunt claims.

In the mean time, PostgreSQL just got even better, and
you, as a reader of this blog / post, are about to bet on the winning
team :-)

… just to cite Mark Madsen once more:

History of NoSQL according to @markmadsen #strataconf pic.twitter.com/XHXMJsXHjV

Edd Dumbill (@edd), November 12, 2013

The Disclaimer:

This article has been quite strongly against MySQL. We don’t mean to talk
badly about a database that perfectly fulfils its purpose, as this isn’t a black
and white world. Heck, you can be happy with SQLite in some situations. MySQL,
being the cheap and easy to use, easy to install commodity database. We just
wanted to make you aware of the fact, that you’re

reference:http://java.dzone.com/articles/yet-another-10-common-mistakes


Time: 2024-10-05 04:44:52
