What is Table-Driven and Data-Driven Programming?
Data/Table-Driven programming is the technique of factoring repetitious programming constructs into data and a transformation pattern. This new data is often referred to by purists as meta-data when used in this fashion.
Data-Driven and Table-Driven Programming Examples
Data-driven programming is a style that has existed for as long as programming itself. Programmers use data-driven techniques all the time without giving them much thought. Here is a very simple example. Let's say you wish to write a function that returns the tax for a particular item given the US state and the price. In PHP, your function may look something like this:
/** Naive approach **/
function returntax($usstate, $price){
    $result = 0;
    switch (strtoupper($usstate)){
        case 'MA':
            $result = 0.05*$price;
            break;
        case 'NJ':
            $result = 0.07*$price;
            break;
        // ...more cases, ad nauseam
    }
    return $result;
}
If you look at this closely, you will observe the pattern a*b, where a is some function of the state and b is the price. If you were to take the data-driven approach, you would write the function as:
/** Data-Driven **/
function returntax($usstate, $price){
    $result = 0;
    // the state => rate table is the data; the code below is the transformation pattern
    $aryStateTax = array('MA'=>0.05, 'NJ'=>0.07 /* , ... */);
    if (array_key_exists(strtoupper($usstate), $aryStateTax)) {
        $result = $aryStateTax[strtoupper($usstate)]*$price;
    }
    return $result;
}
/** Table-Driven - if we want to put the power of logic in the hands of users who can edit the state table **/
function returntax($usstate, $price){
    global $db;
    $result = 0;
    $rs = $db->Execute('SELECT tax FROM lustates WHERE state = '
        . $db->qstr(strtoupper($usstate)));
    if (!$rs->EOF) {
        $result = $rs->fields('tax')*$price;
    }
    $rs->Close();
    return $result;
}
The above is a fairly trivial example of Table/Data Driven programming. In the next couple of sections, I shall demonstrate more powerful constructs of the general approach.
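The table-driven variant can also be tried without a database server. The following is a minimal Python sketch, assuming an in-memory SQLite database stands in for the real one; the table name lustates and the MA and NJ rates come from the example above, while the Python function name and the schema are assumptions made for illustration.

```python
import sqlite3

def return_tax(conn, us_state, price):
    """Look up the rate for us_state in lustates; return 0 if the state is absent."""
    row = conn.execute(
        "SELECT tax FROM lustates WHERE state = ?", (us_state.upper(),)
    ).fetchone()
    return row[0] * price if row else 0

# Assumed schema: one row per state, holding its tax rate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lustates (state TEXT PRIMARY KEY, tax REAL)")
conn.executemany("INSERT INTO lustates VALUES (?, ?)", [("MA", 0.05), ("NJ", 0.07)])

before = return_tax(conn, "ma", 100)

# A user edits the table; the function itself is untouched.
conn.execute("UPDATE lustates SET tax = 0.0625 WHERE state = 'MA'")
after = return_tax(conn, "ma", 100)
```

The point of the sketch is the last two statements: the result for MA changes because a row changed, not because any code did.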
Using Data to Generate SQL
One neat trick is to use data to generate code. For example, say you want to write a cross-tab query but don't want to hand-code it, as you often have to in most databases.
Cross Tab Query
In databases without native support for cross-tab queries, you would often write something like the following (NOTE: the example uses SQL Server syntax).
--Naive repetitive painful way
SELECT proj_id,
    SUM(CASE WHEN proj_fund = 'Fund A' THEN amt ELSE NULL END) As [totFund A],
    SUM(CASE WHEN proj_fund = 'Fund B' THEN amt ELSE NULL END) As [totFund B],
    ...ad nauseam, SUM(amt) As totalcost
FROM projectfunding
GROUP BY proj_id
If you pay attention to the repetitious pattern in the above, you may realize that you can write the query more quickly and more flexibly by generating it from a list of funds.
--Using SQL to generate SQL - Table Driven
DECLARE @sql varchar(3000)
SELECT @sql = 'SELECT proj_id, '
SELECT @sql = @sql +
    ' SUM(CASE WHEN proj_fund = ''' + fu_name + ''' THEN amt ELSE NULL END)
    As [tot' + fu_name + '], '
FROM lufunds WHERE fund_category = 'Federal' ORDER BY fu_name
-- every generated column already ends with a comma, so no leading comma here
SELECT @sql = @sql + ' SUM(amt) As totalcost FROM projectfunding GROUP BY proj_id'
EXEC(@sql)
The above approach has several benefits over the first. Our code is a bit shorter and thus possibly easier to read, and our query changes dynamically with the records in lufunds. We can re-sort our columns with ease without having to touch the code, and the naming of the columns is guaranteed to be consistent with the convention we choose. This means a general user can change the definition of the query without touching the code.
On the other hand, it is a bit harder for a novice programmer to read and is more prone to run-time breakage should something silly be put in the fund lookup table.
A similar example uses a PostgreSQL custom SUM aggregate for text and the generate_series function to generate SQL and ASP.NET markup for a monthly crosstab report.
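The same idea of building the query from a fund list can be sketched outside the database. The following is a minimal Python sketch, assuming SQLite as a stand-in engine; the table and column names follow the SQL Server example, while the function name build_crosstab_sql and the sample rows are hypothetical. Fund names are passed as bound parameters and the column aliases are quoted, which is one way to reduce the run-time breakage mentioned above.

```python
import sqlite3

def build_crosstab_sql(fund_names):
    """Return (sql, params): one SUM(CASE ...) column per fund, plus a total."""
    columns = []
    for name in fund_names:
        # Quote the alias and escape embedded double quotes in the identifier.
        alias = '"tot' + name.replace('"', '""') + '"'
        columns.append(f"SUM(CASE WHEN proj_fund = ? THEN amt END) AS {alias}")
    columns.append("SUM(amt) AS totalcost")
    sql = ("SELECT proj_id, " + ", ".join(columns)
           + " FROM projectfunding GROUP BY proj_id")
    return sql, list(fund_names)

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lufunds (fu_name TEXT, fund_category TEXT);
    CREATE TABLE projectfunding (proj_id INTEGER, proj_fund TEXT, amt REAL);
    INSERT INTO lufunds VALUES ('Fund A', 'Federal'), ('Fund B', 'Federal');
    INSERT INTO projectfunding VALUES
        (1, 'Fund A', 100), (1, 'Fund B', 50), (2, 'Fund A', 30);
""")

# The fund list is the data that drives the shape of the query.
funds = [row[0] for row in conn.execute(
    "SELECT fu_name FROM lufunds WHERE fund_category = 'Federal' ORDER BY fu_name")]
sql, params = build_crosstab_sql(funds)
rows = conn.execute(sql + " ORDER BY proj_id", params).fetchall()
```

Adding a row to lufunds adds a column to the report the next time the query is built, with no change to the generator.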
A Rudimentary Code Generator
Another class of problems where data-driven programming comes in handy is the generation of business rule code, formulas, and code in general. For scripting languages that can evaluate code generated on the fly, such as PHP (with eval) or JavaScript, this approach is immensely useful: it allows you to codify the logic in a generic pseudo-code form and then apply transformation patterns to translate it into each programming environment where it is needed. For example, you may want your logic to work on three tiers: the client side, the server-side middle tier, and your core database tier.
- On the client side, you need something like JavaScript for validation or to create highly portable pricing calculators.
- On the server side, in case your JavaScript validation fails or you need to make this logic available to other apps, you need something like PHP, VB, Python, etc.
- On the database side, you need SQL for your triggers or to update batches of data.
On top of that, what if the rules are constantly changing and are written by a financial analyst or engineer who is intimately versed in financial or engineering models but not in general programming? How do you make your code dynamic so that it changes with the rules, and so that the rules are understandable and editable by a financial analyst or other non-programmer?
How do you write the rules and have them translate to every place they are needed, without having to tweak each side whenever the logic changes?
One approach is to write the core of your logic as data (pseudo-logic that cannot stand on its own), stored and written in a form that is easy for the rule maker or analyst to understand, that can be passed through a black-box function or set of functions that translate it into executable code for each environment you need, and that can be self-documenting.
Creating such a format and its black-box functions is tricky for an all-purpose situation, but if your set of rules uses a very limited vocabulary (a mini domain language), it is much more doable. A high-level view of your system would look something like this.
- Data A + VBTransformPattern -> VB Code with logical meaning C
- Data A + JSTransformPattern -> Javascript Code with logical meaning C
- Data A + SQLTransformPattern -> SQL with logical meaning C
- Data A + EnglishTransformPattern -> Self-documenting report to CFO with logical meaning C
Where Data A is the logic written in your mini domain language that fits in the place holders in your transformation patterns.
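The scheme above can be sketched in a few lines. The following is a minimal Python sketch, assuming a hypothetical mini domain language in which a rule is a nested tuple of an operator name and two operands, field names are strings, and constants are numbers; the rule, the pattern tables, and the function name render are all invented for illustration and cover only addition and multiplication.

```python
# Data A: one rule in the assumed mini domain language.
# Reads as: price times tax_rate, plus shipping.
RULE = ("add", ("mul", "price", "tax_rate"), "shipping")

# One transformation pattern per target environment.
PATTERNS = {
    "javascript": {"add": "({l} + {r})", "mul": "({l} * {r})",
                   "field": "item.{name}"},
    "sql":        {"add": "({l} + {r})", "mul": "({l} * {r})",
                   "field": "[{name}]"},
    "english":    {"add": "the sum of {l} and {r}",
                   "mul": "{l} multiplied by {r}",
                   "field": "{name}"},
}

def render(rule, target):
    """Translate a rule into text for the given target by filling its patterns."""
    pattern = PATTERNS[target]
    if isinstance(rule, str):            # a field name
        return pattern["field"].format(name=rule)
    if isinstance(rule, (int, float)):   # a numeric constant
        return repr(rule)
    op, left, right = rule               # an operation with two operands
    return pattern[op].format(l=render(left, target), r=render(right, target))

js_code = render(RULE, "javascript")  # "((item.price * item.tax_rate) + item.shipping)"
sql_code = render(RULE, "sql")        # "(([price] * [tax_rate]) + [shipping])"
report = render(RULE, "english")      # "the sum of price multiplied by tax_rate and shipping"
```

Changing RULE changes all three outputs together, and supporting another environment means adding one more entry to PATTERNS rather than rewriting the rule.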
Data, Logic and Function - What's the difference?
It is interesting to note that some languages do not make much of a distinction between data and function, and allow passing pointers to functions just as easily as passing other data. In fact, if you look at the underlying implementation of many languages, such as C++, VB, and several other OO languages, method dispatch typically rests on what are called VTABLEs, even though this subtlety is hidden in regular use. VTABLEs are nothing more than tables that hold pointers to functions. Languages like Smalltalk allow one to pass anonymous blocks of code in data structures and apply them to other data structures. So we see that data can change data through some sort of activation function, and that control logic itself can be stored and changed.
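In the same spirit as a VTABLE, a program can keep its own table of functions and let data decide which one runs. The following is a minimal Python sketch of such a dispatch table; the names OPERATIONS and apply_steps and the sample steps are invented for illustration.

```python
import operator

# A dispatch table: the keys are plain data, the values are functions.
OPERATIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}

def apply_steps(value, steps):
    """Apply each (operation name, operand) pair to value, in order."""
    for op_name, operand in steps:
        value = OPERATIONS[op_name](value, operand)
    return value

# The control flow is itself data and could just as well come from a table or a file.
steps = [("mul", 3), ("add", 4), ("sub", 1)]
result = apply_steps(2, steps)  # ((2 * 3) + 4) - 1 = 9
```

Registering a new operation is a matter of adding one entry to the table; apply_steps does not change.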
We see that some of the most powerful constructs in programming treat functions, logic, and pseudo-logic as data. So if logic and functions are data, are they a special class of data, namely meta-data? Perhaps that is the wrong question to ask. The more important questions, I think, deal with what we are trying to optimize for in our particular situation: speed, flexibility, brevity, and clarity. The items we care about most will dictate how we partition our logic into the various buckets. Highly productive programmers are keenly aware of their desired results and trade-offs, and are experts at finding just the right balance of rigid code and data to arrive at that end. This gives rise to a wide spectrum of uses, from simple lookup-driven flow to more advanced uses like self-replicating machines that morph based on changes in their data environment.
Articles of Interest | |
---|---|
More generate_series tricks | Shows some data-driven programming tricks using the PostgreSQL generate_series function. |
Data-Driven Programming | Chapter from The Art of Unix Programming. |
Make SQL Server generate SQL for you using SELECT literals | Example of how to auto-generate SQL using table lists from system tables. |
Table Oriented Programming | An example of managing control flow with tables. |
Cunningham and Cunningham: Data-driven Programming | |
Code that Writes Code (or TSQL that writes ASP.NET) | Example of generating aspx code with a SQL Server Transact-SQL stored procedure. |
Jon Udell: The social scripting continuum | CoScripter, an example of a social scripting system that allows users to record and create English-readable scripts and share them as data that are then executed by Firefox. CoScripter syntax is an example of code as data that is self-documenting. |