Notes towards a set-objective language.

I discovered this jibber-jabber from 2009 while searching my computer for something else.

Draft 0.1
30 March 2009

The Problem.

The two most important programming paradigms today are Object-Oriented Programming and Relational Programming. The latter is often not considered ‘programming’ in the same sense as OOP, being generally confined to describing data structures (and queries over them) to a database.

Nevertheless, programmers commonly use one approach, or both, to define, express and solve client problems. It is very common indeed for part of the problem to be solved in a Relational way (using an SQL-speaking database) and part of it in an Object-Oriented way (using Java, C#, PHP, Python, Ruby …). This is so common that it is widely expressed in architectural patterns (n-tier architecture) and design patterns (Model-View-Controller). Typically the data is expressed relationally, with everything else being handled by code in an OOP language.

But all is not well in paradise. Object-Oriented Programming is very different in scope, theme and form from Relational Programming. The mental models, languages, level of mathematical formality and strategies for runtimes and compilation all differ.

This is commonly called the “impedance mismatch” problem. To solve it, programmers have traditionally resorted to various layers or libraries that stand between the OOP code and the RDBMSes it talks to. These are Object-Relational Mappers (ORMs). So widespread and pervasive is the problem that this wheel is constantly being reinvented in slightly different ways for every OOP language. ORMs are a very large genre. There are, it can be said without too much exaggeration, hundreds of them, all slightly different but sharing common properties and aims. A single language may have dozens.

Why so many? Because none has been found to be clearly satisfactory.

Ted Neward described object-relational mapping as “The Vietnam of Computer Science” in an essay of the same name. He says that “it represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy.”

In the essay Neward explains that:

  • Any ORM favours one paradigm over the other, effectively gutting the advantages of one half of the solution code;
  • Developers mapping objects to tables lack the ability to cleanly express inheritance in relational terms (for example, queries on PROFESSOR objects should not call in PEOPLE objects);
  • The duplication of models leads to confusion about which body of code “owns” or drives the schema — OOP code or Relational code;
  • Sometimes the solution is some third schema from which the other two are derived, which simply adds more complexity and potential gaps in expressibility;
  • Concepts of Identity are different in OOP languages and Relational languages. In OOP identity refers to a single object, an address in memory. In relational terms it is a mathematically unique tuple/relationship, which is similar to an OOP concept of Equality. ORMs must decide on a policy as to which dominates;
  • Retrieving data introduces a raft of choices for allocating responsibility — how much of any query or computation belongs in the OOP code vs the Relational code? Different answers will have widely different performance characteristics — and an effect on schema ownership, duplication etc above;
  • Loading and saving time or synchronisation issues — Should loading be eager? Lazy? Should whole objects be loaded, or a field at a time? What about caching? When is the database updated?
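To make the identity bullet concrete: one common policy is the Identity Map pattern (catalogued in Martin Fowler's Patterns of Enterprise Application Architecture), in which the mapping layer keeps at most one in-memory object per primary key, so that OOP identity is forced to follow relational identity. The sketch below is a minimal, hypothetical Python version; the IdentityMap class and the dict standing in for a table are illustrative assumptions, not any particular ORM's API.

```python
from dataclasses import dataclass


@dataclass
class Person:
    person_id: int
    name: str


class IdentityMap:
    """Hypothetical identity map: at most one in-memory object per primary key."""

    def __init__(self, rows):
        # rows stands in for a database table: primary key -> column value
        self._rows = rows
        self._loaded = {}

    def get(self, person_id):
        # Reuse the object already loaded for this key, so two lookups of the
        # same row return the *same* object (OOP identity follows the key).
        if person_id not in self._loaded:
            self._loaded[person_id] = Person(person_id, self._rows[person_id])
        return self._loaded[person_id]


table = {1: "Jamie Jones", 2: "Jill Fontaine"}
session = IdentityMap(table)
a = session.get(1)
b = session.get(1)
print(a is b)                          # True: one object per row
print(a == Person(1, "Jamie Jones"))   # True: dataclass __eq__ compares values
print(a is Person(1, "Jamie Jones"))   # False: a fresh object has its own identity
```

The policy choice is visible in the last two lines: the map makes identity dominate within one session, while value equality still holds between separately constructed objects.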

Neward outlines six possibilities:

  1. Abandoning OOP, developers exclusively express everything Relationally. Relational languages are not, by the character derived from their original purpose, well suited for this.
  2. Abandoning relational programming, developers exclusively express everything in OOP terms. In practice this is likely to mean OODBMSes, but these suffer from many of the same problems as ORMs.
  3. Manual mapping. For simple problems this might be acceptable, but it increases the likelihood of errors. It also pretends that software stops growing.
  4. Acceptance of ORM limitations. Assuming that ORMs can handle the common case well, the cost of dealing with the persnickety bits could be considered an acceptable tradeoff.
  5. Integrating relational concepts into languages. This is starting to happen with technologies like LINQ. This document outlines a much more radical attack from this direction.
  6. Integrating relational concepts into frameworks. Designing OOP frameworks that express problems in relational terms. This is poor because it gives up OOP ideas for Relational ones.

What is the cause of the mismatch?

I think the key cause of the mismatch is the difference in how OO and Relational Programming treat Equality and Identity.

Object-Oriented Programming grew from Imperative Programming. In many ways it is Imperative — OO programs are sequential instructions acting on mutable state. They are packaged, grouped, instantiated, abstracted and addressed in various ways, but the heart of any OO program is the state and sequential instructions.

Object-Oriented Programming inherits a memory-centric approach to computation. It’s not as obvious as it is to the programmer using assembler or C, but a Java object reference is, at heart, a memory address. That we cannot see or manipulate it as a number does not change its nature. All OOP languages have “reference variables” which at their core act as abstracted pointers to the memory addresses where objects live.

In general the OO programmer doesn’t care where the object lives. He or she is, in most OO languages, not interested in managing memory (except in C++). That task is left to the runtime. Indeed most OO languages simply remove the ability to manually manage memory in any way. The runtime takes the task on completely, automatically allocating memory, performing garbage collection and releasing memory.

To an OOP programmer, object Identity is therefore defined by memory location. What makes an object unique is that if I go to a particular place in memory, I interpret the bytes there in a particular way. Nothing else defines uniqueness and thus Identity.
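Python makes the distinction directly observable: "is" compares identity, and a class that does not define __eq__ falls back to identity for "==" as well. Point is a made-up class for illustration, and the statement that id() returns a memory address holds for CPython specifically, not for every Python implementation.

```python
class Point:
    """Hypothetical example class; it deliberately does not define __eq__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
q = Point(1, 2)   # same structure and values, separately allocated
r = p             # a second reference to the first object

print(p is q)     # False: two distinct objects
print(p == q)     # False: without __eq__, == falls back to identity
print(p is r)     # True: same object, same identity
print(id(p) == id(r), id(p) == id(q))  # True False (in CPython, id() is the address)
```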

In OO languages, because they are at heart Imperative, every variable is a scalar. This is not obvious. Variables might point to arrays, linked lists, trees, sets, maps, bags and so on. But ultimately the same rule applies: it’s simply an address and a tag saying what’s at the address.
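A short Python illustration of the point: a variable that "holds" a list actually holds a single reference to it, so assigning it to another name copies the reference, not the collection. The names here are just the employees from the example further down.

```python
employees = ["Jill Fontaine"]
team = employees              # copies the reference (the "address"), not the list
team.append("Jack Schmidt")   # mutate through the second name...

print(employees)              # ['Jill Fontaine', 'Jack Schmidt']: ...seen via the first
print(team is employees)      # True: one list, two names pointing at it
```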

Relational Programming is very different because Identity is not defined by memory addresses. It is defined by Uniqueness of a tuple or relation. This rule is weakened by SQL for performance reasons, but I will stick with it for now.

In Relational Programming, Identity is Equality. Two things that have the same structure and values are the same thing. This is not necessarily true in OOP.
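The contrast can be sketched in Python: tuples (and frozen dataclasses) behave relationally, so a set collapses two equal records into one, whereas plain objects are distinguished by identity and both survive. PersonObject and PersonValue are hypothetical classes made up for this comparison.

```python
from dataclasses import dataclass

# Relational-style: a tuple is identified by its values, so duplicates collapse.
rows = {("Jamie Jones", "4-3-78"), ("Jamie Jones", "4-3-78")}
print(len(rows))     # 1


# OOP-style: plain objects hash and compare by identity, so "duplicates" survive.
class PersonObject:
    def __init__(self, name, birthday):
        self.name = name
        self.birthday = birthday


objects = {PersonObject("Jamie Jones", "4-3-78"), PersonObject("Jamie Jones", "4-3-78")}
print(len(objects))  # 2


# Opting into value identity: frozen dataclasses generate __eq__ and __hash__ from fields.
@dataclass(frozen=True)
class PersonValue:
    name: str
    birthday: str


values = {PersonValue("Jamie Jones", "4-3-78"), PersonValue("Jamie Jones", "4-3-78")}
print(len(values))   # 1
```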

A lot of useful properties fall out of the mathematical basis of relational databases. Various properties of a relational database can be proved. Most usefully, a relational database can deduce efficient execution strategies from the declaration of the model and the query. Relational programs are generally called Declarative: the database is asked to solve a problem, not given step-by-step instructions on how to do so.
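As a small illustration of declarative versus imperative, here is the same question answered both ways using Python's built-in sqlite3 module with an in-memory database. The table layout and the birth months (read from the example's dates as day-month-year) are assumptions for the sketch; in the SELECT, SQLite's query planner rather than our code decides how to find the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT PRIMARY KEY, birth_month INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Jamie Jones", 3), ("Jill Fontaine", 3), ("John Smith", 12)],
)

# Declarative: state *what* is wanted; the engine chooses the access path.
declarative = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM people WHERE birth_month = ? ORDER BY name", (3,)
    )
]

# Imperative: spell out *how*, step by step: scan, test, collect, sort.
imperative = []
for name, month in conn.execute("SELECT name, birth_month FROM people"):
    if month == 3:
        imperative.append(name)
imperative.sort()

print(declarative)                # ['Jamie Jones', 'Jill Fontaine']
print(declarative == imperative)  # True
```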

The crux of the Set-Objective Approach

I feel that there’s room to create a language which marries OO programming and Relational programming. For lack of a better term I will call this Set-Objective Programming.

The key is to alter the concept of Identity in OOP.

In a Set-Objective language, there are no scalar values. No variable points to a single object in memory. Instead, all variables are in fact set expressions.

This turns the usual order inside out. We are used to traversing scalar variables to get to collections. In the set-objective scheme, everything is a collection; if we want to express a scalar value, we are in fact simply expressing a set with a single value.
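One way to picture this, sketched in Python under the assumption that every variable is a frozenset: a "scalar" is simply a one-element set, and ordinary operations are lifted to apply element-wise, so the same code handles one value or many. The lift helper is hypothetical, not part of any proposed syntax.

```python
def lift(f):
    """Hypothetical helper: apply a per-element function to a set-valued variable."""
    def lifted(values: frozenset) -> frozenset:
        return frozenset(f(v) for v in values)
    return lifted


salary = frozenset({80_000})             # a "scalar" is just a one-element set
salaries = frozenset({27_000, 24_500})   # a collection uses the same representation

with_raise = lift(lambda s: s + 1_000)
print(with_raise(salary))                # frozenset({81000})
print(sorted(with_raise(salaries)))      # [25500, 28000]
```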

Null values are logically the empty set; this may simplify the type system.
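A minimal Python sketch of why this might simplify things, assuming relationships are held as frozensets: "no manager" is the empty set rather than None, and mapping over an empty set yields an empty set, so no null check is needed. The manager_names function is hypothetical.

```python
def manager_names(employee_managers: frozenset) -> frozenset:
    # No "if x is None" branch: an employee with no manager has an empty set,
    # and a comprehension over an empty set simply yields an empty set.
    return frozenset(m.upper() for m in employee_managers)


print(manager_names(frozenset({"Jamie Jones"})))  # frozenset({'JAMIE JONES'})
print(manager_names(frozenset()))                 # frozenset()
```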

Inheritance is expressed through sets also. An object is a tuple of data and operations. It can be created with a query which defines a class of objects. A subclass is a query which further refines another query result. And so on.
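Here is a rough Python sketch of "a subclass is a query which further refines another query result", assuming records are attribute dictionaries and a class is membership by having certain attributes. The refine and has helpers, and the record "A. Visitor", are hypothetical; the other names come from the example below.

```python
universe = [
    {"Name": "A. Visitor", "Birthday": "1-1-70"},  # hypothetical non-employee
    {"Name": "Jill Fontaine", "Birthday": "27-Mar-86",
     "Joined": "22-Aug-07", "Salary": 27_000},
    {"Name": "B. Igg Shott", "Birthday": "7-Oct-62",
     "Joined": "14-April-92", "Salary": 80_000, "HasCornerOffice": True},
]


def has(*attrs):
    # Predicate: the record carries every named attribute.
    return lambda r: all(a in r for a in attrs)


def refine(parent, pred):
    # A "subclass" is its parent query, further filtered; evaluated on call.
    return lambda: [r for r in parent() if pred(r)]


people = refine(lambda: universe, has("Name", "Birthday"))
employees = refine(people, has("Joined", "Salary"))
managers = refine(employees, has("HasCornerOffice"))

print([r["Name"] for r in people()])     # ['A. Visitor', 'Jill Fontaine', 'B. Igg Shott']
print([r["Name"] for r in employees()])  # ['Jill Fontaine', 'B. Igg Shott']
print([r["Name"] for r in managers()])   # ['B. Igg Shott']
```

Because each level is a filter over its parent, every Manager is automatically an Employee and a Person; there is no separate inheritance declaration to keep in sync.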

Done properly, this simplifies schema evolution. When we want to change the interface onto existing data, we create new functions, define the objects via a query, and present that to the client. The Set-Objective runtime is responsible for working out the identities.
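A hedged Python sketch of the idea: the stored tuples stay untouched, and the new "interface" is just a function (a query) deriving a new shape from them. Keeping the key (Name) in each derived row is what would let a runtime map identities back to the stored data. The tenure_view function and the DaysEmployed field are hypothetical, and "today" is pinned to the post's date.

```python
from datetime import date

# Existing stored data: not migrated or altered by the schema change.
employees = [
    {"Name": "Jill Fontaine", "Joined": date(2007, 8, 22), "Salary": 27_000},
    {"Name": "Jack Schmidt", "Joined": date(2007, 10, 18), "Salary": 24_500},
]


def tenure_view(rows, today):
    # A new interface onto the same data: a derived field, no migration.
    # Each view row keeps the key it came from, so identity can be traced back.
    return [
        {"Name": r["Name"], "DaysEmployed": (today - r["Joined"]).days}
        for r in rows
    ]


print(tenure_view(employees, today=date(2009, 3, 30)))
```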

I think that turning Identity and assignment “inside out” (favouring collections as the common case and scalars as a special case) is actually closer to the real world. The world is filled with objects that can be grouped in arbitrary ways. It is filled with far fewer rigid hierarchies. It is filled still less with objects that cannot appear in more than one set (singletons). Since we want a language to fit its problem domain, favouring sets favours the real world.

What it might look like.

This is the toughest part to write. The assignment operator in a Set-Objective language is comparable to a functional language pattern matcher, since these are query-like. The difference is that in deference to the mutability of OOP, the match is not final, but updates on use.
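The "updates on use" behaviour can be approximated in Python with a lazily re-evaluated view: the definition binds a query, not a result, so rows inserted later show up the next time it is read. LiveSet is a hypothetical class, a sketch of the idea rather than how a Set-Objective runtime would implement it.

```python
class LiveSet:
    """Hypothetical: a set defined by a query, re-evaluated every time it is read."""

    def __init__(self, source, predicate):
        self._source = source        # callable returning the current base rows
        self._predicate = predicate

    def __iter__(self):
        # Evaluated lazily on each use, unlike a pattern match that binds once.
        return (r for r in self._source() if self._predicate(r))


managers = []
alpha = LiveSet(lambda: managers, lambda m: m["HasCornerOffice"])

print([m["Name"] for m in alpha])  # []
managers.append({"Name": "B. Igg Shott", "HasCornerOffice": True})
managers.append({"Name": "Jamie Jones", "HasCornerOffice": False})
print([m["Name"] for m in alpha])  # ['B. Igg Shott']
```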

A simple example of employees.

People ]= Universal 		// People is a subset of the Universal set
>>	[						// Defined as ...
		Name ]= Strings,
		Birthday ]= Dates
	]
Employees ]= People
>>	[
		Joined ]= Dates { Joined < Tomorrow }, // can't add people in future
		Salary ]= Currency { 10,000 < Salary < 150,000 }, // a Currency range
		Manager ]= Managers
	]
Managers ]= Employees
>>	[
		HasCornerOffice ]= Booleans
	]

// insert data

Managers +=
	[ "John Smith",	Birthday=12-12-80, Joined=2-Jan-2001, $80,000, True ]
+	[ "Jamie Jones", Birthday=4-3-78, Joined=4-3-98, $78,000, False ]
+	[ "B. Igg Shott", Birthday=7-Oct-62, Joined=14-April-92, $80,000, True ]

Employees +=
	[ "Jill Fontaine", 27-Mar-86, Joined=22-Aug-07, $27,000, "Jamie Jones" ]
+	[ "Jack Schmidt", 11-June-88, Joined=18-Oct-07, $24,500, [ "Jamie Jones", "B. Igg Shott" ] ]

// Queries

BirthdaysThisMonth ]= People { Birthday.Month = Today.Month }

AlphaEnoughForBusiness ]= Managers { HasCornerOffice }

NotAlphaEnoughForBusiness ]= Managers \ AlphaEnoughForBusiness

// Imperative bits ... lambdas I guess
	Print "{Name}'s birthday is on {Birthday}"
To { BirthdaysThisMonth ^ Managers }

// Output, when run in March:
Jamie Jones's birthday is on 4-3-78.
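For comparison, here is the whole employee example recast in Python, under loudly stated assumptions: frozen dataclasses stand in for value-identified tuples, Python subclassing stands in for the refining query, built-in sets provide the set expressions, managers are referenced by name string, dates are read as day-month-year, HasCornerOffice values are treated as plain booleans, and "today" is pinned to the post's date. It is a hypothetical translation, not how a Set-Objective runtime would work; in particular, constraints are checked only at construction and the queries are snapshots rather than live.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical Python stand-in for the Set-Objective example.
# Frozen dataclasses compare and hash by value, so equal records are one record.


@dataclass(frozen=True)
class Person:
    name: str
    birthday: date


@dataclass(frozen=True)
class Employee(Person):
    joined: date
    salary: int
    managers: frozenset = frozenset()  # "no manager" is the empty set, not None

    def __post_init__(self):
        # Schema constraints: nobody joins in the future; salary within range.
        if self.joined > date.today():
            raise ValueError(f"{self.name}: join date is in the future")
        if not 10_000 < self.salary < 150_000:
            raise ValueError(f"{self.name}: salary out of range")


@dataclass(frozen=True)
class Manager(Employee):
    has_corner_office: bool = False


# Dates read as day-month-year (an assumption about the original notation).
john = Manager("John Smith", date(1980, 12, 12), date(2001, 1, 2), 80_000,
               has_corner_office=True)
jamie = Manager("Jamie Jones", date(1978, 3, 4), date(1998, 3, 4), 78_000,
                has_corner_office=False)
igg = Manager("B. Igg Shott", date(1962, 10, 7), date(1992, 4, 14), 80_000,
              has_corner_office=True)

managers = {john, jamie, igg}
employees = managers | {
    Employee("Jill Fontaine", date(1986, 3, 27), date(2007, 8, 22), 27_000,
             frozenset({"Jamie Jones"})),
    Employee("Jack Schmidt", date(1988, 6, 11), date(2007, 10, 18), 24_500,
             frozenset({"Jamie Jones", "B. Igg Shott"})),
}
people = set(employees)  # every Employee is a Person


# Queries as set expressions: filter, difference, intersection.
def birthdays_this_month(pool, today):
    return {p for p in pool if p.birthday.month == today.month}


alpha_enough = {m for m in managers if m.has_corner_office}
not_alpha_enough = managers - alpha_enough

today = date(2009, 3, 30)  # the post's date, so the result matches the original
for p in sorted(birthdays_this_month(people, today) & managers, key=lambda x: x.name):
    print(f"{p.name}'s birthday is on {p.birthday.day}-{p.birthday.month}-{p.birthday:%y}")
# prints: Jamie Jones's birthday is on 4-3-78
```

Jill Fontaine also has a March birthday, but the intersection with managers filters her out, which is why only Jamie Jones is printed.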

This entry was posted in Software Engineering, Technical Notes, Thought Bubbles.