Object Relational Mapping

An Overview on How ORMs Work in Multiple Languages — October 2, 2017

The understanding over Object Relational Mapping, or ORM, is fundamental to any good programmer. There are extremely few situations where its usage isn’t recommended, if any at all. If you need some incentive to learn it:

#1 Testing/debugging sucks ass.

It’s a fact. Maintenance involves costly activities. It would be nice if we could reduce these to a minimum level of resource and stress, allowing us to concentrate on more fun parts of our job, such as architecture! To achieve that, the computing community has developed a wide range of techniques. ORM is one of them and might help you to produce better code. Or not. But that’s up for each one of us to decide. In my opinion, it’s awesome.

An overview sketch of an ORM
An overview sketch of an ORM. Available at: blog.yellowant.com

What is our problem?

How one interacts with their data can create a huge complexity in their code. Large joins and projections can be awfully complicated. We want to reduce this complexity.

How do we solve it?

Let’s try to get rid of the SQL queries and replace them by native code that will sound more natural, considering its context. This code will be translated into SQL code eventually, giving us the exact same response with similar performance (it can be better, when caching is involved). In another words, let’s abstract the data layer.

A few cases to ponder

#1 User registration

A user has just registered, which means we should persist his account data. All the data is in an instance of the class User:

class User(object):
    def __init___(email=None, password=None):
        self.email = email
        self.password = password

Using SQL, I’d probably execute the SQL INSERT command in order to add that entry in the database.

c = conn.cursor()
c.execute("INSERT INTO Users VALUES ("+ user.email + ", " + user.password + ")")
conn.commit()
conn.close()

Now imagine that User class has 30 properties. That c.execute() line would be huge! A bunch of lines will hold exclusively a giant string that isn’t even in our “native language” (Python, in this case). Wouldn’t it be nice if we could simply instantiate the class User, fill email and password and call a method .save() that would persist the data somehow? Something like:

user = User(email='ldavid@ufscar.br', password='godragons', ...)
user.save()

So much more intuitive!

#2 Ordering Schedules

Consider the class Schedules that holds a reference to a specific time-window in the week. Additionally, Schedules has a relationship many-to-many with Class (academic classes, that is, where those Classes can have many different Schedules):

public class Schedule
{
    public int Id { get; set; }

    public TimeSpan TimeStarted { get; set; }
    public TimeSpan TimeFinished { get; set; }

    public DayOfWeek DayOfWeek { get; set; }

    public ICollection<Class> Classes { get; set; }
}

public class Class
{
    ... // Class properties, such as professor, course, students etc

    public ICollection<Schedule> Schedules { get; set; }
}

Now, I have a weekly calendar and I’d like to put all these schedules onto the calendar. However, it wouldn’t make sense to display a schedule of Tuesday night followed by a Monday morning. I want to order the entries in the database first. So, using good old SQL, it would probably look something like:

var connectionString = "...";
var connection = new SqlConnection(connectionString);
var command = new SqlCommand("select * from Schedules order by DayOfWeek, TimeStarted", connection);
var response = DbConnection.execute(command);

// a foreach loop to extract schedules from response.

Okay, not so bad. At least it’s not so long. But there are some problems: we are leaving Object oriented programming and writing a command, strongly coupling the code with the manner in which the data is stored (it’s clearly a relational database). Another point is that we had to use three more classes in order to do a simple ordering in the database. What if we could simply run a chained statement, order and retrieve the data from the database, and already obtain the objects?

var schedules = Db.Schedules
    .OrderBy(DayOfWeek)
    .ThenBy(TimeStarted);

Pretty neat, right?

#3 Joining

Still looking at the Schedules domain, what if we wanted to select all Classes that start after 8am on a Monday?

eight = new TimeSpan(8, 0, 0);
var connectionString = "...";
var connection = new SqlConnection(connectionString);
var command = new SqlCommand(
    "select Classes.* from Schedules, Classes, ClassesSchedules" +
    "where" +
    "    Schedules.DayOfWeek = DayOfWeek.Monday and Schedules.TimeStarted >= " + eight.Ticks " +
    "    and ClassesSchedules.ScheduleId = Schedules.Id" +
    "    and ClassesSchedules.ClassesId = Classes.Id"
    , connection);
var response = DbConnection.execute(command);

Well, it’s kind of obvious the point that I’m trying to make:

var classes = Db.Classes
    .Where(c => c.Schedules.Any(s =>
        s.DayOfWeek == DayOfWeek.Monday
        && s.TimeStarted >= new TimeSpan(8, 0, 0)));

#2 This is how you read it: give me all the Classes where those classes have at least on Schedule that is on a Monday and has its TimeStarted greater or equal than 8am.

#3 Even if the above looks alien to you, you have to admit that is much easier than writing a join, right? This is called LINQ, and is one of the reasons why I love C#! :-)

#4 That was a toy example. Would you like to see something for real? Check out this statement. It’s validating if a Student can enroll in a Class, considering all the Schedules of that Class and of all the Classes that he was already enrolled.

Existing ORMs

I must say it’s probably not a good idea to implement your own ORM. In fact, reuse is regarded as a principle in OOP! There are many out there - with many good people working on it - that are highly optimized and will do just fine. Besides, if you don’t like something, there is your chance to write code that will be used by other people and will, most definitely, be appreciated! Anyhow, the list is extensive and will most likely contain an option of your interest. These are the ones I’ve had contact with:

Cool additional features

Lazy and eager loading

Yes, yes, that is where I took the name of the ending section from… This can help you a lot with performance. Or make it much worse. That’s why you should understand it: as in the Schedule’s example, application’s entities are related. Inheritances, relationships: many times you need that information, but it isn’t in the table that you’re currently looking, which means you must perform a join. Another times, you don’t need that information, so you want to avoid the join.

Lazy

Lazy-loading’s policy is “only retrieve if needed”, which means it won’t join tables unless you explicitly try to access it.

Imagine we work for Bitbucket.org and we are implementing a page where one can add contributors to their current repository. We only need the usernames, not these users’ repositories, friends, permissions, roles [, [\w]+]*. Therefore, it would be recommended to lazy load, as many unnecessary joins would be avoided.

Using Django models:

users = User.objects.all()
usernames = [user.username for user in users]

Using Entity Framework:

var usernames =  Db.Users
    .Select(u => u.Username);

What if we need the users permissions list as well? The fact is an user.permissions property is not set, and will need a following database-lookup in order to be ready. Therefore, the following code would produce 1 + n database-lookups, where n is the number of registered users. That’s disastrously bad.

users = User.objects.all()
usernames_and_permissions = [(user.username, user.permissions,) for user in users]
var users = Db.Users
    .Select(u => new
    {
        Username    = u.Username
        Permissions = u.Permissions
    });

The above, of course. How to solve it? Just ask the driver to prefetch the related object.

users = User.objects.prefetch_related('permissions').all()
usernames_and_permissions = [(user.username, user.permissions,) for user in users]
var users = Db.Users
    .Include(u => u.Permissions)
    .Select(u => new
    {
        Username    = u.Username,
        Permissions = u.Permissions
    });

#5 .prefetch_related actually selects the whole data and joins it using Python, whereas .Include() will be translated to SQL and ask the database management system to do the joining. Just a difference made by the frameworks…

Automatic migrations

You’ll definitely want to check this out. Automatic migrations stands for the idea that we could automatically generate the database when we need it, in opposite of designing it from scratch, because there is a 1-to-1 mapping from your model classes to your database tables. Summarizing: write your classes and export it to your database.

Examples of automatic migrations

Django

Assume you have classes in models.py that extend models.Model, run:

python manage.py makemigrations
python manage.py migrate

Done!

.NET framework

Assume you have implemented your model classes and that they were added to your Context class. Hit Ctrl+Q, search for Package Manage Console.

Update-Database

Done!

Laravel

Assume you have a class User that extends Eloquent.

php artisan make:migration create_users_table --create=users
php artisan migrate

Done!

These are just some examples! You can find these features in any language you want!

Short version for lazy-loaders

// Main thought of this post:
// Don't be a dumbass programmer writing dumbass SQL strings!
// Instead, do it like this:

// Ordering...
var schedules = Db.Schedules
    .OrderBy(DayOfWeek)
    .ThenBy(TimeStarted);

// Joining...
var classes = Db.Classes
    .Where(c => c.Schedules.Any(s =>
        s.DayOfWeek == DayOfWeek.Monday
        && s.TimeStarted >= new TimeSpan(8, 0, 0)));

// Selecting...
var users = Db.Users
    .Select(u => u.Username);