The understanding over Object Relational Mapping, or ORM, is fundamental to any good programmer. There are extremely few situations where its usage isn’t recommended, if any at all. If you need some incentive to learn it:
#1 Testing/debugging sucks ass.
It’s a fact. Maintenance involves costly activities. It would be nice if we could reduce these to a minimum level of resource and stress, allowing us to concentrate on more fun parts of our job, such as architecture! To achieve that, the computing community has developed a wide range of techniques. ORM is one of them and might help you to produce better code. Or not. But that’s up for each one of us to decide. In my opinion, it’s awesome.
What is our problem?
How one interacts with their data can create a huge complexity in their code. Large joins and projections can be awfully complicated. We want to reduce this complexity.
How do we solve it?
Let’s try to get rid of the SQL
queries and replace them by native code that will sound more natural, considering its context. This code will be translated into SQL
code eventually, giving us the exact same response with similar performance (it can be better, when caching is involved). In another words, let’s abstract the data layer.
A few cases to ponder
#1 User registration
A user has just registered, which means we should persist his account data. All the data is in an instance of the class User
:
class User(object):
def __init___(email=None, password=None):
self.email = email
self.password = password
Using SQL, I’d probably execute the SQL INSERT
command in order to add that entry in the database.
c = conn.cursor()
c.execute("INSERT INTO Users VALUES ("+ user.email + ", " + user.password + ")")
conn.commit()
conn.close()
Now imagine that User class has 30 properties. That c.execute()
line would be huge! A bunch of lines will hold exclusively a giant string that isn’t even in our “native language” (Python
, in this case). Wouldn’t it be nice if we could simply instantiate the class User
, fill email
and password
and call a method .save()
that would persist the data somehow? Something like:
user = User(email='ldavid@ufscar.br', password='godragons', ...)
user.save()
So much more intuitive!
#2 Ordering Schedules
Consider the class Schedules
that holds a reference to a specific time-window in the week. Additionally, Schedules
has a relationship many-to-many
with Class
(academic classes
, that is, where those Classes
can have many different Schedules
):
public class Schedule
{
public int Id { get; set; }
public TimeSpan TimeStarted { get; set; }
public TimeSpan TimeFinished { get; set; }
public DayOfWeek DayOfWeek { get; set; }
public ICollection<Class> Classes { get; set; }
}
public class Class
{
... // Class properties, such as professor, course, students etc
public ICollection<Schedule> Schedules { get; set; }
}
Now, I have a weekly calendar and I’d like to put all these schedules onto the calendar. However, it wouldn’t make sense to display a schedule of Tuesday night followed by a Monday morning. I want to order the entries in the database first. So, using good old SQL
, it would probably look something like:
var connectionString = "...";
var connection = new SqlConnection(connectionString);
var command = new SqlCommand("select * from Schedules order by DayOfWeek, TimeStarted", connection);
var response = DbConnection.execute(command);
// a foreach loop to extract schedules from response.
Okay, not so bad. At least it’s not so long. But there are some problems: we are leaving Object oriented programming and writing a command, strongly coupling the code with the manner in which the data is stored (it’s clearly a relational database). Another point is that we had to use three more classes in order to do a simple ordering in the database. What if we could simply run a chained statement, order and retrieve the data from the database, and already obtain the objects?
var schedules = Db.Schedules
.OrderBy(DayOfWeek)
.ThenBy(TimeStarted);
Pretty neat, right?
#3 Joining
Still looking at the Schedules
domain, what if we wanted to select all Classes
that start after 8am on a Monday?
eight = new TimeSpan(8, 0, 0);
var connectionString = "...";
var connection = new SqlConnection(connectionString);
var command = new SqlCommand(
"select Classes.* from Schedules, Classes, ClassesSchedules" +
"where" +
" Schedules.DayOfWeek = DayOfWeek.Monday and Schedules.TimeStarted >= " + eight.Ticks " +
" and ClassesSchedules.ScheduleId = Schedules.Id" +
" and ClassesSchedules.ClassesId = Classes.Id"
, connection);
var response = DbConnection.execute(command);
Well, it’s kind of obvious the point that I’m trying to make:
var classes = Db.Classes
.Where(c => c.Schedules.Any(s =>
s.DayOfWeek == DayOfWeek.Monday
&& s.TimeStarted >= new TimeSpan(8, 0, 0)));
#2 This is how you read it: give me all the
Classes
where those classes have at least onSchedule
that is on a Monday and has itsTimeStarted
greater or equal than 8am.
#3 Even if the above looks alien to you, you have to admit that is much easier than writing a
join
, right? This is called LINQ, and is one of the reasons why I love C#! :-)
#4 That was a toy example. Would you like to see something for real? Check out this statement. It’s validating if a
Student
can enroll in aClass
, considering all theSchedules
of thatClass
and of all theClasses
that he was already enrolled.
Existing ORMs
I must say it’s probably not a good idea to implement your own ORM
. In fact, reuse is regarded as a principle in OOP! There are many out there - with many good people working on it - that are highly optimized and will do just fine. Besides, if you don’t like something, there is your chance to write code that will be used by other people and will, most definitely, be appreciated! Anyhow, the list is extensive and will most likely contain an option of your interest. These are the ones I’ve had contact with:
- Entity Framework (C#)
- Django models (Python)
- Laravel’s Eloquent (PHP)
- Sqalchemy (Python)
- JPA (Java)
- Spring data JPA (Java)
Cool additional features
Lazy and eager loading
Yes, yes, that is where I took the name of the ending section from… This can help you a lot with performance. Or make it much worse. That’s why you should understand it: as in the Schedule
’s example, application’s entities are related. Inheritances, relationships: many times you need that information, but it isn’t in the table that you’re currently looking, which means you must perform a join
. Another times, you don’t need that information, so you want to avoid the join
.
Lazy
Lazy-loading’s policy is “only retrieve if needed”, which means it won’t join tables unless you explicitly try to access it.
A case where Lazy-loading is recommended:
Imagine we work for Bitbucket.org and we are implementing a page where one can add contributors to their current repository. We only need the usernames, not these users’ repositories, friends, permissions, roles [, [\w]+]*
. Therefore, it would be recommended to lazy load, as many unnecessary joins would be avoided.
Using Django models:
users = User.objects.all()
usernames = [user.username for user in users]
Using Entity Framework:
var usernames = Db.Users
.Select(u => u.Username);
A case where Lazy-loading is NOT recommended:
What if we need the users permissions
list as well? The fact is an user.permissions
property is not set, and will need a following database-lookup in order to be ready. Therefore, the following code would produce 1 + n
database-lookups, where n
is the number of registered users
. That’s disastrously bad.
users = User.objects.all()
usernames_and_permissions = [(user.username, user.permissions,) for user in users]
var users = Db.Users
.Select(u => new
{
Username = u.Username
Permissions = u.Permissions
});
A case where Eager-loading is recommended:
The above, of course. How to solve it? Just ask the driver to prefetch the related object.
users = User.objects.prefetch_related('permissions').all()
usernames_and_permissions = [(user.username, user.permissions,) for user in users]
var users = Db.Users
.Include(u => u.Permissions)
.Select(u => new
{
Username = u.Username,
Permissions = u.Permissions
});
#5
.prefetch_related
actually selects the whole data and joins it usingPython
, whereas.Include()
will be translated to SQL and ask the database management system to do the joining. Just a difference made by the frameworks…
Automatic migrations
You’ll definitely want to check this out. Automatic migrations
stands for the idea that we could automatically generate the database when we need it, in opposite of designing it from scratch, because there is a 1-to-1 mapping from your model classes to your database tables. Summarizing: write your classes and export it to your database.
Examples of automatic migrations
Django
Assume you have classes in models.py
that extend models.Model
, run:
python manage.py makemigrations
python manage.py migrate
Done!
.NET framework
Assume you have implemented your model classes and that they were added to your Context
class. Hit Ctrl+Q
, search for Package Manage Console
.
Update-Database
Done!
Laravel
Assume you have a class User that extends Eloquent
.
php artisan make:migration create_users_table --create=users
php artisan migrate
Done!
These are just some examples! You can find these features in any language you want!
Short version for lazy-loaders
// Main thought of this post:
// Don't be a dumbass programmer writing dumbass SQL strings!
// Instead, do it like this:
// Ordering...
var schedules = Db.Schedules
.OrderBy(DayOfWeek)
.ThenBy(TimeStarted);
// Joining...
var classes = Db.Classes
.Where(c => c.Schedules.Any(s =>
s.DayOfWeek == DayOfWeek.Monday
&& s.TimeStarted >= new TimeSpan(8, 0, 0)));
// Selecting...
var users = Db.Users
.Select(u => u.Username);