EF, ORM, and DALs

May 12, 2020 | Software, Software Development

Several years ago, I started a rewrite of my trading/research platform using Entity Framework and a relational back-end. It was my first experience using an ORM, so I read a few blog posts on whether I needed a separate data access layer (DAL) on top of the ORM. One school of thought says, no, the ORM is your DAL and whatever you do in an additional DAL is just duplication of effort. Always one to avoid unnecessary work, I went merrily on my way and used EF as my DAL.

Everything worked great. Well, maybe not great, but good enough. Most of the time I ignored lazy loading and proxy issues, but since the database was always local, performance was good enough, and when it wasn’t, I addressed the issues and moved on. My application code was peppered with data-model-specific functionality, but again, it was good enough for what I needed.

Fast forward five years.

I decided to move my database to AWS Aurora Serverless (more on that in another post). AWS Aurora offers a PostgreSQL-compatible version, which was awesome since my backend is PostgreSQL. The serverless features should save me a bunch of money, since I only use the database for about an hour each day. So, thinks I, piece of cake, I can move the database to the cloud, use a thick client, and then over time move more of the functionality into the cloud and out of the client. Yes, it will be slow because there will be Internet-level latency, but I can fix up the parts that are really, really slow. I’m the only user, so I’ll make it “good enough.”

Not so fast, buckaroo. From the AWS documentation: “You can’t give an Aurora Serverless DB cluster a public IP address. You can access an Aurora Serverless DB cluster only from within a virtual private cloud (VPC) based on the Amazon VPC service.” 

What does that mean? In a nutshell, it means that the thick client running on my desktop can’t connect directly to the database. I need some middleware to sit between the client and the database. OK, no problem, I’ll use the AWS API Gateway and some AWS Lambda serverless functions to access the database.

But wait. My DAL is Entity Framework. There are hundreds of instances of DbContext and DbSet in my application. I use DbContext as my unit of work. How am I going to port my application from using a DbContext that connects to a SQL database server to using HTTP API requests?

For about five seconds, I considered writing new versions of DbContext and DbSet that would implement the repository pattern and the HTTP requests and be the needed DAL. But remember, I’m using DbContext as my unit of work. All instances of DbContext in my application are short-lived. I usually create a DbContext, get some data from the database, and then destroy the DbContext. Over HTTP, every one of those short-lived contexts would start with nothing loaded and fetch its data again. So to prevent an immense amount of traffic on the wire I would have to implement extensive caching, so much that I might as well just replicate the database locally, and that kind of defeats my whole objective, which was to remote the database.

I concluded that I need to move the application code away from EF and implement a custom DAL. The new DAL will comprise a set of repository classes that will handle all the HTTP requests and caching. The repository classes will serve up domain objects, not database objects.
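To make the idea concrete, here is a minimal sketch of what one such repository class might look like. It is written in Python for brevity rather than in the .NET code of the actual application, and every name in it (`Instrument`, `InstrumentRepository`, the `/instruments/{symbol}` path, the JSON field names) is a made-up assumption, not something from my code base. The transport is passed in as a plain function so the sketch runs without a network. In a real client that function would wrap an HTTP GET against the API Gateway endpoint.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical domain object: what the application works with,
# independent of how the row is stored in the database.
@dataclass(frozen=True)
class Instrument:
    symbol: str
    name: str


# Transport abstraction (an assumption for this sketch): takes a URL path
# and returns the response body as JSON text.
Fetch = Callable[[str], str]


class InstrumentRepository:
    """Serves domain objects, hiding both the HTTP calls and the cache."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Instrument] = {}

    def get(self, symbol: str) -> Instrument:
        cached = self._cache.get(symbol)
        if cached is not None:
            return cached  # no request goes over the wire
        payload = json.loads(self._fetch(f"/instruments/{symbol}"))
        # Map the wire format to a domain object.
        instrument = Instrument(symbol=payload["symbol"], name=payload["name"])
        self._cache[symbol] = instrument
        return instrument

    def invalidate(self, symbol: str) -> None:
        """Drop a cached entry so the next get() refetches it."""
        self._cache.pop(symbol, None)


# A fake transport that records each request, standing in for the HTTP API.
calls: List[str] = []


def fake_fetch(path: str) -> str:
    calls.append(path)
    symbol = path.rsplit("/", 1)[-1]
    return json.dumps({"symbol": symbol, "name": f"{symbol} Inc."})


repo = InstrumentRepository(fake_fetch)
first = repo.get("ACME")   # goes to the (fake) API
second = repo.get("ACME")  # served from the cache
```

The point of the sketch is that the repository, unlike a short-lived DbContext, can outlive a single unit of work, so the cache survives between calls. How and when to invalidate it is the hard part, and it is not addressed here.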

Which is how I should have done it to begin with.

In Clean Architecture, Bob Martin states that the selection of a database system is a detail that should be put off until as late as possible in the development cycle. In my case, I selected an ORM before writing the application code. It strikes me that writing the application code before selecting a database technology forces you to create an architecture that will be more flexible in the future.
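As a rough illustration of what "application code first" can look like, the sketch below defines the data access the application needs as an interface and writes the application logic against that interface only. It is in Python rather than .NET, and all names (`Trade`, `TradeRepository`, `position_value`, `InMemoryTradeRepository`) are hypothetical. The in-memory implementation stands in for whatever backend is chosen later, whether that is an ORM, an HTTP API, or something else.

```python
from dataclasses import dataclass
from typing import List, Protocol


# Hypothetical domain object.
@dataclass(frozen=True)
class Trade:
    symbol: str
    quantity: int
    price: float


class TradeRepository(Protocol):
    """The only thing the application knows about data access."""

    def for_symbol(self, symbol: str) -> List[Trade]: ...


def position_value(repo: TradeRepository, symbol: str) -> float:
    """Application logic: no DbContext, SQL, or HTTP in sight."""
    return sum(t.quantity * t.price for t in repo.for_symbol(symbol))


class InMemoryTradeRepository:
    """Placeholder backend; a database- or HTTP-backed class with the
    same method could replace it without touching position_value."""

    def __init__(self, trades: List[Trade]) -> None:
        self._trades = list(trades)

    def for_symbol(self, symbol: str) -> List[Trade]:
        return [t for t in self._trades if t.symbol == symbol]


repo = InMemoryTradeRepository(
    [Trade("ACME", 10, 2.5), Trade("ACME", 4, 5.0), Trade("XYZ", 1, 100.0)]
)
value = position_value(repo, "ACME")  # 10 * 2.5 + 4 * 5.0 = 45.0
```

Had my application been written this way, moving the database behind an HTTP API would have meant writing new repository implementations instead of touching hundreds of DbContext call sites.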