At Originate, we have worked on a number of medium- to large-scale Scala projects. One problem we continually find ourselves tackling is how to represent the data in our system in a way that accounts for the fact that sometimes that data comes from the client, and sometimes it comes from the database. Essentially, the problem boils down to: how do you store a model’s id alongside its data in a type-safe and meaningful way?
Consider your User model. When a new user signs up, they send a User to your system, which you then store in the database. At this point, the user now has an id. Should the id be stored inside the User model? If so, it would have to be an Option[Id]. Perhaps, instead of storing optional ids in your user model, you prefer to have two data classes: one case class to represent data received from the client, and one to represent data received from the database.
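As a rough sketch, the two alternatives might look like this. The field names, the Long-based Id alias, and the choice of UserData as the client-side class are assumptions for illustration.

```scala
// Hypothetical id type; a real project might use a UUID or a wrapper class.
type Id = Long

// Alternative 1: a single model whose id is optional until it is persisted.
object OptionalIdApproach {
  case class User(id: Option[Id], firstName: String, lastName: String, email: String)
}

// Alternative 2: two separate classes, one per source of data.
object DuplicateClassApproach {
  // Data received from the client: no id yet.
  case class UserData(firstName: String, lastName: String, email: String)

  // Data received from the database: the id is always present.
  case class User(id: Id, firstName: String, lastName: String, email: String)
}
```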
Both options have their pros and cons. There is a third option, which is the purpose of this blog post, but let’s cover our bases first.
Duplicate Data Classes
One simple way to solve this problem is to consider that your system has two versions of your data: the version it receives from the client, and the version it receives from the database, an idea generalized by the CQRS design pattern.
Unfortunately, this adds a lot of overhead and boilerplate. Not only do you have to double the number of models in your system, you also have to make sure to reference the right version of each model in the right places. This can lead to a lot of confusion. It also means that with both User and UserData around, the difference between them is not immediately clear to someone new on the project.
The biggest issue here is that we lose the correlation between the two types of data. User does not know that it is related to UserData, and vice versa. Even worse, suppose we have methods on that data, such as one that gives us a “formatted name” for a user. In that case, either User and UserData need to inherit from the same trait, or we duplicate the method. This pattern quickly becomes clunky and annoying.
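A minimal sketch of the shared-trait workaround follows. The UserFields trait, the formattedName format, and the field names are hypothetical. The trait shares behaviour, but it still says nothing about one class being the persisted form of the other.

```scala
// Hypothetical id type.
type Id = Long

// The shared trait is the only thing tying the two classes together.
trait UserFields {
  def firstName: String
  def lastName: String

  // Behaviour we would otherwise have to duplicate in both classes.
  def formattedName: String = s"$lastName, $firstName"
}

// Client-side version: no id.
case class UserData(firstName: String, lastName: String) extends UserFields

// Database version: the same fields plus an id.
case class User(id: Id, firstName: String, lastName: String) extends UserFields
```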
Optional Ids
We’ve tried this pattern on several large projects. On the one hand, it saves us from duplicating all of our data classes, which is nice. On the other hand, most of the time your data actually does have an id in it. Wrapping that id in an Option means that the rest of your system always has to consider that the value might not be there, even when it definitely should be.
This ends up producing a lot of code that has to unwrap ids that should always be there.
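Here is a sketch of what such code tends to look like. It assumes a hypothetical findUserByEmail lookup, with an in-memory Seq standing in for the database, and a User whose id is an Option[Id].

```scala
// Hypothetical id type and model with an optional id.
type Id = Long
case class User(id: Option[Id], firstName: String, lastName: String, email: String)

// Stand-in for a DAO call; a real one would query the database.
def findUserByEmail(users: Seq[User], email: String): Option[User] =
  users.find(_.email == email)

// Every caller has to unwrap the id again, even when it "must" be there.
def userIdForEmail(users: Seq[User], email: String): Option[Id] =
  for {
    user <- findUserByEmail(users, email)
    id   <- user.id // None here is indistinguishable from "no such user"
  } yield id
```

Both a user that was never persisted and a user that does not exist produce None here.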
It’s not the worst thing in the world, but when 80% of your code is dealing with users that have ids, it feels unnecessary. Also note that there is a hidden failure here. If your lookup returns Some(user) but, for whatever reason, that user doesn’t have an Id, the result looks the same as if the lookup had returned None. Programming around this is possible, but it gets ugly if you use this pattern all over your codebase.
Additionally, lazy developers will say, “I know that the id should be there, why can’t I just use .get?” Option#get is a slippery slope: if you make exceptions for it in your codebase, people will abuse it, and then you lose the safety of having Options in the first place. At that point you might as well not use Option at all, because you are getting the Option version of an NPE while still dealing with the overhead of Option. If you have developers trying to sneak in .get, consider checking out Brian McKenna’s library: WartRemover.
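The NPE comparison can be seen in a small sketch that uses a hypothetical User with an optional id. Calling .get on a missing id fails only at runtime, while fold makes the caller handle both cases.

```scala
import scala.util.Try

// Hypothetical id type and model with an optional id.
type Id = Long
case class User(id: Option[Id], firstName: String)

// A user that has not been persisted yet, so it has no id.
val unsaved = User(None, "Alan")

// Option#get on None throws java.util.NoSuchElementException at runtime,
// the Option equivalent of dereferencing null.
val viaGet: Try[Id] = Try(unsaved.id.get)

// Handling both cases explicitly keeps the missing id visible in the code.
val described: String = unsaved.id.fold("not persisted yet")(id => s"user #$id")
```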
Furthermore, the Optional Id pattern leads you to create a base class that all of your data classes inherit from.
This is so that you can create base service and data access traits that your layers inherit from. It’s worth noting that in this situation, you will likely want a method def withId(id: Id): T defined on BaseModel so that your base services/DAOs know how to promote a data class without an id (received from the client) to a data class that has an id (after being persisted). You’ll see in the next section that we can do away with all of this.
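The following is a minimal sketch of that setup. Only BaseModel and withId come from the text. The F-bound on BaseModel, the BaseDao class, and the in-memory map standing in for the database are assumptions. The F-bound lets withId return the concrete model type.

```scala
import scala.collection.mutable

// Hypothetical id type.
type Id = Long

// Base trait: every model carries an optional id and knows how to
// return a copy of itself with the id filled in.
trait BaseModel[T <: BaseModel[T]] {
  def id: Option[Id]
  def withId(id: Id): T
}

case class User(id: Option[Id], firstName: String, lastName: String) extends BaseModel[User] {
  // The parameter `id` shadows the field, so Some(id) wraps the new id.
  def withId(id: Id): User = copy(id = Some(id))
}

// Hypothetical base DAO shared by every model; a map stands in for the database.
class BaseDao[T <: BaseModel[T]] {
  private val rows = mutable.LinkedHashMap.empty[Id, T]
  private var nextId: Id = 1L

  // Promote a client-side model (id = None) to a persisted one (id = Some(...)).
  def insert(model: T): T = {
    val saved = model.withId(nextId)
    rows(nextId) = saved
    nextId += 1
    saved
  }

  def find(id: Id): Option[T] = rows.get(id)
}
```

Note that insert returns a User whose id is still typed as Option[Id], even though it is now always Some. The types cannot express that promotion.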
While this pattern works, and we have used it in production with success, the issue we run into is that the types and data don’t accurately reflect the concepts in our system. We want a way to say that ids are not optional when the data has been retrieved from the database, while also maintaining the relationship between data both in and outside of the database.
After writing out the problem, and the other possible solutions, it starts to become clear that there is a better way. We know what we want:
- There should be one class that represents a specific type of data, whether it’s from the client or the database.
- We don’t want ids to be optional. Data received from the database should represent that the id exists.
- We don’t want values to ever be null.
- Ideally we can minimize overhead (control flow, inheritance, typing overhead).
We introduce a generic WithId class that pairs an id with the model data.
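Here is a sketch of such a class and a DAO that uses it. The names WithId, id, and model match how they are used later in this post. The User fields, the Long id alias, and the in-memory UserDao are assumptions for illustration.

```scala
import scala.collection.mutable

// Hypothetical id type.
type Id = Long

// Pairs a persisted id with the model data it belongs to.
case class WithId[T](id: Id, model: T)

// The model itself knows nothing about ids, so the same class is used for
// data received from the client and data read from the database.
case class User(firstName: String, lastName: String, email: String)

// Hypothetical in-memory DAO; a real one would talk to the database.
class UserDao {
  private val rows = mutable.LinkedHashMap.empty[Id, User]
  private var nextId: Id = 1L

  // Client data goes in without an id and comes back with one.
  def insert(user: User): WithId[User] = {
    val id = nextId
    rows(id) = user
    nextId += 1
    WithId(id, user)
  }

  // Anything read back from storage always carries its id.
  def find(id: Id): Option[WithId[User]] = rows.get(id).map(WithId(id, _))
}
```

Here the Option in find only means "no such row". It never means "this user has no id".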
This makes a lot of sense. We retain the fact that the only difference between client data and database data is that the latter has an id. We also avoid a good amount of the overhead of the other two options (duplication, inheritance, and unnecessary complexity around ids). The main overhead this introduces is extra typing when dealing with models in your service and DAO layers. While the types can get pretty nasty in some cases (our services are always asynchronous, so you may have Future[Seq[WithId[User]]]), it beats the alternatives.
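The following sketch shows that nesting in practice. The UserService class, its findByLastName method, and the in-memory Seq standing in for the database are all hypothetical.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical id type and models.
type Id = Long
case class WithId[T](id: Id, model: T)
case class User(firstName: String, lastName: String)

// Hypothetical async service; the Seq stands in for a real database query.
class UserService(rows: Seq[WithId[User]])(implicit ec: ExecutionContext) {
  // Every result carries its id, so callers never unwrap an Option to get it.
  def findByLastName(lastName: String): Future[Seq[WithId[User]]] =
    Future(rows.filter(_.model.lastName == lastName))
}

// Example call; blocking with Await is only acceptable in a standalone demo.
import scala.concurrent.ExecutionContext.Implicits.global

val service = new UserService(Seq(
  WithId(1L, User("Ada", "Lovelace")),
  WithId(2L, User("Alan", "Turing"))
))
val lovelaces: Seq[WithId[User]] =
  Await.result(service.findByLastName("Lovelace"), 5.seconds)
```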
If the thought of having to write user.model.firstName feels ugly, there is a way around it using implicits.
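Here is a minimal sketch of one way to do this. It assumes the conversion lives in the WithId companion object, which keeps it in implicit scope without an extra import at each call site. The conversion name toModel and the User fields are hypothetical.

```scala
import scala.language.implicitConversions

// Hypothetical id type and model.
type Id = Long
case class User(firstName: String, lastName: String)

case class WithId[T](id: Id, model: T)

object WithId {
  // Lets a WithId[T] be used wherever a T is expected, so user.firstName
  // compiles for a WithId[User] by converting it to user.model first.
  implicit def toModel[T](withId: WithId[T]): T = withId.model
}

val user: WithId[User] = WithId(1L, User("Ada", "Lovelace"))
val first: String = user.firstName // same as user.model.firstName
```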
Note that we have not tested this solution on a large-scale project, and it could add compile-time overhead.
Hopefully it is clear that while this is seemingly a small problem, finding the right way to model it in your system can have major implications
on the cleanliness of your codebase. We have been trying the
WithId pattern on a sizeable project for the last month with great results. No issues
so far, and the type overhead isn’t that bad considering the additional safety it brings.