
Managing Data Classes With Ids

At Originate, we have worked on a number of medium- to large-scale Scala projects. One problem we continuously find ourselves tackling is how to represent the data in our system in a way that is compatible with the idea that sometimes that data comes from the client, and sometimes it comes from the database. Essentially, the problem boils down to: how do you store a model’s id alongside its data in a type-safe and meaningful way?

The Problem

Consider your user model:

case class User(email: String, password: String)

When a new user signs up, they send a User to your system, which you then store in the database. At this point, the user has an id. Should the id be stored inside the User model? If so, it would have to be an Option[Id]. Perhaps, instead of storing optional ids in your user model, you prefer to have two data classes: one case class to represent data received from the client, and one to represent data received from the database.

// option 1: duplicate data classes
case class User(id: Id, email: String, password: String)
case class UserData(email: String, password: String)

// option 2: optional ids
case class User(id: Option[Id], email: String, password: String)

Both options have their pros and cons. There is a third option, which is the purpose of this blog post, but let’s cover our bases first.

Duplicate Data Classes

One simple way to solve this problem is to consider that your system has two versions of your data: the version it receives from the client, and the version it receives from the database, an idea generalized by the CQRS design pattern.

Unfortunately, this adds a lot of overhead and boilerplate. Not only do you have to double the number of models in your system, you also have to make sure to reference the right version of each model in the right places. This can lead to a lot of confusion, not to mention that with User and UserData, the difference is not immediately clear to someone new on the project.

The biggest issue here is that we lose the correlation between the two types of data. User does not know that it comes from UserData and vice versa. Even worse, if we have methods on that data, for example, something that gives us a “formatted name” for a user… either we need both User and UserData to inherit from the same trait, or we duplicate the method. Unfortunately, this pattern is clunky and annoying.
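As a rough sketch of the shared-trait workaround mentioned above: the trait name UserFields, the displayName helper (standing in for the “formatted name” method), the persist function, and the alias Id = Long are all assumptions made for illustration, not code from our projects.

```scala
object DuplicateClassesExample {
  // Assumption: the post never defines Id, so a Long alias is used here.
  type Id = Long

  // Hypothetical shared trait so the helper is written only once.
  trait UserFields {
    def email: String
    def password: String

    // Illustrative stand-in for a "formatted name" method.
    def displayName: String = email.takeWhile(_ != '@')
  }

  // Version received from the client.
  case class UserData(email: String, password: String) extends UserFields

  // Version received from the database.
  case class User(id: Id, email: String, password: String) extends UserFields

  // Promotion from one version to the other is written by hand, per model.
  def persist(data: UserData, id: Id): User =
    User(id, data.email, data.password)
}
```

Even in this small form, every field is listed three times (twice in the case classes, once in persist), and nothing in the types says that User is “UserData plus an id”.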

Optional Ids

We’ve tried this pattern on several large projects. On the one hand, it prevents us from having to duplicate all of our data classes, which is nice. The big problem with optional ids is that, most of the time, your data actually does have an id in it. Wrapping it with Option means that the rest of your system has to always consider that the value might not be there, even when it definitely should. This ends up producing a lot of code like this:

for {
  user <- UserService.first()
  userId <- user.id
} yield ...

It’s not the worst thing in the world, but when 80% of your code is dealing with users that have ids, it feels unnecessary. Also note that there is a hidden failure here. If your UserService.first() returns Some(user) but for whatever reason that user doesn’t have an Id, then it will look the same as if UserService.first() returned None. Programming around this is possible, but gets ugly if you use this pattern all over your codebase.
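To make the hidden failure concrete, here is a minimal sketch. The function firstUserId stands in for the for-comprehension above and takes the result of UserService.first() as a parameter; firstUserIdOrError and its error strings are hypothetical, and Id = Long is an assumption.

```scala
object HiddenFailureExample {
  // Assumption: the post never defines Id, so a Long alias is used here.
  type Id = Long

  case class User(id: Option[Id], email: String, password: String)

  // Same shape as the for-comprehension above; `first` plays the role
  // of the value returned by UserService.first().
  def firstUserId(first: Option[User]): Option[Id] =
    for {
      user   <- first
      userId <- user.id
    } yield userId

  // Two different situations collapse into the same None.
  val noUser: Option[Id] = firstUserId(None)
  val userWithoutId: Option[Id] =
    firstUserId(Some(User(None, "ann@example.com", "secret")))

  // One possible way to program around it: name each failure explicitly.
  def firstUserIdOrError(first: Option[User]): Either[String, Id] =
    first match {
      case None       => Left("no user found")
      case Some(user) => user.id.toRight("user has no id")
    }
}
```

Here noUser and userWithoutId are indistinguishable to the caller. The Either version keeps them apart, at the cost of extra ceremony everywhere the pattern is used.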

Additionally, lazy developers will say, “I know that the id should be there, why can’t I just use user.id.get?” Option#get is a slippery slope: if you make exceptions for it in your codebase, people will abuse it, and then you lose the safety of having Options in the first place. At that point you might as well not use Option at all, because you are getting the Option version of an NPE while still paying the overhead of Option. If you have developers trying to sneak in .get, consider checking out Brian McKenna’s library, WartRemover.

Furthermore, the Optional Id pattern leads you to create a base class that all of your data classes inherit from.

trait BaseModel[T <: BaseModel[T]] { self: T =>
  val id: Option[Id]
}

This is so that you can create base service and data access traits that your layers inherit from. It’s worth noting that in this situation, you will likely want a method def withId(id: Id): T defined on BaseModel so that your base services/DAOs know how to promote a data class without an id (received from the client), to a data class that has an id (after being persisted). You’ll see in the next section that we can do away with all of this.
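A minimal sketch of how withId and a base DAO might fit together. The InMemoryDao class, its insert and find methods, and Id = Long are assumptions for illustration; a real DAO would talk to a database and would most likely be asynchronous.

```scala
object BaseModelExample {
  // Assumption: the post never defines Id, so a Long alias is used here.
  type Id = Long

  trait BaseModel[T <: BaseModel[T]] { self: T =>
    val id: Option[Id]
    // Promotes a model without an id to one that has an id.
    def withId(id: Id): T
  }

  case class User(id: Option[Id], email: String, password: String)
      extends BaseModel[User] {
    def withId(id: Id): User = copy(id = Some(id))
  }

  // Hypothetical in-memory DAO standing in for a real data access layer.
  class InMemoryDao[T <: BaseModel[T]] {
    private var rows = Map.empty[Id, T]
    private var nextId: Id = 1L

    // Assigns the next id, stores the promoted model, and returns it.
    def insert(model: T): T = {
      val saved = model.withId(nextId)
      rows = rows.updated(nextId, saved)
      nextId += 1
      saved
    }

    def find(id: Id): Option[T] = rows.get(id)
  }
}
```

Note that the value returned by insert still has type User with an Option[Id], so callers have to unwrap the id even though the DAO has just assigned it.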

While this pattern works, and we have used it in production with success, the issue we run into is that the types and data don’t accurately reflect the concepts in our system. We want a way to say that ids are not optional when the data has been retrieved from the database, while also maintaining the relationship between data both in and outside of the database.

The Solution

After writing out the problem, and the other possible solutions, it starts to become clear that there is a better way. We know what we want:

  • There should be one class that represents a specific type of data, whether it’s from the client or the database.
  • We don’t want ids to be optional. Data received from the database should represent that the id exists.
  • We don’t want values to ever be null.
  • Ideally we can minimize overhead (control flow, inheritance, typing overhead).

We introduce a class that contains an id and model data:

case class WithId[A](id: Id, model: A)

// receive data for a new user from the client
val newUser: User = Json.parse[User](json)

// receive data from the database
val savedUser: WithId[User] = UserService.findByIdOrFail(userId)

This makes a lot of sense. We retain the fact that the only difference between client data and database data is that database data has an id. We also avoid a good amount of overhead from the other two options (duplication, inheritance, and unnecessary complexity around ids). The main bit of overhead this introduces is extra typing when dealing with models in your service and DAO layer. While the types can get pretty nasty in some cases (our services are always asynchronous, so you may have Future[Seq[WithId[User]]]), it beats the alternatives.
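For comparison with the earlier options, here is a rough sketch of a data access layer built on WithId. The InMemoryDao class and its methods are hypothetical, Id = Long is an assumption, and the sketch is synchronous for brevity, whereas our real services return Futures.

```scala
object WithIdExample {
  // Assumption: the post never defines Id, so a Long alias is used here.
  type Id = Long

  case class User(email: String, password: String)
  case class WithId[A](id: Id, model: A)

  // Hypothetical in-memory DAO; it works for any model type with no
  // base trait and no withId method on the model.
  class InMemoryDao[A] {
    private var rows = Map.empty[Id, A]
    private var nextId: Id = 1L

    // Client data goes in as a plain A and comes back with its id attached.
    def insert(model: A): WithId[A] = {
      val saved = WithId(nextId, model)
      rows = rows.updated(saved.id, model)
      nextId += 1
      saved
    }

    def find(id: Id): Option[WithId[A]] =
      rows.get(id).map(WithId(id, _))
  }
}
```

The signature of insert, A => WithId[A], states the promotion directly in the types, and the id on a saved value is never optional.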

Removing .model Calls

If the thought of having to do user.model.firstName feels ugly, there is a way around it using implicits:

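import scala.language.implicitConversions

object WithId {
  // lets a WithId[A] be used wherever a plain A is expected
  implicit def toModel[A](modelWithId: WithId[A]): A = modelWithId.model
}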

Note that we have not tested this solution on a large-scale project, and it could add compilation time overhead.
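To show what the conversion buys you at the call site, here is a small self-contained sketch. The sendWelcome function and the sample values are hypothetical, and Id = Long is an assumption.

```scala
object ImplicitModelExample {
  import scala.language.implicitConversions

  // Assumption: the post never defines Id, so a Long alias is used here.
  type Id = Long

  case class User(email: String, password: String)
  case class WithId[A](id: Id, model: A)

  object WithId {
    // Found automatically because it lives in WithId's companion object.
    implicit def toModel[A](modelWithId: WithId[A]): A = modelWithId.model
  }

  val saved: WithId[User] = WithId(1L, User("ann@example.com", "secret"))

  // Field access goes through the conversion: no .model needed.
  val email: String = saved.email

  // Hypothetical function that only knows about plain User values.
  def sendWelcome(user: User): String = s"Welcome, ${user.email}"

  // A WithId[User] can be passed where a User is expected.
  val greeting: String = sendWelcome(saved)
}
```

The trade-off is that the conversion is invisible at the call site, so a reader may not notice when the id is silently dropped.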

Conclusion

Hopefully it is clear that while this is seemingly a small problem, finding the right way to model it in your system can have major implications on the cleanliness of your codebase. We have been trying the WithId pattern on a sizeable project for the last month with great results. No issues so far, and the type overhead isn’t that bad considering the additional safety it brings.
