Data Model for Ride/Cab Service Provider

Ankit Madhukar
5 min readFeb 6, 2022

Creating a Data model for ride and taxi service provider

Photo by Carlos Muza on Unsplash

Let's build out a data model for a company providing ride-booking options (uber/ola-like service). In this article, we will focus on the ride-booking business process for the company and we will follow Kimball’s Dimensional Designing steps to create it

Summarizing the Dimensional Modelling Technique

  1. Select the Business Process
  2. Declare the grain
  3. Identify the dimensions
  4. Identify the facts

Let's go through all these steps in detail

1. Select the Business Process

What am I tracking

In any company, there are a lot of business activities that generate relevant data points that can be used to analyze for data-driven decisions. But it becomes very important for the success of a Data warehousing project if the scope is correctly defined and reviewed with business stakeholders. This step is the foundation of the entire DWH project.

Although for a ride service provider company, you might jump to a conclusion that rides book is the main business process, so why not built the DWH around it. (which we are doing in this case) But there is other important business activities/process. For eg, in this case, we might create different models for

  1. Driver Payments
  2. Insurance Renewals
  3. Users adding money to wallet

And many more from the organization’s POV (HR, Operations, etc)

But coming back to our business process in the spotlight, Rides taken by users.

Now when defining the business process, it's important to take feedback from business users from different departments to establish and standardize it. While you do this exercise, also try to see what kind of questions they want this data model to answer (for eg, avg new users this month, rides taken % change WoW, and so on). This helps strengthen the details required for a successful DWH(Data Warehousing) project

2. Declare the Grain

Declaring the grain means specifying exactly what an individual fact table row represents

The above definition is from Kimball’s DataWarehousing Toolkit.

It helps decide the level of detail you want to go within the fact table.
Think like this: If you add user-related detail to rides, you can view and slice and dice the data set by users/customers.

Now if you want to also do the same with Drivers, you need to have the Driver field in the fact table. Now you have a more detailed record (User and Driver). As you keep increasing the attributes available in the Fact table, you are increasing the granularity of the table.

The more detailed/granular your fact table is, the more robust it is to ever-changing business queries related to this data model

In another scenario, let's say you are creating a view for one of the teams and they don’t require driver details (let's say they are from the marketing team and are focussing on customer/user KPIs). You can create an aggregated view on top of your fact table without the driver attribute (let's say driver_ID). This view is less granular compared to our fact table.

3. Identify the Dimensions

Who, what , where, when , why, and how

The dimensions help you map out the business event with these values which help answer the above question. Dimensions are also said to ‘define’ the business process.

For eg, Who is the user who ordered the ride

What can be what was the request (SUV, Sedan, Mini ), where and when are the place and time for the request/booking, and so on.

In a few cases, once you start defining the dimensions, you might have to revisit the grain of the fact table, if you stumble upon an important dimension, but the current grain does not allow for it to be included in the table.

for our data model, we will consider Users, Drivers, Location, Vehicle, and Promotion, along with Date Dimension which is almost a standard in any data model.

Users: contains details about customers who signed up for our service
Drivers: contains details about Drivers who signed up to provide the ride service
Location: details about the location (to and from) of the rides booked.
Promotion: details about any promotions running that lower the ride cost(like coupons you used)
Date: different calendar-related attributes, defined at day level for our use case ( so you don’t have to calculate these every time you run a query)

Question: What will the location table contain? Is it about all the locations you can get (maybe from a Geoservice) or a subset of locations where you had users book a ride at least once?

4. Identify Facts

| What is the process measuring?

In our case, we are measuring the rides-related KPIs(Key Performance Indicators) and metrics are fact is rides booked. Based on the grain we decided earlier we have to ensure our fact records follow the same, or if required review and revise the grain.

There are multiple fact table types like transaction, periodic snapshot, and accumulating snapshot fact tables. More on this in another article

Can you guess which among the three is at the lowest granularity?

Putting it all together, this is our rough data model(created in excel)

Initial Sketch of the Data Model-Also termed as Conceptual data model

I am using DBDesigner free version to create the data model.

Data Model- Also termed as Physical data model

To add some details, this is what the sample records in this table might look like

(sample records)
Sample Records

For this article, I added addition Fact and Dim to table names, but in DWH, the naming convention may be different, and it's very important to follow the same for consistency and avoid confusion.

--

--