Starting Point
In our application, TimeRocket, we frequently run into a scenario where after an initial pre-filtering phase we issue another query for the elements identified beforehand. A concrete example: we want to fetch calendar events, but first need to determine which calendars to query.
It typically looks something like this:
public async Task<Row> GetEreignisse(
EmployeeGuid[] employeeIds,
Workday workday)
{
var calendarIds = await GetCalendarIdsForEmployees(calendars);
return await this.Query<Row>(
@"SELECT * FROM calendarEntries e
WHERE e.tenantId = @tenantId
AND e.workday = @workday
AND e.calendarId IN @calendarIds",
new
{
calendarIds = calendarIds,
tenantId = tenantId,
workday = ToDateTime(workday)
});
}
We use Dapper in our software, which is why we write e.calendarId IN @calendarIds instead of something like e.calendarId IN (@calendarIds). But that’s merely an implementation detail.
So the question is: what’s actually wrong with this approach?
Well, let’s assume calendarIds contains three IDs. On SQL Server the statement ultimately becomes something like:
SELECT *
FROM calendarEntries e
WHERE e.tenantId = @tenantId
AND e.workday = @workday
AND e.calendarId IN (@calendarId1, @calendarId2, @calendarId3)
This leads to two issues:
1. Parameter Limit
Because SQL Server requires one parameter per element, you hit the limit relatively quickly. On MS SQL, the maximum number of parameters is 2100. This may sound absurdly high, but we have absolutely encountered places where we ran into this limit.
Our “solution” back then involved hideous ideas such as splitting the items into chunks (say, batches of 1000), executing a query per chunk, and then stitching the results back together. Yeah, I’m actually guilty of implementing this…
2. Constantly Changing Query Plans
Even worse: every time the number of elements changes, SQL Server generates a completely new query plan. Technically speaking, a query with two elements ends with:
e.calendarId IN (@calendarId1, @calendarId2)
while another ends with
e.calendarId IN (@calendarId1, @calendarId2, @calendarId3)
These are not the same queries, which means SQL Server treats them as different plans.
This uses lots of disk space and causes SQL Server to frequently re-calculate query plans, which is a pretty expensive operation.
Solution
Our solution to this problem is called Table-Valued Parameters, or TVPs.
As far as I know, this feature exists only on MS SQL (since SQL Server 2008). Other SQL servers may offer something similar, but I’m not familiar with those.
What Are TVPs and How Do They Work?
MS SQL allows us to define our own data types, called “user-defined types”, in addition to built-in types like NVARCHAR or INT. TVPs extend this concept by allowing us to define entire tables as a type.
A TVP consists of rows (our elements) and columns (in our case, just one GUID column). When executing a query, we populate a table on the backend side (e.g., an ADO.NET DataTable) and pass the entire table as a single parameter.
Inside the SQL statement, we now have access to this table parameter and can simply JOIN it.
Let’s take a look at the code:
public async Task<Row> GetEreignisse(
EmployeeGuid[] employeeIds,
Workday workday)
{
var calendarIds = await GetCalendarIdsForEmployees(calendars);
var dataTable = new System.Data.DataTable();
dataTable.Columns.Add("Id", typeof(Guid));
foreach (var guid in calendarIds)
{
dataTable.Rows.Add(guid);
}
var tvp = dataTable.AsTableValuedParameter("GuidList");
return await this.Query<Row>(
@"SELECT * FROM calendarEntries e
JOIN @calendarIds c ON e.calendarId = c.Id
WHERE e.tenantId = @tenantId
AND e.workday = @workday",
new
{
calendarIds = tvp,
tenantId = tenantId,
workday = ToDateTime(workday)
});
}
In the example above, we used two “magic” strings: "Id" and "GuidList". Where do these come from?
Before we can use a TVP at all, the type must be created once on SQL Server:
CREATE TYPE GuidList AS TABLE (Id UNIQUEIDENTIFIER NOT NULL);
Here, we define a user-defined type GuidList. It’s a table type (AS TABLE) with a single GUID column named Id.
Next, we need to grant access:
GRANT EXECUTE ON TYPE::dbo.GuidList TO public;
Of course, you could restrict this to specific users or groups, but since this is merely a type and not actual data, I personally don’t see why it shouldn’t just be available publicly.
After these two steps, our GuidList type can now be used anywhere on the SQL Server. And yes you understood correctly: this type needs to be created only once. Not per query. And it can be reused across entirely different queries on entirely different tables.
Naturally, converting our GUIDs into a TVP can be encapsulated in a helper function:
public static SqlMapper.ICustomQueryParameter ToGuidListTvp(IEnumerable<Guid> guids)
{
var dt = new System.Data.DataTable();
dt.Columns.Add("Id", typeof(Guid));
foreach (var guid in guids.Distinct())
{
dt.Rows.Add(guid);
}
return dt.AsTableValuedParameter("GuidList");
}
Personally, I really like this solution because it doesn’t distract from the actual query and -once set up- can be reused effortlessly.
What do you think? Let me know your thoughts in the comments!
[…] Why you shouldn’t use INs in SQL and what to use instead (Domenic Helfenstein) […]
Interesting solution.
I also like the approach MS chose in EF Core, where they use OPENJSON. This way you don’t need to add any custom types.
https://devblogs.microsoft.com/dotnet/announcing-ef8-preview-4/
Interesting, how the solutions compare performance wise.