Entity Framework Core (EF Core) streamlines database interactions, but its automatic translation of LINQ queries can sometimes lead to overly complex SQL. This becomes evident when you need to fetch data across multiple relationships, potentially resulting in performance problems. Split queries offer a way to optimize these situations.
For more information on EF Core click here
For more information on LINQ click here
Single Query
Single Query is the default type of query used with Entity Framework (EF), a Single query creates a query that returns a cartesian product which can, under the correct circumstances, create significant performance issues when loading and hydrating the entities in code.
Consider the following :
var posts = dbContext.Posts
.Include(p => p.Authors)
.Include(p => p.Comments)
.ThenInclude(c => c.Author)
.ToList();
This could produce SQL that looks as Follows :
SELECT [p].[Id]
,[p].[Title]
,[p].[Description]
,[p].[Body]
,[p].[PostDate]
,[a1].[Id]
,[a1].[PostId]
,[a1].[Name]
,[a1].[User]
,[c].[Id]
,[c].[PostId]
,[c].[Title]
,[c].[Content]
,[c].[Date]
,[c].[AuthorId]
,[a2].[Id]
,[a2].[PostId]
,[a2].[Name]
,[a2].[User]
FROM [Posts] AS [p]
LEFT JOIN [Authors] AS [a1] on [p].[Id] = [a1].[PostId]
LEFT JOIN [Comments] AS [c] on [p].[Id] = [c].[PostId]
LEFT JOIN [Authors] AS [a2] on [c].[AuthorId] = [a2].[Id]
In this Example there is a Blog Post that can have multiple Authors and Comments, each comment can only have 1 Author. If a Blog has 1 Author and no comments, this can be returned in a Single Row, however if there were 3 Authors and 100 comments this becomes 300 rows that need to be returned. This occurs because the two joins are at the same level, please note even though Author is also a child of comment, given the relationship no additional rows are added.
Split Query
Split Query was introduced in EF Core 5 (November 2020), Instead of Creating a cartesian product, it will query the tables individually. This can lead to much more performant queries under the right circumstances.
If we consider :
var posts = dbContext.Posts
.Include(p => p.Authors)
.Include(p => p.Comments)
.AsSplitQuery()
.ToList();
This would generate something more akin to
SELECT [p].[Id]
,[p].[Title]
,[p].[Description]
,[p].[Body]
,[p].[PostDate]
FROM [Posts] AS [p]
SELECT [a].[Id]
,[a].[PostId]
,[a].[Name]
,[a].[User]
From [Authors] as [a]
INNER JOIN [Posts] as [p] on [a].[PostId] = [p].[Id]
SELECT [c].[Id]
,[c].[PostId]
,[c].[Title]
,[c].[Content]
,[c].[Date]
,[c].[AuthorId]
FROM [Comments]
INNER JOIN [Posts] as [p] on [c].[PostId] = [p].[Id]
While this requires more round tips to SQL, if there were 3 Authors and 100 comments this becomes 104 rows (1 for Post, 3 for Author, and 100 for Comment) making it much less data for SQL to return and your application to construct as entities. I have had issues when using Skip/Take
for Pagination as if the order is not done correctly then result can become inconsistent if you are not careful with the queries (I usually recommend not using Split Queries when paginating). For more information on Pagination in EF click here
Comparison
Characteristic | Single Query | Split Queries |
---|---|---|
Approach | Fetches all Data in a single Result Set | Divides the data into multiple smaller queries |
Performance | Can be inefficient for complex relationships or large datasets due to the cartesian product | Often more performant with many-to-many relationships or selectively loading related data |
Network overhead | One single round-trip | Multiple Database round-trips |
Data Consistency | Stronger as all data is fetched in a single transaction | Potential for inconsistencies if data is modified between fetches, also more complex to ensure data is consistent when paginating |
Choosing between single and split queries
- Prioritize Single Queries when
- You need the majority of related data.
- Your queries involve relatively simple relationships.
- You have tight data consistency requirements.
- Consider split queries when
- Your queries routinely return massive amounts of data.
- You consistently only need specific portions of related data.
- Network latency between application and database is a concern.
As I work in an environment where database latency is not usually an issue, I generally advise moving queries to use Split Query when a dataset has a multiple relations and the dataset becomes large, I have seen moving to split query take a result set from millions of rows down to thousands while SQL can handle returning a million rows without too much issue, the amount of time that it takes c sharp to translate these rows into objects for my application to use is non-trivial.
Conclusion
Entity Framework Core’s ability to perform both single and split queries provides you with the flexibility to tailor your data retrieval strategies. Understanding the trade-offs empowers you to make informed decisions for optimizing your application’s performance and user experience.
If you would like more information about Split Queries click here