Entity Framework : Query types

Entity Framework Core (EF Core) streamlines database interactions, but its automatic translation of LINQ queries can sometimes lead to overly complex SQL. This becomes evident when you need to fetch data across multiple relationships, potentially resulting in performance problems. Split queries offer a way to optimize these situations.

For more information on EF Core click here

For more information on LINQ click here

Single Query

Single Query is the default type of query used with Entity Framework (EF), a Single query creates a query that returns a cartesian product which can, under the correct circumstances, create significant performance issues when loading and hydrating the entities in code.

Consider the following :

var posts = dbContext.Posts
                    .Include(p => p.Authors)
                    .Include(p => p.Comments)
                    .ThenInclude(c => c.Author)
                    .ToList();

This could produce SQL that looks as Follows :

SELECT [p].[Id] 
    ,[p].[Title]
    ,[p].[Description]
    ,[p].[Body]
    ,[p].[PostDate]
    ,[a1].[Id]
    ,[a1].[PostId]
    ,[a1].[Name]
    ,[a1].[User]
    ,[c].[Id]
    ,[c].[PostId]
    ,[c].[Title]
    ,[c].[Content]
    ,[c].[Date]
    ,[c].[AuthorId]
    ,[a2].[Id]
    ,[a2].[PostId]
    ,[a2].[Name]
    ,[a2].[User]
FROM [Posts] AS [p]
LEFT JOIN [Authors] AS [a1] on [p].[Id] = [a1].[PostId]
LEFT JOIN [Comments] AS [c] on [p].[Id] = [c].[PostId]
LEFT JOIN [Authors] AS [a2] on [c].[AuthorId] = [a2].[Id]

In this Example there is a Blog Post that can have multiple Authors and Comments, each comment can only have 1 Author. If a Blog has 1 Author and no comments, this can be returned in a Single Row, however if there were 3 Authors and 100 comments this becomes 300 rows that need to be returned. This occurs because the two joins are at the same level, please note even though Author is also a child of comment, given the relationship no additional rows are added.

Split Query

Split Query was introduced in EF Core 5 (November 2020), Instead of Creating a cartesian product, it will query the tables individually. This can lead to much more performant queries under the right circumstances.

If we consider :

var posts = dbContext.Posts
                    .Include(p => p.Authors)
                    .Include(p => p.Comments)
                    .AsSplitQuery()
                    .ToList();

This would generate something more akin to

SELECT [p].[Id] 
    ,[p].[Title]
    ,[p].[Description]
    ,[p].[Body]
    ,[p].[PostDate]
FROM [Posts] AS [p]

SELECT [a].[Id]
    ,[a].[PostId]
    ,[a].[Name]
    ,[a].[User]
From [Authors] as [a]
INNER JOIN [Posts] as [p] on [a].[PostId] = [p].[Id]

SELECT [c].[Id]
    ,[c].[PostId]
    ,[c].[Title]
    ,[c].[Content]
    ,[c].[Date]
    ,[c].[AuthorId]
FROM [Comments]
INNER JOIN [Posts] as [p] on [c].[PostId] = [p].[Id]

While this requires more round tips to SQL, if there were 3 Authors and 100 comments this becomes 104 rows (1 for Post, 3 for Author, and 100 for Comment) making it much less data for SQL to return and your application to construct as entities. I have had issues when using Skip/Take for Pagination as if the order is not done correctly then result can become inconsistent if you are not careful with the queries (I usually recommend not using Split Queries when paginating). For more information on Pagination in EF click here

Comparison

Characteristic Single Query Split Queries
Approach Fetches all Data in a single Result Set Divides the data into multiple smaller queries
Performance Can be inefficient for complex relationships or large datasets due to the cartesian product Often more performant with many-to-many relationships or selectively loading related data
Network overhead One single round-trip Multiple Database round-trips
Data Consistency Stronger as all data is fetched in a single transaction Potential for inconsistencies if data is modified between fetches, also more complex to ensure data is consistent when paginating

Choosing between single and split queries

  • Prioritize Single Queries when
    • You need the majority of related data.
    • Your queries involve relatively simple relationships.
    • You have tight data consistency requirements.
  • Consider split queries when
    • Your queries routinely return massive amounts of data.
    • You consistently only need specific portions of related data.
    • Network latency between application and database is a concern.

As I work in an environment where database latency is not usually an issue, I generally advise moving queries to use Split Query when a dataset has a multiple relations and the dataset becomes large, I have seen moving to split query take a result set from millions of rows down to thousands while SQL can handle returning a million rows without too much issue, the amount of time that it takes c sharp to translate these rows into objects for my application to use is non-trivial.

Conclusion

Entity Framework Core’s ability to perform both single and split queries provides you with the flexibility to tailor your data retrieval strategies. Understanding the trade-offs empowers you to make informed decisions for optimizing your application’s performance and user experience.

If you would like more information about Split Queries click here