Deep Dive into an ORM:
The Entity Framework Core Query Pipeline
Func<int, bool> comparer = num => num < 5;
Expression<Func<int, bool>> comparerExpression
= num => num < 5;
Func compiled = comparerExpression.Compile();
var numParam = Expression.Parameter(typeof(int), "num");
Expression<Func<int, bool>> comparerExpression2 =
Expression.Lambda<Func<int, bool>>(
Expression.LessThan(
numParam,
Expression.Constant(5)),
numParam);
var compiled2 = comparerExpression2.Compile();
Roslyn - high-level source representation
T4 - source representation with templates
Expression trees
Reflection.Emit - low-level IL
Let’s translate this:
employees.Where(c => c.FirstName == c.LastName)
SELECT ... FROM Employees WHERE FirstName = LastName
Could be wrong, SQL has three-valued logic
In SQL, FirstName = NULL is never true… (and it ain’t false either :))
Try to mimic C# behavior (best-effort only!)
WHERE FirstName = LastName OR
(FirstName IS NULL AND LastName IS NULL)
If one of the columns is non-nullable in the database schema, we can optimize
Counter-example: case-sensitivity
Let’s get complicated:
employees.Where(e => e.Tasks.Count == 2)
SELECT ...
FROM Employees AS e
WHERE (
SELECT COUNT(*)
FROM Tasks AS t
WHERE t.EmployeeId = e.Id
) = 2;
What about this?
employees.Where(e => e == e.Boss)
WHERE e.Id = Boss.Id
employees.Where(e => e.Tasks.First() == x)
WHERE (SELECT ... FROM Tasks ...).Id1 = x.Id1
Now, what happens if something has a composite key?
WHERE Id1 = Boss.Id1 AND Id2 = Boss.Id2
employees.Where(e => e.Tasks.First() == x)
WHERE (SELECT ...).Id1 = x.Id1 AND (SELECT ...).Id2 = x.Id2
Double evaluation… Better not do this!
In C#, “related instances” are just there in memory.
employees.Include(b => b.Tasks)
SELECT ...
FROM Employees AS e
LEFT JOIN Tasks AS t ON e.Id = t.EmployeeId
ORDER BY e.Id, t.Id
What to do about this?
employees.Where(e => SomeFunc(e))
Previous versions: translate to server if we can, otherwise evaluate on client
EF Core 3 throws, except for top-most projection
Exercise: optimize an expression tree, eliminating needless null checks on non-nullable database columns:
// INPUT:
customers.Where(c => c.Age > 18 && c.Name != null)
// OUTPUT:
customers.Where(c => c.Age > 18)
class MyVisitor1 : ExpressionVisitor
{
protected override Expression VisitBinary(BinaryExpression b)
{
var visited = (BinaryExpression)base.VisitBinary(b);
if (b.NodeType == ExpressionType.NotEqual)
{
if (b.Right is ConstantExpression rightConstant &&
rightConstant.Value == null &&
!IsNullable(b.Left))
{
return Expression.Constant(true);
}
// TODO: Sides may be flipped!
}
return visited;
}
}
customers.Where(c => c.Age > 18 && true)
class MyVisitor2 : ExpressionVisitor
{
protected override Expression VisitBinary(BinaryExpression b)
{
var visited = (BinaryExpression)base.VisitBinary(b);
if (b.NodeType == ExpressionType.AndAlso)
{
if (b.Right is ConstantExpression rightConstant &&
rightConstant.Value is bool boolValue &&
boolValue)
{
return b.Left;
}
// TODO: Same for other side!
}
// TODO: Also take care of OrElse!
return visited;
}
}
customers.Where(c => c.Age > 18)
From this:
customers.Where(c => c.Age > 18 && c.Name != null)
customers.Where(c => c.Age > 18 && true)
customers.Where(c => c.Age > 18)
public virtual Expression Process(Expression query)
{
query = new EnumerableToQueryableMethodConvertingExpressionVisitor().Visit(query);
query = new QueryMetadataExtractingExpressionVisitor(_queryCompilationContext).Visit(query);
query = new AllAnyToContainsRewritingExpressionVisitor().Visit(query);
query = new GroupJoinFlatteningExpressionVisitor().Visit(query);
query = new NullCheckRemovingExpressionVisitor().Visit(query);
query = new EntityEqualityRewritingExpressionVisitor(_queryCompilationContext).Rewrite(query);
query = new SubqueryMemberPushdownExpressionVisitor().Visit(query);
query = new NavigationExpandingExpressionVisitor(...).Expand(query);
query = new FunctionPreprocessingExpressionVisitor().Visit(query);
new EnumerableVerifyingExpressionVisitor().Visit(query);
return query;
}
customers
.Where(c => c.Name == "Joe")
.OrderBy(c => c.Age)
.Take(5)
.Select(c => new { c.Name, c.Age });
SELECT Name, Age FROM Customers
WHERE Age > 18
ORDER BY Age
LIMIT 10
Still expressions, but now custom SQL ones!
customers.Where(c => new[] { 1, 2, 3 }.Contains(c.Id));
// Becomes:
WHERE c.ID IN (1, 2, 3) // This is an SqlInExpression
Expression trees can have custom expression types
Non-relational providers will do something completely different
customers.Where(c => c.Name.Length > 5)
// Becomes:
WHERE LEN(c.Name) > 5 // On SqlServer
WHERE LENGTH(c.Name) > 5 // On PostgreSQL
Users can define functions which render into arbitrary SQL expressions
(…with… expression trees…)
var posts1 = ctx.Posts.Where(p => p.Title.Contains("dotnetos"));
var title = "dotnetos"; // Read from somewhere
var posts2 = ctx.Posts.Where(p => p.Title.Contains(title));
SELECT ... FROM Posts WHERE STRPOS('dotnetos') > 0;
SELECT ... FROM Posts WHERE STRPOS(@p) > 0;
We can just do recursive structural comparison:
var posts = ctx.Posts.Where(p => p.Title.Contains("dotnetos"));
var title = "dotnetos"; // Read from somewhere
var posts2 = ctx.Posts.Where(p => p.Title.Contains(title));
using (var ctx = new MyContext())
{
var id = 8;
var posts = ctx.Posts.Single(p => p.Id == id);
}
// Create an explicitly compiled query
private static Func<MyContext, int, Post> _postById
= EF.CompileQuery((MyContext ctx, int id) =>
ctx.Posts.Single(p => p.Id == id));
// Use the compiled query by invoking it
using (var ctx = new MyContext())
{
var post = _postById(ctx, 8);
}
Remember our InExpression from before?
customers.Where(c => new[] { 1, 2, 3 }.Contains(c.Id));
WHERE c.Id IN (1, 2, 3)
var customerIds = new[] { 1, 2, 3 };
customers.Where(c => customerIds.Contains(c.Id));
Expand to constant at execution time:
var values = new[] { 1, 2, 3 };
ctx.Posts.Where(p => values.Contains(p.Foo));
WHERE p.Foo IN (1, 2, 3)
WHERE p.Foo = ANY (@p)
Smit Patel
(new query pipeline architect)
Maurycy Markowski
(big query guy and also a Polish guy!)
And of course … the rest of the EF team
Shay Rojansky
Blog & Presentation: http://roji.org
Twitter: @shayrojansky