Home
  Latest posts
  My Writings
  My Code
  My Gallery
  About me
 
  rssfeed Syndication
 
Bloggtoppen.se
 
 
Links
  Cornerstone
  SweNug
 
Post categories
  misc (48)
  Architecture (21)
  C# (19)
  Asp.Net (2)
  Vb.Net (2)
  Training (7)
  Data (19)
  Events (40)
  Platform (2)
  Orcas (4)
  Updates (3)
  Methods (10)
  Tools (6)
  Announcements (14)
  Languages (1)
  Patterns (6)
  Opinions (11)
  Fun (3)
  Ineta (1)
  Opinion (0)
  Practices (2)
  WCF (5)
 
 
 
Tools (6)
LINQ To SQL: Support for Eager Loading, Really? Thursday, February 28, 2008

Eager loading

One feature in ORM that's really useful and helps optimization is eager loading. The idea is quite simple, you define what objects in a graph you want to load for a single query and the ORM will make sure that the full graph is loaded after the query. Really efficient for graphs with complex hierarchies and the last year or so I've found myself always defining load spans for eager loading for every query I use.

I really like the basic idea of how Load Spans are defined in LINQ to SQL (but what's up with the per-context thing?), it's quite simple:

DataLoadOptions options = new DataLoadOptions();
options.LoadWith(user => user.ToDoLists);
options.LoadWith(list => list.ToDoItems);
context.LoadOptions = options;

This will instruct the context to always load the association ToDoLists when loading a User and always load the association ToDoItems when loading a ToDoList. Pretty neat. "But what if the associations have millions of rows in the database?" - you might ask. Well they thought about that to and you can add a call to "AssociateWith" to constraint load spans. Here's an example:

options.AssociateWith(list => list.ToDoItems.Where(item => item.IsDone));

Which will constrain the load span to just eager load the items where IsDone is set to true. Again a very neat solution to a complex problem. An observant reader would expect a "but" right about now; and here it is:

The But!

When digging into this feature, the first thing one will notice is that the eager load works fine for one level down. That is; if the load span just asks for User.ToDoLists everything will work as expected. Our scenario will be different though, with a load span that tries to load multiple levels we will get a different result.

Consider this piece of code:

 DataLoadOptions options = new DataLoadOptions();

 options.LoadWith(user => user.ToDoLists);
 options.LoadWith(list => list.ToDoItems);
 IEnumerable usersQuery;

 using (ToDoDataContext context = new ToDoDataContext())
 {
        context.LoadOptions = options;

      usersQuery = from user in context.Users
                 select user;

      foreach (var item in usersQuery)
      {
        Console.WriteLine("{0} ({1})", item.FirstName, item.ToDoLists.Count);

            foreach (var list in item.ToDoLists)
                Console.WriteLine("{0} ({1})", list.Name, list.ToDoItems.Count);
      }
}

Looking at the code here one would assume that when we start to iterate through the query, the complete object graph will have been loaded. In Linq To Sql this isn't true.  Looking at the SQL server profiler trace for this query the truth quickly reveals itself. This is the first query issued to the database:

 -->SELECT --> [t0].[UserId], [t0].[FirstName], [t0].[LastName], [t0].[UserName], [t0].[Enabled], [t0].[Version]
 -->FROM --> [dbo].[Users] AS [t0]

Not much eager about that query one would say. The first thing that happens is that all the users gets fetched to the application. Well no harm in that, it's a perfectly viable solution for eager loading, event though not the most efficient one. The problems comes with the next query issued:

exec sp_executesql N'SELECT [t0].[ToDoListId], [t0].[UserId], [t0].[Name], [t0].[Description], 
[t0].[Version], [t1].[ToDoItemId], [t1].[ToDoListId] AS [ToDoListId2], [t1].[Name] AS [Name2],
[t1].[Completed], [t1].[Version] AS [Version2], (
-->SELECT --> COUNT(*) -->FROM --> [dbo].[ToDoItems] AS [t2] -->WHERE --> [t2].[ToDoListId] = [t0].[ToDoListId] ) AS [value] -->FROM --> [dbo].[ToDoLists] AS [t0] LEFT OUTER -->JOIN --> [dbo].[ToDoItems] AS [t1] -->ON --> [t1].[ToDoListId] = [t0].[ToDoListId] -->WHERE --> [t0].[UserId] = @x1 ORDER BY [t0].[ToDoListId], [t1].[ToDoItemId]',N'@x1 int',@x1=1

After carefully examining this query, you'll see that while it actually eager loads lists together with items, this query will only fetch lists and items for a single user and looking at the rest of the trace it becomes quite evident that every user will issue a command to get it's list and items. So for every user object an additional query will be generated. It is just plain wrong to call that eager loading, eager loading shouldn't have it's number of queries dependent on the number of objects at the root hierarchy!

Another issue that I've found is that the "eager loading" isn't done at the query execution level, but at the materialization level of the root object. That means that when a user object get's materialized, a second query is sent to the database to fetch all the lists and items. Up until now one could still argue that it's eager loading, but this approach just makes it "automatic lazy loading". For this scenario the ToDoList can't possible be argued to be eager loaded, the only difference between this approach and  "lazy loading" is that the lazy load get's triggered by the materialization and not by the access to the list itself. To make matters even worse, the database connection is opened and closed for every materialization of a user object, so if you don't have a couple of connections in your pool or no pooling at all, this will be very expensive indeed.

Now in the example above I use the deferred loading feature in LINQ. What if I materialized the query before the iteration? As it turns out, calling ToList before the iteration will not change the behavior against the database. It will still be the same queries issued to the database in the same fashion as before, only now the materialization is hidden in the call to ToList.

This is not what I would call eager loading or even load spans in the common use of the term.

How SHOULD it be?

If I instruct my framework to eager load with a load span, I expect the framework to load my object graph with as few queries as possible. At most that would mean one query per level in the hierarchy. Something like:

 -->select --> *  -->from --> users  -->where --> name like @p0 + '%'
 -->select --> *  -->from --> todolists
inner  -->join --> ( -->select --> *  -->from --> users  -->where --> name like @p0 + '%') users  -->on --> users.userId = todolists.userId
 -->select --> *  -->from --> todoitems
inner  -->join --> 
( -->select --> *  -->from --> todolists
inner  -->join --> ( -->select --> *  -->from --> users  -->where --> name like @p0 + '%') users  -->on --> users.userId = todolists.userId)
lists  -->on --> items.listid = lists.listid

Summary

So Eager Load in Linq To SQL is only eager loading for one level at a time. As it is for lazy loading, with Load Options we will still issue one query per row (or object) at the root level and this is something we really want to avoid to spare the database. Which is kind of the point with eager loading, to spare the database. The way LINQ to SQL issues queries for the hierarchy will decrease the performance by log(n) where n is the number of root objects. Calling ToList won't change the behavior but it will control when in time all the queries will be issued to the database.

kick it on DotNetKicks.com
Leave a comment Comments (3)
 
Sparcle CTP released Tuesday, January 24, 2006
At last! XAML will now be so much easier to handle with Sparcle released!

For you who don't know, Sparcle is the tool for XAML animation and interaction design.

I don't know about you, but I'll head over at once to:
http://www.microsoft.com/downloads/details.aspx?FamilyID=ed9f5fb2-4cfc-4d2c-9af8-580d644e3d1d&DisplayLang=en&hash=TJF7SW9
and start playing around!!!


Leave a comment Comments (0)
 
LINQ Technical Preview Update Monday, November 21, 2005
Microsoft has released a refresh of the LINQ technical preview that works with Visual Studio 2005 RTM, so now I can ditch my beta 2 VPC :)

Download here: http://download.microsoft.com/download/4/7/0/4703eba2-78c4-4b09-8912-69f6c38d3a56/LINQ%20Preview.msi

Leave a comment Comments (0)
 
WinFX bits Updated! Sunday, November 20, 2005
Tim Sneath has put up a post with WinFX Updates to accomadte the RTM version of 2005. At last :)

I'll just throw my current projects GUI out the door and just rebuild it in WPF ;)

Here's the link: http://blogs.msdn.com/tims/archive/2005/11/18/494526.aspx

Leave a comment Comments (0)
 
New drop of WWF released Monday, November 07, 2005
Works with VS2005 RTM - http://blogs.msdn.com/pandrew/archive/2005/11/04/489353.aspx
Leave a comment Comments (0)
 
Mats back in the loop Thursday, November 03, 2005
The O/R ambassador Mats Helander has been quiet for a very long time in the BlogSphere (have had some "run ins" on msn though) but yesterday he made sure to get into the sphere again in a bang,

With a blog entry the length of a bible, Mats headed out to end the endless entity vs datarow debate and to lecture some of us wannabe o/r users about the two ways to use o/r mappers. Although he tried to be diplomatic about it, I thought I could sense that in Mats mind one of the approaches where superior.

If you got some time to spare or are just really interested, read the full blog entry here: http://www.matshelander.com/Weblog/DisplayLogEntry.aspx?LogEntryID=80

Now to the subject of my own blog post.

In Mats argumentations he spoke a lot about the similarities with Windows Virtual Memory where he compared the two as "you never know where your objects are, in memory or swapped out".

Now that's a bit limping.

In any O/R mapper I've tried as of today, it usually takes an action from the domain client to actually persist the entity objects and then invalidate the references to the objects (if inside an aggregate that is) to be able to seamlessly move objects to and from storages.

One could argue that this is only the case when we want persistence, but thinking about it; its not really about persistence. If you want the illusion to be 100% you need to make sure that your objects flow both up and down from memory without any interaction.

I could see a couple of ways implementing this.

  1. Make sure your Identity Map invalidates entities on a "less frequent usage"-basis
  2. Incorporate weak references
  3. Add a intermediate storage facility for objects that shouldn't really be finalized in their persistence
  4. And so on?.

Now to the big question, do we really want this? Do we really want our entities to work this seamlessly?

Looking at the steps to get there, we're probably looking at a list similar to the one first created when Windows VM was planned. I'm not so sure we want to do this and that's why I'm not so sure about the comparison.

Is Mats trying to get this behavior in Npersist or is he aware that the comparison is limping and hoped we wouldn't notice ;)

Another thing, claiming that entities vs datarow is just about style and preferences; Well in one way I agree. It is about what style and preferences you're going to use in your application development:

  1. For smaller applications, do you want the overhead that data set / data table / data row gives you for the ease of use?
  2. For medium size applications, do you want to put yourself in a position where you get to no 3 and don't like the situation?
  3. For large systems, good luck building Identity Map, L2 cache and Lazy loading support into your typed data sets. That will sure steal away a couple of weekends for you.
I really don't want to be a synic, but sometimes I just can't help it.
Leave a comment Comments (3)