Solving the Detached Many-to-Many Problem with the Entity Framework

Introduction

This article is part of the ongoing series I’ve been writing recently, but
can be read as a standalone article. I’m going to do a better job of
integrating the changes documented here into the ongoing solution I’ve been
building.

However, considering how much time and effort I put into solving this issue,
I’ve decided to document the approach independently in case it is of use to
others in the interim.

The Problem Defined

This issue presents itself when you are dealing with disconnected/detached
Entity Framework POCO objects, as the DbContext doesn’t track changes to
them. Specifically, trouble occurs with entities participating in a
many-to-many relationship, where the EF has hidden a “join table” from the model
itself.

The problem with detached entities is that the data context has no way of
knowing what changes have been made to an object graph without fetching the
data from the data store and doing an entity-by-entity comparison, and that’s
assuming it’s possible to fetch the data the same way it was fetched originally.

In this solution, all the entities are detached, don’t use proxy types and
are designed to move between WCF service boundaries.

Some Inspiration

There are no out-of-the-box solutions that I’m aware of which can process
POCO object graphs that are detached.

I did find an interesting solution called GraphDiff, which is available from GitHub and also as
a NuGet package, but it didn’t work with the latest RC version of the Entity
Framework (v6).

I also found a very comprehensive article on how to implement a generic
repository pattern with the Entity Framework, but it was unable to handle
detached many-to-many relationships. In any case, I highly recommend a
read of this article, as it was the inspiration for some of the approach I’ve ended
up taking with my own design.

The Approach

This morning I put together a simple data model with the relationships that I
wanted to support with detached entities. I’ve attached the solution with
a sample schema and test data at the bottom of this article. If you prefer
to open and play with it, be sure to add the Entity Framework (v6 RC) via NuGet
(I’ve omitted it for file size and licensing reasons).

Here’s a logical view of the model I wanted to support:

Here’s the schema view from SQL Server:

Here’s the Entity Model which is generated from the above SQL schema:

In the spirit of punching myself in the head, I’ve elected to have one table
implement an identity specification (meaning the underlying schema allocates PK
ID values), whereas for the other two tables the ID must be specified.

Theoretically, if I can handle the entity types in a generic fashion, then
this solution can scale out to larger and more complex models.

The scenarios I’m specifically looking to solve in this solution with
detached object graphs are as follows:

Add a relationship (many-to-many)

Add a relationship (FK-based)

Update a related entity (many-to-many)

Update a related entity (FK-based)

Remove a relationship (many-to-many)

Remove a relationship (FK-based)

Per the above, here are the scenarios within the context of the above data
model:

Add a new Secondary entity to a Primary entity

Add an Other entity to a Secondary entity

Update a Secondary entity by updating a Primary entity

Update an Other entity from a Secondary entity (or Primary entity)

Remove (but not delete!) a Secondary entity from a Primary entity

Remove (but not delete) an Other entity from a Secondary entity

Establishing Test Data

Just to give myself a baseline, the data model is populated (by default) with
the following data. This gives us some “existing entities” to query and
modify.

More work for the consumer

Although I tried my best, I couldn’t come to a design which didn’t
require the consuming client to do slightly more work to enable this to work
properly. Unfortunately the best place for change tracking to occur
with disconnected entities is with the layer making changes – be it a business
layer or something downstream.

To this effect, entities will need to implement a property which reflects the
state of the entity (added, modified, deleted etc.). For the object graph
to be updated/managed successfully, the consumer of the entities needs to set
the entity state properly. This isn’t at all as bad as it sounds, but it’s
not nothing.

Establishing some Scaffolding

After generating the data model, the first thing to be done is ensure each
entity derives from the same base class (“EntityBase”). This is used later
to establish the active state of an entity when it needs to be processed.
I’ve also created an enum (“ObjectState”), which is exposed as a property of the
base class, and a helper function which maps ObjectState to an
EF EntityState. In case this isn’t clear, here’s a class view:
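
A minimal sketch of this scaffolding, assuming EF6 and using only the names that appear in this article (EntityBase, ObjectState, State, HelperFunctions.ConvertState), might look like the following. Making the base class abstract and mapping Processed to EntityState.Unchanged are assumptions made for illustration; the sample solution may differ.

```csharp
using System.Data.Entity; // EntityState lives here in EF6 (needs the EntityFramework package)

// State that the consuming client sets on each detached entity.
public enum ObjectState
{
    Unchanged,
    Added,
    Modified,
    Deleted,
    Processed // set by the data access layer once an entity has been handled
}

// Common base class for every entity in the model (assumed abstract here).
public abstract class EntityBase
{
    public ObjectState State { get; set; }
}

public static class HelperFunctions
{
    // Maps the client-side ObjectState to the EF EntityState.
    public static EntityState ConvertState(ObjectState state)
    {
        switch (state)
        {
            case ObjectState.Added:
                return EntityState.Added;
            case ObjectState.Modified:
                return EntityState.Modified;
            case ObjectState.Deleted:
                return EntityState.Deleted;
            default:
                // Assumption: Unchanged and Processed both map to Unchanged.
                return EntityState.Unchanged;
        }
    }
}
```

With something like this in place, a client marks an entity with `entity.State = ObjectState.Modified;` before sending it back for processing.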

Constructing Data Access

To ensure that the usage is consistent, I’ve defined a single Data Access
class, mainly to establish the pattern for handling detached object
graphs. I can’t stress enough that this is not
intended as a guide to an appropriate way to structure your data access – I’ll
be updating my ongoing series of articles to go into more detail – this
is only to articulate a design approach to handling detached object
graphs.

Having said all that, here’s a look at my “DataAccessor” class, which can be
used with generic data access entities (by way of generics):

As with my ongoing project, the Entity Framework DbContext is instantiated by
this class on construction, and the class implements IDisposable to ensure the
DbContext is disposed properly when the accessor is disposed. Here’s the
constructor showing the EF configuration options I’m using:

    public DataAccessor()
    {
        _accessor = new SampleEntities();
        // entities are detached POCOs, so lazy loading and proxies are turned off
        _accessor.Configuration.LazyLoadingEnabled = false;
        _accessor.Configuration.ProxyCreationEnabled = false;
    }

Updating an Entity

We start with a basic scenario to ensure that the scaffolding has been
implemented properly. The scenario is to query for
a Primary entity and then change a property and update the entity in
the data store.

    [TestMethod]
    public void UpdateSingleEntity()
    {
        Primary existing = null;
        String existingValue = String.Empty;

        using (DataAccessor a = new DataAccessor())
        {
            existing = a.DataContext.Primaries.Include("Secondaries").First();
            Assert.IsNotNull(existing);
            existingValue = existing.Title;
            existing.Title = "Unit " + DateTime.Now.ToString("MMdd hh:mm:ss");
        }

        using (DataAccessor b = new DataAccessor())
        {
            existing.State = ObjectState.Modified;
            b.InsertOrUpdate<Primary>(existing);
        }

        using (DataAccessor c = new DataAccessor())
        {
            existing.Title = existingValue;
            existing.State = ObjectState.Modified;
            c.InsertOrUpdate<Primary>(existing);
        }
    }

You’ll notice that there is nothing particularly significant here, except
that the object’s State is reset to Modified between operations (it is set to
Processed once an entity has been handled).

Updating a Many-to-Many Relationship

Now things get interesting. I’m going to query for
a Primary entity, then I’ll update both a property of
the Primary entity itself, and a property of one of the entity’s
related Secondary entities.

    [TestMethod]
    public void UpdateManyToMany()
    {
        Primary existing = null;
        Secondary other = null;
        String existingValue = String.Empty;
        String existingOtherValue = String.Empty;

        using (DataAccessor a = new DataAccessor())
        {
            //Note that we include the navigation property in the query
            existing = a.DataContext.Primaries.Include("Secondaries").First();
            Assert.IsTrue(existing.Secondaries.Count() > 0,
                "Should be at least 1 linked item");
        }

        //save the original description
        existingValue = existing.Description;
        //set a new dummy value (with a date/time so we can see it working)
        existing.Description = "Edit "
            + DateTime.Now.ToString("yyyyMMdd hh:mm:ss");
        existing.State = ObjectState.Modified;

        other = existing.Secondaries.First();
        //save the original value
        existingOtherValue = other.AlternateDescription;
        //set a new value
        other.AlternateDescription = "Edit "
            + DateTime.Now.ToString("yyyyMMdd hh:mm:ss");
        other.State = ObjectState.Modified;

        //a new data access class (new DbContext)
        using (DataAccessor b = new DataAccessor())
        {
            //single method to handle inserts and updates
            //set a breakpoint here to see the result in the DB
            b.InsertOrUpdate<Primary>(existing);
        }

        //return the values to the original ones
        existing.Description = existingValue;
        other.AlternateDescription = existingOtherValue;
        existing.State = ObjectState.Modified;
        other.State = ObjectState.Modified;

        using (DataAccessor c = new DataAccessor())
        {
            //update the entities back to normal
            //set a breakpoint here to see the data before it reverts back
            c.InsertOrUpdate<Primary>(existing);
        }
    }

If we actually run this unit test and set the breakpoints accordingly, you’ll
see the following in the database:

Database at Breakpoint #1 / Database at Breakpoint #2

Database when Unit Test completes

You’ll notice at the second breakpoint that the descriptions of both the
Primary entity and its first Secondary entity have been updated.

Examining the Insert/Update code

The function exposed by the “data access” class really just passes through to
another private function which does the heavy lifting. This is mainly in
case we need to reuse the logic, since it essentially processes state action on
attached entities.

    public void InsertOrUpdate<T>(params T[] entities) where T : EntityBase
    {
        ApplyStateChanges(entities);
        DataContext.SaveChanges();
    }

Here’s the definition of the ApplyStateChanges function, which I’ll
discuss below:

    private void ApplyStateChanges<T>(params T[] items) where T : EntityBase
    {
        DbSet<T> dbSet = DataContext.Set<T>();
        foreach (T item in items)
        {
            //attaches the item (and its related entities) to the current context
            dbSet.Attach(item);
            if (item.State == ObjectState.Added ||
                item.State == ObjectState.Modified)
            {
                //AddOrUpdate is an extension method from System.Data.Entity.Migrations
                dbSet.AddOrUpdate(item);
            }
            else if (item.State == ObjectState.Deleted)
            {
                dbSet.Remove(item);
            }

            //apply the client-supplied state to every tracked entity not yet handled
            foreach (DbEntityEntry<EntityBase> entry in
                DataContext.ChangeTracker.Entries<EntityBase>()
                    .Where(c => c.Entity.State != ObjectState.Processed
                        && c.Entity.State != ObjectState.Unchanged))
            {
                var y = DataContext.Entry(entry.Entity);
                y.State = HelperFunctions.ConvertState(entry.Entity.State);
                entry.Entity.State = ObjectState.Processed;
            }
        }
    }

Notes on this implementation

What this function does is to iterate through the items to be examined,
attach them to the current Data Context (which also attaches their children),
act on each item accordingly (add/update/remove) and then process new entities
which have been added to the Data Context’s change tracker.

For each newly “discovered” entity (and ignoring entities which are unchanged
or have already been examined), each entity’s DbEntityEntry is set
according to the entity’s ObjectState (which is set by the calling
client). Doing this allows the Entity Framework to understand what actions
it needs to perform on the entities when SaveChanges() is invoked
later.

You’ll also note that I set the entity’s state to “Processed” when it has
been examined, so we don’t act on it more than once (for performance
purposes).

Fun note: the AddOrUpdate extension method is something I found in
the System.Data.Entity.Migrations namespace and it acts as an ‘Upsert’
operation, inserting or updating entities depending on whether they already
exist or not. Bonus!

That’s it for adding and updating, believe it or not.

Corresponding Unit Test

The following unit test creates a new entity in a many-to-many relationship,
then removes it (by relationship only), and finally deletes it altogether
from the database:

    [TestMethod]
    public void AddRemoveRelationship()
    {
        Primary existing = null;

        using (DataAccessor a = new DataAccessor())
        {
            existing = a.DataContext.Primaries.Include("Secondaries")
                .FirstOrDefault();
            Assert.IsNotNull(existing);
        }

        Secondary newEntity = new Secondary();
        newEntity.State = ObjectState.Added;
        newEntity.AlternateTitle = "Unit";
        newEntity.AlternateDescription = "Test";
        newEntity.SecondaryId = 1000;

        existing.Secondaries.Add(newEntity);

        using (DataAccessor a = new DataAccessor())
        {
            //breakpoint #1 here
            a.InsertOrUpdate<Primary>(existing);
        }

        newEntity.State = ObjectState.Unchanged;
        existing.State = ObjectState.Modified;

        using (DataAccessor b = new DataAccessor())
        {
            //breakpoint #2 here
            b.RemoveEntities<Primary, Secondary>(existing,
                x => x.Secondaries, newEntity);
        }

        using (DataAccessor c = new DataAccessor())
        {
            //breakpoint #3 here
            c.Delete<Secondary>(newEntity);
        }
    }

Test Results:

Pre-Test – Breakpoint #1 / Breakpoint #2

Breakpoint #3 / Post execution (new entity deleted)

SQL Profile Trace

Removing a many-to-many Relationship

Now this is where it gets tricky. I’d like to have something a little
more polished, but the best I have come up with to date is a separate operation
on the data provider which exposes functionality akin to “remove
relationship”.

The fundamental problem with unmodified EF POCO entities is that, when they
are detached, removing a many-to-many relationship means physically removing
the related entity from the collection.

When the object graph is sent back for processing, there’s a missing related
entity, and the service or data context would have to make an assumption that
the omission was on purpose, not to mention that it would have to compare
against data currently in the data store.
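
To illustrate the comparison such a service would be forced to make, here is a small sketch that diffs the related keys currently in the data store against those still present in the detached graph. The RelationshipDiff class and FindRemovedKeys method are hypothetical names used only for this illustration, and integer keys are assumed.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RelationshipDiff
{
    // Returns the keys that exist in the data store but are missing from the
    // detached graph, i.e. the relationships that were presumably removed.
    public static IList<int> FindRemovedKeys(
        IEnumerable<int> storedKeys, IEnumerable<int> detachedKeys)
    {
        return storedKeys.Except(detachedKeys).ToList();
    }
}
```

For stored keys 1, 2, 3 and detached keys 1, 3, the result contains only 2. Even then, the service still cannot tell whether the omission was deliberate, and it costs an extra round trip to the data store, which is why an explicit “remove relationship” call is used here instead.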

To make this easier, I’ve implemented a function called “RemoveEntities”
which alters the relationship between the parent and the child/children.
The one big catch is that you need to specify the navigation property or
collection, which might make it slightly undesirable to implement
generically. In any case, I’ve provided two options, with the navigation
property as a string parameter or as a LINQ expression; they both do the same
thing.

    public void RemoveEntities<T, T2>(T parent,
        Expression<Func<T, object>> expression, params T2[] children)
        where T : EntityBase
        where T2 : EntityBase
    {
        DataContext.Set<T>().Attach(parent);
        ObjectContext obj = DataContext.ToObjectContext();
        foreach (T2 child in children)
        {
            DataContext.Set<T2>().Attach(child);
            //mark only the relationship as deleted, not the child entity itself
            obj.ObjectStateManager.ChangeRelationshipState(parent,
                child, expression, EntityState.Deleted);
        }
        DataContext.SaveChanges();
    }

Notes on this implementation

The “ToObjectContext” is an extension method, and is akin
to (DataContext as IObjectContextAdapter).ObjectContext. This is to
expose a more fundamental part of the Entity Framework’s object model. We
need this level of access to get to the functionality which controls
relationships.
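
A sketch of such an extension method, assuming EF6 namespaces, could be as small as the following. The containing class name DbContextExtensions is an assumption; only the ToObjectContext name comes from the code in this article.

```csharp
using System.Data.Entity;
using System.Data.Entity.Core.Objects;
using System.Data.Entity.Infrastructure;

public static class DbContextExtensions
{
    // Exposes the underlying ObjectContext of a DbContext via the
    // IObjectContextAdapter interface that DbContext implements.
    public static ObjectContext ToObjectContext(this DbContext context)
    {
        return ((IObjectContextAdapter)context).ObjectContext;
    }
}
```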

For each child to be removed (note: not deleted from the physical database),
we nominate the parent object, the child, the navigation property (collection)
and the nature of the relationship change (delete).

Note that this will NOT WORK for Foreign Key
defined relationships – more on that below.

To delete entities which have active relationships, you’ll need to drop the
relationship before attempting to delete or else you’ll have data
integrity/referential integrity errors, unless you have accounted for cascading
deletion (which I haven’t).

Example execution:

    using (DataAccessor c = new DataAccessor())
    {
        //c.RemoveEntities<Primary, Secondary>(existing, "Secondaries", s);
        //(or can use an expression):
        c.RemoveEntities<Primary, Secondary>(existing, x => x.Secondaries, s);
    }
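
The string-based variant is not listed in this article. Assuming it mirrors the expression-based version, a standalone sketch might look like this. The RelationshipHelper class and RemoveRelationship method are hypothetical names, and the DbContext is passed in explicitly so that the example does not depend on the DataAccessor class.

```csharp
using System.Data.Entity;
using System.Data.Entity.Core.Objects;
using System.Data.Entity.Infrastructure;

public static class RelationshipHelper
{
    // Removes the relationship between parent and each child without
    // deleting the child entities themselves.
    public static void RemoveRelationship<T, T2>(DbContext context, T parent,
        string navigationProperty, params T2[] children)
        where T : class
        where T2 : class
    {
        context.Set<T>().Attach(parent);
        ObjectContext objectContext = ((IObjectContextAdapter)context).ObjectContext;
        foreach (T2 child in children)
        {
            context.Set<T2>().Attach(child);
            // Uses the overload that takes the navigation property name as a string.
            objectContext.ObjectStateManager.ChangeRelationshipState(
                parent, child, navigationProperty, EntityState.Deleted);
        }
        context.SaveChanges();
    }
}
```

The expression-based form has the advantage that a renamed navigation property is caught at compile time, whereas a mistyped string only fails at run time.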

Removing FK Relationships

As mentioned above, you can’t just edit the relationship to remove an
FK-based relationship. Instead, you have to follow the EF practice
of setting the FK entity to NULL. Here’s a Unit Test which
demonstrates how this is achieved:

    Secondary s = ExistingEntity();
    using (DataAccessor c = new DataAccessor())
    {
        //'o' is not declared in the listing as published; it is presumably the
        //related Other entity, captured before the reference is cleared
        var o = s.Other;

        s.Other = null;
        s.OtherId = null;
        s.State = ObjectState.Modified;
        o.State = ObjectState.Unchanged;
        c.InsertOrUpdate<Secondary>(s);
    }

We use the same “Insert or Update” call, being aware that you have to set
the ObjectState properties accordingly.

Note: I’m in the process of testing the reverse removal, i.e. what
happens if you want to remove a Secondary entity from
an Other entity’s collection.

Deleting Entities

This is fairly straightforward, but I’ve taken a few more precautions to
ensure that the entity to be deleted is valid on the server side.

    public void Delete<T>(params T[] entities) where T : EntityBase
    {
        foreach (T entity in entities)
        {
            //look up the tracked/stored version of the entity by its key
            T attachedEntity = Exists<T>(entity);

            if (attachedEntity != null)
            {
                var attachedEntry = DataContext.Entry(attachedEntity);
                attachedEntry.State = EntityState.Deleted;
            }
        }
        DataContext.SaveChanges();
    }

To understand the above, you should take a look at the implementation of the
“Exists” function which essentially checks the data store and local cache to see
if there is an attached representation:

    protected T Exists<T>(T entity) where T : EntityBase
    {
        var objContext = ((IObjectContextAdapter)this.DataContext)
            .ObjectContext;
        var objSet = objContext.CreateObjectSet<T>();
        var entityKey = objContext.CreateEntityKey(objSet.EntitySet.Name,
            entity);

        DbSet<T> set = DataContext.Set<T>();
        var keys = (from x in entityKey.EntityKeyValues
                    select x.Value).ToArray();

        //Remember, there can be composite keys, so don't assume there's
        //just one column/one value
        //If a composite key isn't ordered properly, the Set<T>().Find()
        //method will fail, use attributes on the entity to determine the
        //proper order.

        //context.Configuration.AutoDetectChangesEnabled = false;

        return set.Find(keys);
    }

This is a fairly expensive operation which is why it’s pretty much reserved
for deletes and not more frequent operations. It essentially determines
the target entity’s primary key and then checks whether the entity exists or
not.

Note: I haven’t tested this on entities with composite (multi-column) keys,
but I’ll get to it at some point. If you have tables with composite keys, you
can define the PK key order using attributes on the model entity, but I
haven’t done this (yet).
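
As a rough illustration of ordering a multi-column key with attributes, assuming a Code First style mapping that reads data annotations, an entity could be decorated as follows. The CompositeKeyExample class and its properties are hypothetical and are not part of the sample model.

```csharp
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

// Hypothetical entity with a two-column primary key.
public class CompositeKeyExample
{
    // First part of the key
    [Key, Column(Order = 0)]
    public int KeyPartA { get; set; }

    // Second part of the key
    [Key, Column(Order = 1)]
    public int KeyPartB { get; set; }

    public string Description { get; set; }
}
```

With a key declared this way, the values passed to Find need to follow the same order, for example `set.Find(keyPartA, keyPartB)`.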

Summary

This article is the culmination of about two days of heavy analysis and
investigation. I’ve got a whole lot more to contribute on this topic, but
for now, I felt it was worthy enough to post as-is. What you’ve got here
is still incredibly rough, and I haven’t done nearly enough testing.

To be honest, I was quite excited by the initial results, which is why I
decided to write this post. There’s an incredibly good chance that I’ve
missed something in the design and implementation, so please be aware of
that. I’ll be continuing to refine this approach in my main series of
articles with a much cleaner implementation.

In the meantime though, I hope some of this helps anyone out there struggling
with detached entities. There are precious few articles and
samples that are up to date, and very few that seem to work. This is
provided without any warranty of any kind!

If you find any issues please e-mail me [email protected] and
I’ll attempt to refactor/debug and find ways around some of the inherent
limitations. In the meantime, there are a few helpful links I’ve come
across in my travels on the WWW. See below.

Example Solution Files [ Files ]

Note: you’ll need to add the Entity Framework v6 RC package via NuGet; I
haven’t included it in the archive.