In the previous part we talked a bit about tools and surface mechanisms; this time we’ll talk about the serialization of diagrams, which is a huge topic on its own and a complicated one at times too. You probably know that serialization is related to copy/paste, saving things to files, the property grid, drag-drop… In fact serialization is pretty much everywhere: WCF services, Windows workflows, P2P communication and more. The .Net framework gives you a wealth of mechanisms to help you serialize things with various levels of sophistication: attributes (Serializable, DataContract…), interfaces (ISerializable, IXmlSerializable…) and formatters (binary, soap, xml…) together with whole namespaces full of utility classes. In the next few paragraphs I’ll first review some basics of serialization, explain the internal structure of our (MVC) Model, highlight the problems you encounter when trying to serialize a diagram and, finally, explain the (non-standard) serialization system of GraphSquare.
The easiest way to serialize a class in .Net is by decorating it with the System.Serializable attribute. It tells the CLR to pick up the public and private fields and to save them as part of the data necessary to reconstruct an instance. In case you want to drop a few fields in this process you can decorate them with the NonSerialized attribute, which excludes them from the serialized data. Note that the NonSerialized attribute cannot be applied to a property but only to a field; the reason escapes me, however.
    [Serializable]
    public class BlackBox
    {
        [NonSerialized]
        public object ThisWillNotBeSerialized;

        public string FirstName { get; set; }
        public string LastName { get; set; }
        public int Weight { get; set; }
    }
If the two attributes above do not give you enough control over the serialization of your class you can go for handcrafted serialization using the System.Runtime.Serialization.ISerializable interface. This interface has a deceptively simple signature which only requires you to hand over those objects you want to serialize. The method passes a serialization info object which acts as a string-object dictionary in which you can store whatever you wish, even things unrelated to the fields of your class.
    using System;
    using System.Runtime.Serialization;

    [Serializable]
    internal class ControlledBlackBox : ISerializable
    {
        public object ThisWillNotBeSerialized;

        public string FirstName { get; set; }
        public string LastName { get; set; }
        public int Weight { get; set; }

        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("FirstName", this.FirstName);
            info.AddValue("LastName", this.LastName);
            info.AddValue("InFactAnyNameYouWish", this.Weight);
            //non-field related data
            info.AddValue("SomethingYouWantToSave", 3.141592);
        }

        //the deserialization constructor reads back what GetObjectData stored
        protected ControlledBlackBox(SerializationInfo info, StreamingContext context)
        {
            FirstName = info.GetString("FirstName");
            LastName = info.GetString("LastName");
            Weight = info.GetInt32("InFactAnyNameYouWish");
            double pi = info.GetDouble("SomethingYouWantToSave");
        }
    }
The interface does not tell you that, in order to function well, this requires a constructor with the same signature wherein you can pick up the stuff you saved and re-initialize the class with it. Actually, you can store whatever you wish in the serialization info; it does not need to be related to the fields of your class. Often you need some ambient information to correctly instantiate an object and this is how you can store additional data. The AddValue method has many overloads and at times it can be difficult to use the correct one (try to (de)serialize an Enum to see what I mean). It’s also through this AddValue method that you actually chain a serialization process. As soon as you use custom objects as data types in your fields you tell the CLR to serialize this data type as well. The CLR will automatically create internally a graph of those objects to be serialized, and by wanting to serialize one class you easily slip into having to serialize a whole bunch of classes in your project.
    [Serializable]
    internal class ControlledBlackBox : ISerializable
    {
        public CustomData ChainedDataMember { get; set; }

        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("ChainedDataMember", this.ChainedDataMember);
        }

        protected ControlledBlackBox(SerializationInfo info, StreamingContext context)
        {
            ChainedDataMember = info.GetValue("ChainedDataMember", typeof(CustomData)) as CustomData;
        }
    }

    [Serializable]
    internal class CustomData
    {
        public string SomeMoreHere { get; set; }
    }
During serialization the CLR will traverse the serialization graph from top to bottom and you can control everything using the GetObjectData method. During deserialization you have an additional interface at your disposal which will be called after all classes in the internal graph have been deserialized. The CLR will call the System.Runtime.Serialization.IDeserializationCallback, if implemented, in reverse order and it allows you to do some post-deserialization work, which can be necessary when objects are inter-related.
    [Serializable]
    internal class SerializationWithCallback : ISerializable, IDeserializationCallback
    {
        public SerializationWithCallback()
        {
        }

        //ISerializable also needs the deserialization constructor
        protected SerializationWithCallback(SerializationInfo info, StreamingContext context)
        {
        }

        public void OnDeserialization(object sender)
        {
            //some post-deserialization code here
        }

        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
        }
    }
On top of all this you can complement your (de)serialization process with pre & post (de)serialization methods which will be called directly before, respectively after, the (de)serialization is applied to your class. There is no conflict with the callback described above since the callback is applied after all classes have been deserialized while the OnDeserialized method is called directly after the class was deserialized. Yet, I admit, it can be quite complex sometimes to trace down some bugs and to understand what the CLR is doing when it starts to switch between classes, recursively calling all kinds of methods.
    [Serializable]
    internal class SerializationWithCallbackAndPrePost : ISerializable, IDeserializationCallback
    {
        public SerializationWithCallbackAndPrePost()
        {
        }

        //ISerializable also needs the deserialization constructor
        protected SerializationWithCallbackAndPrePost(SerializationInfo info, StreamingContext context)
        {
        }

        public void OnDeserialization(object sender)
        {
            //some post-deserialization code here
        }

        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
        }

        //each of the four methods below must take a single StreamingContext
        //parameter, otherwise the formatter throws at runtime
        [OnDeserialized]
        public void OnDeserialized(StreamingContext context)
        {
            //called just after the class was deserialized
        }

        [OnDeserializing]
        public void OnDeserializing(StreamingContext context)
        {
            //called just before the CLR enters the deserialization of this class
        }

        [OnSerialized]
        public void OnSerialized(StreamingContext context)
        {
            //called just after the CLR serialized this class
        }

        [OnSerializing]
        public void OnSerializing(StreamingContext context)
        {
            //called just before the CLR enters the serialization of this class
        }
    }
Be aware that all this goodness gives you a lot of power but, as with every big power, you need to understand in detail what is going on in order to use it effectively or you easily end up with a spaghetti of ‘goto’ effects in your code.
Besides the binary serialization we have described above you also have a whole set of possibilities to (de)serialize things to XML (and SOAP in particular). From my point of view this whole machinery has become somewhat obsolete since the introduction of LINQ to XML, where I think things are much easier to manipulate, but we’ll quickly review things here.
Here again you can either rely on a black-box XML serialization by simply passing your type to the XmlSerializer constructor or you can fine-tune the process by means of attributes.
    //the XmlSerializer only handles public types
    [XmlRoot("RootName")]
    public class SimpleXmlSerialization
    {
        private NameValueCollection _attributes;

        [XmlElement("TheFirstName")]
        public string FirstName { get; set; }

        [XmlIgnore]
        public NameValueCollection Attributes
        {
            get { return _attributes; }
            set { _attributes = value; }
        }
    }

    internal class YourMainCode
    {
        public void SerializeToXml()
        {
            SimpleXmlSerialization obj = new SimpleXmlSerialization { FirstName = "Just me" };
            XmlSerializer serializer = new XmlSerializer(obj.GetType());
            //the using block closes the stream even if Serialize throws
            using (Stream stream = new FileStream("c:\\MyFile.xml", FileMode.Create, FileAccess.Write, FileShare.None))
            {
                serializer.Serialize(stream, obj);
            }
        }
    }
To gain more control over this process you can implement the IXmlSerializable interface, which allows you to control completely what will be written to disk:
    [Serializable]
    public class XMLControlledBox : IXmlSerializable
    {
        public System.Xml.Schema.XmlSchema GetSchema()
        {
            //the documented convention is to return null here
            return null;
        }

        public void ReadXml(System.Xml.XmlReader reader)
        {
            //read your members back from the reader here
        }

        public void WriteXml(System.Xml.XmlWriter writer)
        {
            //write your members to the writer here
        }
    }
It’s precisely in this context (in WriteXml and ReadXml) that I believe you are nowadays better off relying on LINQ to XML than fiddling with the System.Xml.XmlWriter class.
You will also find in the System.Runtime.Serialization.Formatters.Soap namespace (you’ll need to reference the associated equally-named assembly in order to access it) the SOAP formatter, which allows you to (de)serialize things using a SOAP envelope and can be used in peculiar (webservice related) situations.
In the context of WPF and somewhat related to XML you can also find in the System.Windows.Markup namespace the XamlWriter class which exports any WPF construct to XAML markup. We will come back to this below.
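To make the LINQ to XML remark a bit more concrete, here is a minimal sketch of an IXmlSerializable class that builds and parses its content with XElement instead of driving the writer and reader by hand. The class name and its two properties are made up for the illustration and are not part of GraphSquare.

```csharp
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type for illustration; it has to be public for the XmlSerializer.
public class LinqControlledBox : IXmlSerializable
{
    public string FirstName { get; set; }
    public int Weight { get; set; }

    // The documented convention is to return null here.
    public XmlSchema GetSchema()
    {
        return null;
    }

    public void WriteXml(XmlWriter writer)
    {
        // The serializer has already written the wrapping element,
        // so only the content is emitted.
        new XElement("FirstName", FirstName).WriteTo(writer);
        new XElement("Weight", Weight).WriteTo(writer);
    }

    public void ReadXml(XmlReader reader)
    {
        // The reader sits on the wrapping start tag. ReadFrom consumes the
        // whole element, as the ReadXml contract requires.
        XElement root = (XElement)XNode.ReadFrom(reader);
        FirstName = (string)root.Element("FirstName");
        Weight = (int?)root.Element("Weight") ?? 0;
    }
}
```

The class can be handed to an XmlSerializer like any other type; the point of the sketch is only that the body of both methods becomes a matter of composing and querying XElement instances.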
In general, serializing simple things is simple and serializing complicated things is, well, complicated. In most cases you can get away with the black-box serialization by simply decorating a class with the [Serializable] attribute, which is based on the fields of the class. This works well for standard data types and not too complicated hierarchies of classes. Indeed, as soon as you decorate a class with the serialization attribute the CLR will create internally a graph of objects which has to be serialized. If you have a complex inheritance system and many custom data types you quickly get a huge set of classes you are forced to decorate too and eventually end up with serialization loops: two classes referencing each other by means of properties create loops in the serialization graph and, with the XmlSerializer for instance, you will get exceptions at runtime. Not at compile time but at runtime. Even with all the pre/post attributes and callbacks at your disposal things easily get horrendously difficult to trace when your Visual Studio debugger starts to jump between (de)serialization methods, pre-post methods and callbacks.
The custom serialization gives you a grip on the serialization process on the class level but not on the hierarchy level. By this I mean that you can control how and what will be (de)serialized on a class level but you have no control over the order in which the CLR accesses the various classes in your object hierarchy. In order to explain this in more detail I’ll first have to explain the way that one usually starts a serialization process.
If you want to serialize some stuff to binary format you’ll scribble something down like the following:
    internal override void Serialize(DocumentWrapper wrapper)
    {
        BinaryFormatter binFormat = new BinaryFormatter();
        using (Stream fStream = new FileStream(FileName, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            binFormat.Serialize(fStream, wrapper);
        }
    }
where the ‘wrapper’ is the stuff you want to save to disk. The Serialize method of the binary formatter will first create an object graph on the basis of the wrapper object, either by looking at the fields (in the case of black-box serialization) or by recursively calling the ISerializable method in the order that you added objects to the serialization info dictionary. In general this works well but not for the serialization of diagrams. The problem is that you cannot serialize bindings between objects, and this is precisely the case with diagram connections between diagram nodes. The issue is rather technical but it all amounts to two different tensions: on the one hand the order of things to serialize and on the other the inter-dependency of objects in a diagram. The situation is really rather unusual in the sense that most line-of-business applications have a tree-like data dependency while diagram data is inherently graph-like (non-linear, that is). The situation is aggravated by the inverse problem when you deserialize a diagram: even with all the techniques described above you will discover that deserialization of diagrams is a complicated affair.
If you present something like a ‘shape’ in a diagram you inevitably have somewhere a class which inherits from a WPF base class like the ContentControl. This makes your Model WPF dependent but worse, it brings along a whole bunch of stuff you have no control over. Imagine you wish to re-use the Model underneath a WCF service and create diagrams outside the WPF surface, you will be forced to reference WPF assemblies where they don’t make much sense. In addition, you will run into troubles when having to rely on style and template resources (XAML) which will not be actually applied unless you internally create a surface (something like a Canvas).
Consider also the need for importing diagrams from standard XML, say you want to display any type of XML in a tree diagram. Do you actually want to expose the internals of your Model to the outside world? One of the ideas I had while creating the whole architecture was to be able to have a provider pattern underneath the diagram repository. Because one cannot envisage all possible data scenarios it’s a good idea to create an open end where others can plug into. For example, how to enable one to create a data connection to a custom database? That is, storing diagrammatic information in some database and, conversely, importing data from external systems. What about creating mashups with webservices like Facebook or MySpace?
Last but not least, one of the most powerful aspects of WPF is data binding. However, data binding cannot be (de)serialized in a snap, which both handicaps the standard mechanisms and also requires extra efforts from your side.
Deserialization on the basis of the standard .Net mechanisms duplicates the code which relates to your model. On the one hand you have inside the command units (the classes which are part of the undo-redo system) some code which accesses methods of your controller or model, things like AddShape, AddPage and so on. On the other hand, the deserialization is constructing shapes and pages without making use of these methods. Indeed, it would make little sense to call an AddShape method inside the deserialization constructor since this would create a recursive call stack. So, if you change something in your AddShape method you have to make sure that this addition is also reflected in your deserialization and vice versa. Note also that calling the undo-redo commands in your deserialization is not an option. Opening and saving stuff is never an undo-able action.
Wouldn’t it be better to have centralized diagram methods which are used whether the origin is a deserialization, an import, a webservice call or a user interaction on the surface?
After many experiments, trials and frustration it became obvious (to me) that instead of relying on the internal process of the CLR it was necessary to create a custom serialization which would
In the next few paragraphs I’ll highlight how the (de)serialization was constructed in GraphSquare but I’d like to emphasize again that you should carefully consider your options in your own project because in general you don’t need the high-tech stuff we have implemented. I think as a data structure a diagram is rather peculiar and, in fact, the most generic type of data hierarchy you can imagine: blobs of data bound in an arbitrary way to each other. Usually you’ll find in your own to-be-serialized data structures some kind of symmetry or pattern which allows you to simplify the serialization considerably.
Before continuing, let me mention or recall the following:
Before explaining the actual serialization code let me highlight the internal structure of the model. The MVC-model is actually called the ‘Document’ and can be considered as the database of the diagram. The Document contains all the info and its hierarchy reflects the way that the diagram control presents itself to the user. For example, the layers are sub-ordinate to the page and shapes are part of a layer.

The Document is the root of the hierarchy and is in fact the same as the Model in the MVC pattern. It’s the data repository of the whole system and contains all the members which alter the diagram and the control. The surface members are aliases to the Controller, which delegates things down to the Document. The alterations are propagated back to the surface through events which are picked up by the controller and subsequently by the surface.
The metadata is a small class wherein things like author and creation date are recorded.
The page collection contains all the pages of the diagram control and each page contains info like background color, layout type, page title and so on.
The layer collection contains the layers per page. Some page types do not accept multiple layers, like the mindmapping type where a layer is conceptually similar to collapsing branches.
A layer contains shapes and connections as separate collections even though each connector in a shape has the information about the connections. One could put the connection collection hierarchically underneath the connector but in practice it makes life easier to have it like this for the sake of (de)serialization.
An important note here is that the binding between shape and connector is loose in the sense that the connector collection is not part of the shape definition but defined in the template applied to the shape. This makes things more complicated for the (de)serialization but allows a certain flexibility in defining and customizing shapes.
The data abstraction layer consists of surrogate classes which can be compared with data entities in a multi-tiered database application. They encapsulate the essence of a diagram entity (connector, shape, page…) without any reference to WPF or anything. Obviously you need to store in them a reference to the name of a template or style if you wish to show them on a UI/WPF surface but this does not enforce anything. In fact, using our abstraction layer one can create diagrams in totally non-WPF environments like underneath a WCF service, using database inserts or through standard WinForms (propertygrid). Unity is also playing a big role in this mechanism, because at some point you need to convert a surrogate entity to an actual Document entity. The decoupling allows one to perform this conversion inside a Unity container and in this way to achieve a certain independence; that is, the container injects some objects which are allowed to be assigned on the basis of a common interface.
For each concrete diagram class one has a surrogate class which we prefixed with ‘Pre’ (what’s in a name obviously?) and they are bound to each other by means of reciprocal methods called GetShape and Wrap. When a data exchange process is started the surrogate DocumentWrapper will be handed over and it allows a data adapter to serialize it in some way. Conversely, when a data adapter loads serialized data it hands over a DocumentWrapper to the core, i.e. the Document. In this sense, the data abstraction layer consists of a single class, the DocumentWrapper, through which all data exchange occurs. From the point of view of the Document the origin of the DocumentWrapper is unknown. From the point of view of the external provider the DocumentWrapper is unrelated to WPF and the Document. Just as data entities in an N-tier database application form the intermediate step to business entities, so are the surrogate entities intermediate to the actual shapes in the Document (Model).
When the controller starts an exchange process it needs to provide a data adapter and all the rest occurs inside the adapter. The adapter accesses the DocumentWrapper and turns whatever it wishes into whatever format. For example, the XML adapter uses LINQ to XML to turn all the surrogate classes into XML elements. Conversely, we have a generic XML adapter which takes any incoming XML and turns it into a default DocumentWrapper. The adapter (or provider model if you prefer) is generic and can be implemented externally. In order to add it to the control we make use of the Unity application block.
Coming back to the problems mentioned above, in the process of building the DocumentWrapper we implement our own to-be-serialized graph of surrogate classes. While this might sound complicated it really amounts just to some for-each loops on the pages, layers and other collections. Conversely, in order to convert a DocumentWrapper to an actual Document we control every step, making use of the AddShape, AddPage… methods of the Document to build up the diagram. This then is probably the crucial point: we don’t use a serialization constructor but rather the standard Document methods to deserialize a diagram. In this sense, the whole logic remains centralized in these methods: data binding, event binding and so on. In order not to cause an avalanche of events if the diagram control is embedded we have a global event-blocking property, but this can be switched off if necessary.
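As an illustration of what such an XML adapter could look like, here is a rough sketch using LINQ to XML. Only the ‘Pre’ prefix and the DocumentWrapper name come from the text above; the members of the surrogate classes, the element names and the XmlAdapterSketch class are assumptions made for the example, and layers are left out to keep it short.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, heavily simplified surrogates: plain data, no WPF reference.
public class PreShape { public string Id = ""; public string Template = ""; public double X; public double Y; }
public class PreConnection { public string FromShapeId = ""; public string ToShapeId = ""; }
public class PrePage
{
    public string Title = "";
    public List<PreShape> Shapes = new List<PreShape>();
    public List<PreConnection> Connections = new List<PreConnection>();
}
public class DocumentWrapper { public List<PrePage> Pages = new List<PrePage>(); }

public static class XmlAdapterSketch
{
    // Surrogates to XML: nothing more than nested projections over the collections.
    public static XElement ToXml(DocumentWrapper wrapper)
    {
        return new XElement("Document",
            wrapper.Pages.Select(p => new XElement("Page",
                new XAttribute("Title", p.Title),
                p.Shapes.Select(s => new XElement("Shape",
                    new XAttribute("Id", s.Id),
                    new XAttribute("Template", s.Template),
                    new XAttribute("X", s.X),
                    new XAttribute("Y", s.Y))),
                // Connections refer to shapes by id, never by object reference.
                p.Connections.Select(c => new XElement("Connection",
                    new XAttribute("From", c.FromShapeId),
                    new XAttribute("To", c.ToShapeId))))));
    }

    // XML back to surrogates; the Document never sees the XML itself.
    public static DocumentWrapper FromXml(XElement root)
    {
        var wrapper = new DocumentWrapper();
        foreach (XElement p in root.Elements("Page"))
        {
            var page = new PrePage { Title = (string)p.Attribute("Title") };
            page.Shapes.AddRange(p.Elements("Shape").Select(s => new PreShape
            {
                Id = (string)s.Attribute("Id"),
                Template = (string)s.Attribute("Template"),
                X = (double)s.Attribute("X"),
                Y = (double)s.Attribute("Y")
            }));
            page.Connections.AddRange(p.Elements("Connection").Select(c => new PreConnection
            {
                FromShapeId = (string)c.Attribute("From"),
                ToShapeId = (string)c.Attribute("To")
            }));
            wrapper.Pages.Add(page);
        }
        return wrapper;
    }
}
```

Because connections are stored as a pair of shape ids, the XML stays tree-like even when the diagram itself contains loops.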
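The idea can be sketched in a few lines. The code below is not the GraphSquare implementation: the members of the surrogates, the Shape and Connection stand-ins, the AddConnection and Load methods and the BlockEvents property name are all assumptions for the sake of the example. What it tries to show is the principle of rebuilding the diagram through the same AddShape-like methods the rest of the application uses, in two passes, with events blocked while loading.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical surrogates (plain data, no WPF reference).
public class PreShape { public string Id = ""; }
public class PreConnection { public string FromShapeId = ""; public string ToShapeId = ""; }
public class DocumentWrapper
{
    public List<PreShape> Shapes = new List<PreShape>();
    public List<PreConnection> Connections = new List<PreConnection>();
}

// Hypothetical stand-ins for the real Document entities.
public class Shape { public string Id = ""; }
public class Connection { public Shape From; public Shape To; }

public class Document
{
    public readonly List<Shape> Shapes = new List<Shape>();
    public readonly List<Connection> Connections = new List<Connection>();

    // Global switch that keeps bulk loading from flooding the surface with events.
    public bool BlockEvents { get; set; }
    public event Action<Shape> ShapeAdded;

    // The same methods serve user interaction, import and deserialization.
    public Shape AddShape(string id)
    {
        var shape = new Shape { Id = id };
        Shapes.Add(shape);
        if (!BlockEvents && ShapeAdded != null) ShapeAdded(shape);
        return shape;
    }

    public Connection AddConnection(Shape from, Shape to)
    {
        var connection = new Connection { From = from, To = to };
        Connections.Add(connection);
        return connection;
    }

    public void Load(DocumentWrapper wrapper)
    {
        bool previous = BlockEvents;
        BlockEvents = true;
        try
        {
            // Pass 1: create every shape and remember it by id.
            var byId = new Dictionary<string, Shape>();
            foreach (PreShape pre in wrapper.Shapes)
                byId[pre.Id] = AddShape(pre.Id);

            // Pass 2: all endpoints exist now, so order and loops no longer matter.
            foreach (PreConnection pre in wrapper.Connections)
                AddConnection(byId[pre.FromShapeId], byId[pre.ToShapeId]);
        }
        finally
        {
            BlockEvents = previous;
        }
    }
}
```

Splitting the work in two passes is what removes the ordering problem: a connection is only created once both of its endpoints exist, whatever the shape of the graph.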
Obviously, the description here is rather schematic and a whole lot of code (over 2000 lines in fact) is required to make all this really functional. The details will not be published or released but I hope that this article clears the sky a bit for those who are looking for a robust serialization architecture.
In the next part we’ll have a look at something more fun and simple: WPF styling, triggers, animation and such.
This is awesome stuff. Thanks for sharing all your insights. I was a fan of Netron way back when. Now I’m moving into the WPF realm and find it a bit confusing at times. I was wondering if the source to G2 is available somewhere? I was reading on you “how” page that the source to most things are on an SVN repository but didn’t specify where, and the G2Demo is a binary. Anyway, thanks again for sharing a wealth of knowledge.
By Shawn B. April 16, 2009 - 11:46 pm