History tracking

Chances are that if you design a system, you will also have the requirement to track data changes, also known as history tracking or audit trailing. Capturing data changes can be done in a number of ways, each with its own advantages and disadvantages. This article is a summary of my research on this topic.

For the purpose of this article, I will be using a simplified BlogEntry table, with the following columns:

[BlogEntry table layout]
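To keep things concrete, here is a minimal sketch of such a table written as a C# entity; the exact columns are an assumption, roughly in line with the BlogEntry entity used elsewhere on this blog:

// Sketch of the simplified BlogEntry table; the exact columns are an assumption.
public class BlogEntry
{
    public int Id { get; set; }       // primary key
    public string Title { get; set; }
    public string Body { get; set; }
}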

Tracking only the last change

Sometimes it’s enough to track only the creation and last modification of data, instead of the full history. In that case, you can simply add the columns LastModifiedBy, LastModificationDate, CreatedBy and CreationDate to all of your tables and make sure they are filled in correctly when a row is added or modified. For deletions, in most cases you don’t want to actually delete the row from the database, but change its state to inactive (a soft delete).

Such a table could look like this:

[BlogEntry table with tracking columns]
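A minimal sketch, using the columns named above plus an assumed IsActive flag for soft deletes:

using System;

// BlogEntry with last-change tracking columns; IsActive is an assumed name for the soft-delete flag.
public class BlogEntry
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }

    public string CreatedBy { get; set; }
    public DateTime CreationDate { get; set; }
    public string LastModifiedBy { get; set; }
    public DateTime LastModificationDate { get; set; }
    public bool IsActive { get; set; }   // soft delete: false means the row is considered deleted
}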

Full tracking using history tables

Of course, if you want to track the actual data changes and keep a full history of revisions, more needs to be done. In this case we want to track:

  • The information that was changed.
  • The date on which the information was changed.
  • The user that changed the information.

We have two approaches here:

  • Shadow history approach: the history tracking information is stored in a referenced table, which can be:
    • A separate history table for each table being audited.
    • A consolidated history table for all tables being audited.
  • Live history approach: the history information is stored in the same data table as the business data.

Shadow history

A separate history table for each table being audited

A history table can be a clone of the original table, with additional columns to track which user made the change and when:

[BlogEntry history table]
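A minimal sketch of such a history table; the ChangeType, ChangedBy and ChangeDate column names are assumptions:

using System;

// Clone of the BlogEntry table plus columns describing the change; no constraints, no foreign keys.
public class BlogEntryHistory
{
    public int Id { get; set; }             // key of the history row itself
    public int BlogEntryId { get; set; }    // key of the original row
    public string Title { get; set; }
    public string Body { get; set; }

    public string ChangeType { get; set; }  // "Insert", "Update" or "Delete"
    public string ChangedBy { get; set; }
    public DateTime ChangeDate { get; set; }
}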

With every change that happens, a new entry describing the change (insert, update or delete) is added to the history table, and the actual row in the original table is modified. The history table has no constraints.

Advantages:

  • Good performance: both live and historical data can be queried quickly.
  • Cleaning up historical data is easy – just delete rows older than a specific date.

Disadvantages:

  • You end up with a lot of duplicate data without registering what actually changed, so you have to compare rows to know exactly which fields changed.
  • It can potentially double the number of tables (bad for maintenance).
  • The structure of the history table is tied to the original table, so every change to the original table structure requires a corresponding change to the history table structure.
  • Entities and their historical counterparts are different types, so different logic has to be implemented.

This is a good solution if you want quick access to the live data, but not if working with historical data is an important part of your application.

A consolidated history table for all tables being audited

You can add one generic history table using the Amorphous Table technique as follows:

[Consolidated history table]
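A minimal sketch of such a generic history table; the column names are assumptions, and the old and new values are stored as strings:

using System;

// One consolidated history table for all audited tables; values are stored generically as strings.
public class HistoryEntry
{
    public int Id { get; set; }
    public string TableName { get; set; }    // e.g. "BlogEntry"
    public string RecordId { get; set; }     // key of the changed row, stored generically
    public string ColumnName { get; set; }   // the field that changed
    public string OldValue { get; set; }
    public string NewValue { get; set; }
    public string ChangedBy { get; set; }
    public DateTime ChangeDate { get; set; }
}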

With every change that happens, a new entry that describes the change is added to the generic history table, and the actual row in the original table is modified. The history table has no constraints.

Advantages:

  • You only have one extra table that holds history information of all of your tables, so less work.
  • Also, it separates concerns better: it captures what changed, nothing more and nothing less.
  • It takes less space, because it stores only the changed values instead of full copies of the rows.
  • Cleaning up historical data is easy – just delete rows older than a specific date.

Disadvantages:

  • It’s harder to reconstruct a record as it was on a given date (although once you have that logic, the same approach can be reused for all tables).
  • The values stored must be of a generic type, so there is casting involved.
  • Entities and their historical counterparts are different types, so different logic has to be implemented.

This is a good solution if you want quick and easy access to the live data, but not if working with historical data is an important part of your application.

Live history

In this approach the history information is stored in the same table as the live data. If data is modified or deleted, the original record is not updated; instead a new record is created that holds the latest information. The new (most recent) record becomes the live (active) record, while all previous records form the history.

[BlogEntry table with live history rows]
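A minimal sketch; the IsActive, ModifiedBy and ModifiedDate column names are assumptions:

using System;

// Live-history BlogEntry table: every modification adds a new row and only the latest row is active.
public class BlogEntry
{
    public int Id { get; set; }             // key of this particular revision
    public int BlogEntryId { get; set; }    // business key shared by all revisions of the same entry
    public string Title { get; set; }
    public string Body { get; set; }

    public bool IsActive { get; set; }      // true only for the current (live) revision
    public string ModifiedBy { get; set; }
    public DateTime ModifiedDate { get; set; }
}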

Advantages:

  • No new database tables have to be created.
  • Entities and their historical counterparts are the same, so the same logic can be used.

Disadvantages:

  • You mix historical data with live data, which can make queries more complex.
  • Tables may become very big, so without the right optimization strategy (indexes) performance can be poor.
  • Cleanup of historical data might become complex (especially if they have references to live data).

This is a good candidate if you are dealing with low data volumes and historical data plays an important role in the application and is heavily used.

Some principles

Where to put tracking logic

The places we could implement the tracking logic are:

  • In the database itself, using triggers.
  • In the application logic, using code.

Doing this in application logic has the advantage that you have the full business context, and you can also audit non-database operations, which may be important because not all data might be database-centric. But you have to be sure that the database is never altered directly: if the application layer is bypassed, there is no auditing at all!

Doing this in a trigger requires you to also make the needed business context available in the database, but the advantage is that auditing always works, even if data is changed directly in the database without going through the application layer: from a different application, from a SQL command line, from a script, or directly from within SQL Server Management Studio.

So what to choose? As always, it depends… In my opinion, if you are concerned about people bypassing the audit trail by updating the data in the database directly, you probably have other (operational) problems that have to be solved first. Auditing is a business requirement, so it belongs in the business part of the application, not in the database. If you have a business layer, it should not be bypassed; otherwise there is no point in having that layer at all.
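As an illustration of the application-logic approach, here is a minimal sketch of a request handler that records a change before committing it. IHistoryWriter, IUserContext, Get and ChangeTitle are made-up names used only for this sketch; IBlogEntryRepository and IUnitOfWork are the abstractions used elsewhere on this blog:

// Hypothetical sketch of auditing in the application layer: the handler records the change itself,
// so the full business context (such as the current user) is available.
public interface IHistoryWriter
{
    void RecordChange(string tableName, int recordId, string columnName, string oldValue, string newValue, string changedBy);
}

public class UpdateBlogEntryTitleHandler
{
    private readonly IBlogEntryRepository _blogEntryRepository;
    private readonly IHistoryWriter _historyWriter;
    private readonly IUserContext _userContext;
    private readonly IUnitOfWork _unitOfWork;

    public UpdateBlogEntryTitleHandler(IBlogEntryRepository blogEntryRepository, IHistoryWriter historyWriter,
        IUserContext userContext, IUnitOfWork unitOfWork)
    {
        _blogEntryRepository = blogEntryRepository;
        _historyWriter = historyWriter;
        _userContext = userContext;
        _unitOfWork = unitOfWork;
    }

    public void Execute(int blogEntryId, string newTitle)
    {
        var blogEntry = _blogEntryRepository.Get(blogEntryId);

        // Record the change before it is applied, within the same unit of work,
        // so the history row and the update are committed in the same transaction.
        _historyWriter.RecordChange("BlogEntry", blogEntryId, "Title", blogEntry.Title, newTitle, _userContext.UserName);

        blogEntry.ChangeTitle(newTitle);
        _unitOfWork.Commit();
    }
}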

Rolling back transactions

What about rolling back transactions? In the case of auditing in application logic, if the transaction is rolled back then every historical change is rolled back too. In the case of triggers, every trigger that got executed as part of the transaction is also rolled back. So both mechanisms work correctly when it comes to rolling back transactions.

Rules

Some rules that you may want to apply:

  • Tracking information cannot be modified. From the moment historical data is in the database, it cannot be altered by any component.
  • Aim for maximum performance: use indexes to improve the speed of data retrieval operations.
  • Only use historical data on tables that actually require tracking. Doing this on all tables may be overkill.
  • By default, prefer soft deletes; otherwise you lose historical data.

How to choose?

Before making a decision on what strategy to use in your case, you must analyze your requirements and constraints:

  • Will you be using historical data on a regular basis or will it be used rarely?
  • Is historical data used for reporting purposes, or will it be used to revert data to a previous state?
  • How much data is involved?
  • Should historical data be cleaned regularly?

It’s an exercise that should be done carefully, as it is a fundamental design decision that cannot easily be changed later on.

Patterns to be avoided

This post is meant as a follow-up to my previous series about architectural design.

All design patterns can be good or bad, depending on where and why they are used. Sometimes I see code using patterns which I would never have used myself because they just make everything harder. In that case, I consider them to be anti-patterns, because they are used to solve something that is designed in the wrong way.

This is only my opinion and you might think differently; it is based on my experience with business applications.

Service locator

This one is described as an anti-pattern all over the internet. Although I think that service locator is not an anti-pattern in itself, using it in the wrong way for the wrong reasons makes it a very bad practice.

If you use dependency injection (and you probably should), constructor injection is the preferred way to inject dependencies into your objects. The reason is simple: a service locator hides your dependencies, which means you don’t know them when a class gets instantiated, and errors might only show up at runtime. Maintenance also becomes more difficult, because it’s hard to see what a class depends on.

In exceptional cases, however, a service locator might be the only way to get your dependencies resolved. For example, if you can’t use constructor injection (in an attribute, for instance), or when dealing with a legacy component that was not designed for inversion of control, a service locator can come to the rescue.

My advice is to always prefer constructor injection; and if you really, really, really have no other choice, use a service locator.
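A minimal sketch of the difference (ServiceLocator.Resolve is a placeholder for whatever locator you would use):

// Service locator: the dependency is hidden inside the method and only fails at runtime.
public class BlogService
{
    public void Publish(int blogEntryId)
    {
        var repository = ServiceLocator.Resolve<IBlogEntryRepository>();   // hidden dependency
        // ...
    }
}

// Constructor injection: the dependency is visible and resolved when the object graph is built.
public class BlogServiceWithInjection
{
    private readonly IBlogEntryRepository _blogEntryRepository;

    public BlogServiceWithInjection(IBlogEntryRepository blogEntryRepository)
    {
        _blogEntryRepository = blogEntryRepository;
    }

    public void Publish(int blogEntryId)
    {
        // ... uses _blogEntryRepository
    }
}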

Static facades

As defined, “a facade is an object that provides a simplified interface to a larger body of code”. However, static facades are evil: they are a sign of procedural code and often go hand in hand with too many global variables.

In fact, I don’t see any reason to make a facade static, because it just kills the flexibility of your system. Static facades bypass the dependency injection container, they are very limited in use, and they make testing difficult (it’s hard to mock them out). They don’t fit in a system designed for scalability. The only exceptions I can think of are some cross-cutting concerns that are used broadly across the system, like the built-in System.Math class in the .NET Framework with its mathematical utility methods, or when you really need singleton behavior (on rare occasions).

My advice: if you have static facades (or static methods in general), you might want to rethink the design.
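A minimal sketch of the alternative: keep the facade as an instance behind an interface, so it can be injected and mocked (the names here are just placeholders):

// Instead of a static facade such as:
//   public static class NotificationFacade { public static void NotifySubscribers(int blogEntryId) { ... } }
// expose the simplified interface as an injectable instance:
public interface INotificationFacade
{
    void NotifySubscribers(int blogEntryId);
}

public class NotificationFacade : INotificationFacade
{
    public void NotifySubscribers(int blogEntryId)
    {
        // coordinates the underlying subsystems (mail, persistence, ...)
    }
}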

Layers with identical entities and auto mapping

In a layered design you have a number of different entity types:

  • Data entities: the entities that represent the database objects, defined in the data access components of the infrastructure layer.
  • Domain entities: entities that contain domain data and logic in the domain layer.
  • Data Transfer Objects: the entities that are exposed by services and used by the clients.

Sometimes I see three-layered systems where data entities, domain entities and data transfer objects map one to one; in other words, they are identical. I think this is a bad practice. In effect you are exposing your data model to your clients, and because these entities are all the same, why do you need all the layers in the first place? Often the mapping is auto-generated by T4 templates, or an object-mapping framework is used to map them automatically; to me this is a sign that something is terribly wrong.

So what would be a better approach?

  • The data entities map one to one to the database tables.
  • The data entities map to the domain entities, but domain entities might be designed differently, so that they really represent your domain (and not the way they are stored in the database).
  • The domain entities map to the data transfer objects, but data transfer objects might look completely different as they are designed in the way that only data is exposed that is really needed by the clients. Your clients don’t need to know about your domain model anyway.

My advice is to design your domain model independent of your database tables, and create data transfer objects that are tailored towards your clients.
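A minimal sketch of the idea, using the blog example; the exact shapes are assumptions:

using System;

// Domain entity: models the domain, including behaviour (see the next section).
public class BlogEntry
{
    public int Id { get; private set; }
    public string Title { get; private set; }
    public string Body { get; private set; }
    public DateTime LastModifiedDate { get; private set; }
    // domain logic lives here
}

// Data transfer object: exposes only what this particular client really needs.
public class BlogEntrySummaryDto
{
    public int Id { get; set; }
    public string Title { get; set; }
    // no Body, no LastModifiedDate: this client only shows a list of titles
}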

Note: if you use Entity Framework Code First, separate data entities are no longer needed, because you map database tables directly to your domain entities.

Anemic domain model

This is a well-known anti-pattern. Simply put: if your domain entities only contain data and no domain logic, there is a problem of responsibilities; the domain logic probably lives somewhere it does not belong. I’m not going to explain this in detail because there is more than enough literature about it on the internet, but unfortunately it is still common practice, even in cases where a fully fledged domain model should be in place.

My advice: if you call your entities ‘domain entities’, you probably need a domain; so put domain logic where it belongs: in your domain entities.
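A minimal sketch of the difference, again using the blog example; the AddComment behaviour is an assumption:

using System;
using System.Collections.Generic;

// Anemic: only data; the rules live in some service elsewhere.
public class BlogEntry
{
    public string Title { get; set; }
    public List<Comment> Comments { get; set; }
}

// Rich: the entity owns its own rules.
public class RichBlogEntry
{
    private readonly List<Comment> _comments = new List<Comment>();

    public string Title { get; private set; }
    public IReadOnlyCollection<Comment> Comments { get { return _comments; } }

    public void AddComment(Comment comment)
    {
        if (comment == null) throw new ArgumentNullException("comment");
        _comments.Add(comment);   // invariants are enforced here, inside the domain
    }
}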

Lazy loading

“Lazy loading is a programming practice in which you only load or initialize an object when you first need it.” When used through an object-relational mapper, I think it should be avoided, because a service should expose clearly defined business operations that make clear in advance what they will return (instead of CRUDy operations, unless you really have no business logic). If you do that, there are no surprises, and there are fewer potential performance problems and side effects, because it’s the service that takes care of the querying. It’s more like controlled eager loading. Of course, you have to design your services in such a smart way that you don’t return too much information, which is exactly why you want specialized data transfer objects and clear business operations with limited scope.

My advice: whatever ORM you are using, turn off lazy loading by default.
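With Entity Framework, for example, you can switch lazy loading off in the context and load related data explicitly; a minimal sketch, assuming Entity Framework 6 and a hypothetical BloggingContext:

using System.Data.Entity;

// Hypothetical DbContext with lazy loading switched off.
public class BloggingContext : DbContext
{
    public BloggingContext()
    {
        Configuration.LazyLoadingEnabled = false;    // no surprise queries later on
        Configuration.ProxyCreationEnabled = false;
    }

    public DbSet<BlogEntry> BlogEntries { get; set; }
}

// The service decides up front what to load (controlled eager loading), for example:
// var entry = context.BlogEntries.Include(b => b.Comments).Single(b => b.Id == id);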

Hungarian notation

I admit that many, many years ago I named my variables like tbxFirstName, lblAddress and so on. After a while I really started to hate it, and I stopped doing it; it began to look as if I was obfuscating my code! I consider this to be an anti-pattern because:

  • It complicates naming, because you have to agree on prefixes, and eventually you end up with hundreds of prefixes that are impossible to remember, sometimes more than 15 seemingly random characters. Variable names also become longer, and eventually the code becomes harder to understand.
  • Sometimes types change, and in that case your variable names have to change too.
  • The purpose is to indicate the type of a variable, but with IntelliSense you can see that instantly anyway.
  • See Wikipedia for the full list of reasons.

My advice: name your variables consistently so that they show their intent (make them descriptive), not their type.
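A tiny before/after sketch:

using System.Windows.Forms;

public class CustomerForm : Form
{
    // Hungarian notation: the prefix repeats what the type already tells us.
    private TextBox tbxFirstName;

    // Intent-revealing name instead; IntelliSense shows the type when you need it.
    private TextBox firstNameInput;
}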

Other?

If I think of other patterns that are used in a bad way, I will add them. If you have other examples of bad practices, feel free to share them!

A reference architecture (part 12)

…continued from part 11

Conclusion

In this series I explained an approach to setting up a solution. Of course, as this is only a simple case, various aspects may be different in your situation.

You will probably end up using other frameworks for dependency injection, logging and validation. You might choose to create a separate query service that uses its own data access component to query a dedicated read-only database with specialized views, or maybe use WCF Data Services for that matter. Or you might decide that commands should never return anything, simplify the request dispatcher, and expose a validation operation next to the execute operation so that consumers can validate commands before executing them. Or change the fault contract so that it contains technical as well as business and validation errors. You might extend the domain with domain events and use a service bus, or in really complex cases apply CQRS techniques and implement event sourcing.

In any case I think this is a nice basic design to start with!

Where is the source code?

You can download the source code for free here. The only thing I ask is that you let me know what you think: suggestions, questions, fixes or any other feedback is most welcome!

A reference architecture (part 11)

…continued from part 10

Logging

If we say logging, we automatically think of log4net, the de facto standard logging framework. So download it and put the log4net.dll assembly in the \Trunk\Libraries\Log4Net folder, then add a reference to it from the ArchitectureExample.Service.WebHost project.

First, add the needed configuration in the web.config file of the ArchitectureExample.Service.WebHost project:

<?xml version="1.0"?>
<configuration>

  <configSections>
    <section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net" />
  </configSections>

  <appSettings>
    <!-- If you want log4net to add its own diagnostics messages -->
    <add key="log4net.Internal.Debug" value="true" />
  </appSettings>

  <log4net debug="true">
    <appender name="RollingLogFileAppender" type="log4net.Appender.RollingFileAppender">
      <file value="C:\\log\\architectureexample.txt" />
      <appendToFile value="true" />
      <rollingStyle value="Size" />
      <maxSizeRollBackups value="10" />
      <maximumFileSize value="10MB" />
      <staticLogFileName value="true" />
      <layout type="log4net.Layout.PatternLayout">
        <conversionPattern value="%-5p %d %5rms %-22.22c{1} %-18.18M - %m%n" />
      </layout>
    </appender>
    <root>
      <level value="DEBUG" />
      <appender-ref ref="RollingLogFileAppender" />
    </root>
  </log4net>

</configuration>

Next, we will start using log4net. The default approach is to put a static logger variable at the top of each class where you want to log, but I don’t like that approach, so we’ll do it a bit differently. As explained in a previous post, we have to add a logger interface. Add the following interface to the ArchitectureExample.Infrastructure.Shared project:

public interface ILogger
{
    void Debug(string message);
    void Debug(string message, Exception ex);
    void Info(string message);
    void Info(string message, Exception ex);
    void Warn(string message);
    void Warn(string message, Exception ex);
    void Error(string message);
    void Error(string message, Exception ex);
    void Fatal(string message);
    void Fatal(string message, Exception ex);
}

Also add the concrete implementation Log4NetLogger, that uses log4net to do the actual logging:

public class Log4NetLogger : ILogger
{
    private readonly ILog _log;

    /// <summary>
    /// Constructor.
    /// </summary>
    /// <param name="type"></param>
    public Log4NetLogger(Type type)
    {
        _log = LogManager.GetLogger(type);
    }

    public void Debug(string message)
    {
        _log.Debug(message);
    }

    public void Debug(string message, Exception ex)
    {
        _log.Debug(message, ex);
    }

    public void Info(string message)
    {
        _log.Info(message);
    }

    public void Info(string message, Exception ex)
    {
        _log.Info(message, ex);
    }

    public void Warn(string message)
    {
        _log.Warn(message);
    }

    public void Warn(string message, Exception ex)
    {
        _log.Warn(message, ex);
    }

    public void Error(string message)
    {
        _log.Error(message);
    }

    public void Error(string message, Exception ex)
    {
        _log.Error(message, ex);
    }

    public void Fatal(string message)
    {
        _log.Fatal(message);
    }

    public void Fatal(string message, Exception ex)
    {
        _log.Fatal(message, ex);
    }
}

In the ArchitectureExample.Service.WebHost project, make sure log4net is correctly registered by adding this to the StructureMapRegistry:

// Logging: give every class a logger named after the type the ILogger is injected into.
For<ILogger>().AlwaysUnique().Use(s => s.ParentType == null
    ? new Log4NetLogger(s.BuildStack.Current.ConcreteType)
    : new Log4NetLogger(s.ParentType));

// Point log4net to the configuration in web.config and watch it for changes.
var applicationPath = Path.GetDirectoryName(Assembly.GetAssembly(GetType()).Location);
var configFile = new FileInfo(Path.Combine(applicationPath, "web.config"));
XmlConfigurator.ConfigureAndWatch(configFile);

Finally, to actually log, it’s just a matter of injecting the ILogger into the class where you want to use it; for example, in the AddBlogEntryRequestHandler of the ArchitectureExample.Application.RequestHandlers project:

public class AddBlogEntryRequestHandler : IRequestHandler<AddBlogEntryRequest, AddBlogEntryResponse>
{
    private readonly ILogger _logger;
    private readonly IUnitOfWork _unitOfWork;
    private readonly IBlogEntryRepository _blogEntryRepository;

    public AddBlogEntryRequestHandler(ILogger logger, IUnitOfWork unitOfWork, IBlogEntryRepository blogEntryRepository)
    {
        _logger = logger;
        _unitOfWork = unitOfWork;
        _blogEntryRepository = blogEntryRepository;
    }

    public AddBlogEntryResponse Execute(AddBlogEntryRequest request)
    {
        var blogEntry = new BlogEntry(request.Title, request.Body);
        _blogEntryRepository.Add(blogEntry);
        _unitOfWork.Commit();
        _logger.Debug("AddBlogEntryRequestHandler completed");
        return new AddBlogEntryResponse(blogEntry.Id);
    }
}

If you now run the AddBlogEntryTest again, you will see that the log file C:\log\architectureexample.txt contains:

DEBUG 2013-03-07 10:54:42,543  2921ms logEntryRequestHandler Debug              - AddBlogEntryRequestHandler completed

You can also easily add logging in the GlobalExceptionHandler like this:

public class GlobalExceptionHandler : IErrorHandler
{
    private readonly ILogger _logger;

    public GlobalExceptionHandler(ILogger logger)
    {
        _logger = logger;
    }

    public void ProvideFault(Exception error, MessageVersion version, ref Message fault)
    {
        if (error is BusinessException)
        {
            var businessError = error as BusinessException;
            var businessFault = new BusinessFault(businessError.ErrorCode, businessError.Message);
            var faultException = new FaultException<BusinessFault>(businessFault, "Oops", new FaultCode("BusinessFault"));
            var msgFault = faultException.CreateMessageFault();
            fault = Message.CreateMessage(version, msgFault, error.TargetSite.Name);
            _logger.Error("BusinessException", businessError);
        }
        else if (error is ValidationException)
        {
            var validationError = error as ValidationException;
            var businessFault = new BusinessFault("VALIDATIONERROR", validationError.Message);
            var faultException = new FaultException<BusinessFault>(businessFault, "Oops", new FaultCode("ValidationFault"));
            var msgFault = faultException.CreateMessageFault();
            fault = Message.CreateMessage(version, msgFault, error.TargetSite.Name);
            _logger.Error("ValidationException", validationError);
        }
        else
        {
            var faultException = new FaultException(string.Format("{0}", error.TargetSite.Name));
            var msgFault = faultException.CreateMessageFault();
            fault = Message.CreateMessage(version, msgFault, faultException.Action);
            _logger.Error("Unknown exception", faultException);
        }
    }

    public bool HandleError(Exception error)
    {
        return true;
    }
}

Now run the AddBlogEntryWithoutTitleTest and check whether the ValidationException appears in the log file:

DEBUG 2013-03-07 10:54:42,543  2921ms logEntryRequestHandler Debug              - AddBlogEntryRequestHandler completed
DEBUG 2013-03-07 10:59:54,336 1919ms logEntryRequestHandler Debug - AddBlogEntryRequestHandler completed
DEBUG 2013-03-07 11:00:34,408 41992ms logEntryRequestHandler Debug - AddBlogEntryRequestHandler completed
ERROR 2013-03-07 11:01:08,511 76094ms GlobalExceptionHandler Error - ValidationException
FluentValidation.ValidationException: Validation failed:
-- 'Title' must not be empty.
-- 'Title' should not be empty.
-- 'Title' must be between 1 and 200 characters. You entered 0 characters.
at ArchitectureExample.Service.Shared.HandlerDispatcher.Execute[TRequest,TResponse](TRequest request, Boolean asynchronously) in C:\Projects\R&D\ArchitectureExample\Trunk\Sources\ArchitectureExample.Service.Shared\HandlerDispatcher.cs:line 32
at ArchitectureExample.Service.Adapter.ArchitectureExampleService.AddBlogEntry(AddBlogEntryRequest request) in C:\Projects\R&D\ArchitectureExample\Trunk\Sources\ArchitectureExample.Service.Adapter\ArchitectureExampleService.cs:line 27
at SyncInvokeAddBlogEntry(Object , Object[] , Object[] )
at System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(Object instance, Object[] inputs, Object[]& outputs)
at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage31(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)


Almost time for the conclusion

A reference architecture (part 10)

…continued from part 9

Validation

In the previous part we made it possible to throw business exceptions when something goes wrong in the business logic. However, next to that we also need to do validation, which basically means checking whether properties have valid values.

In the case of validation in request handlers, we first check whether the request itself is correct before executing the actual logic. If there is something wrong with the request, like a required property that is not set, it makes no sense to continue.

In the case of a domain entity, we can put validation in the creation phase, so that an invalid domain entity can never be created. As such, the best place is its constructor.

The validation logic itself could be placed in the domain entities and request handlers themselves, but I find it cleaner to separate it out and move that responsibility to dedicated validators. The main concern of entities should be keeping track of business state. Another advantage is that you can create validators that are shared and therefore reusable inside other validators, like the EmailAddressValidator sketched a bit further below.

My validation framework of choice is FluentValidation. It offers everything we need, so download it, and put the FluentValidation.dll assembly in \Trunk\Libraries\FluentValidation.

Add domain entity validation

In the ArchitectureExample.Domain.Entities project, add the following validators:

public class BlogEntryValidator : AbstractValidator<BlogEntry>
{
    public BlogEntryValidator()
    {
        RuleFor(blogEntry => blogEntry.Title).NotNull().NotEmpty().Length(1, 200);
        RuleFor(blogEntry => blogEntry.Body).NotNull().NotEmpty().Length(1, 2000);
        RuleFor(blogEntry => blogEntry.LastModifiedDate).NotEqual(DateTime.MinValue);
    }
}

public class CommentValidator : AbstractValidator<Comment>
{
    public CommentValidator()
    {
        RuleFor(comment => comment.Name).NotNull().NotEmpty().Length(1, 50);
        RuleFor(comment => comment.EmailAddress).NotNull().NotEmpty().Length(1, 200).EmailAddress();
        RuleFor(comment => comment.CommentText).NotNull().NotEmpty().Length(1, 1000);
        RuleFor(comment => comment.DateWritten).NotEqual(DateTime.MinValue);
    }
}
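As mentioned earlier, rules that keep coming back (such as the e-mail address rules above) can be pulled into a shared, reusable validator. A minimal sketch using a FluentValidation extension method; the ValidEmailAddress name is an assumption:

// A reusable e-mail address rule, written as a FluentValidation extension method.
public static class ValidatorExtensions
{
    public static IRuleBuilderOptions<T, string> ValidEmailAddress<T>(this IRuleBuilder<T, string> ruleBuilder)
    {
        return ruleBuilder.NotNull().NotEmpty().Length(1, 200).EmailAddress();
    }
}

// Usage inside CommentValidator:
// RuleFor(comment => comment.EmailAddress).ValidEmailAddress();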

And in the entities, call validation in the constructor:

public class BlogEntry
{
    private BlogEntry() {}

    public BlogEntry(string title, string body)
    {
        LastModifiedDate = DateTime.Now;
        Title = title;
        Body = body;

        var blogEntryValidator = new BlogEntryValidator();
        blogEntryValidator.ValidateAndThrow(this);
    }
}

public class Comment
{
    private Comment() {}

    public Comment(BlogEntry blogEntry, string name, string emailAddress, string commentText)
    {
        BlogEntry = blogEntry;
        Name = name;
        EmailAddress = emailAddress;
        CommentText = commentText;
        DateWritten = DateTime.Now;

        var commentValidator = new CommentValidator();
        commentValidator.ValidateAndThrow(this);
    }
}

Now, because the ValidateAndThrow operation throws a ValidationException, we have to extend the GlobalExceptionHandler so that it can handle it too and convert it to a business fault:

public class GlobalExceptionHandler : IErrorHandler
{
    public void ProvideFault(Exception error, MessageVersion version, ref Message fault)
    {
        if (error is BusinessException)
        {
            var businessError = error as BusinessException;
            var businessFault = new BusinessFault(businessError.ErrorCode, businessError.Message);
            var faultException = new FaultException<BusinessFault>(businessFault, "Oops", new FaultCode("BusinessFault"));
            var msgFault = faultException.CreateMessageFault();
            fault = Message.CreateMessage(version, msgFault, error.TargetSite.Name);
        }
        else if (error is ValidationException)
        {
            var validationError = error as ValidationException;
            var businessFault = new BusinessFault("VALIDATIONERROR", validationError.Message);
            var faultException = new FaultException<BusinessFault>(businessFault, "Oops", new FaultCode("ValidationFault"));
            var msgFault = faultException.CreateMessageFault();
            fault = Message.CreateMessage(version, msgFault, error.TargetSite.Name);
        }
        else
        {
            var faultException = new FaultException(string.Format("{0}", error.TargetSite.Name));
            var msgFault = faultException.CreateMessageFault();
            fault = Message.CreateMessage(version, msgFault, faultException.Action);
        }
    }

    public bool HandleError(Exception error)
    {
        return true;
    }
}

Of course, this is just an example; you could create a separate ValidationFault that contains a list of errors and throw that one in the case of a validation error. But you get the idea of how to catch exceptions in the global exception handler and throw faults based on them.

To test this:

[TestMethod]
public void AddBlogEntryWithoutTitleTest()
{
    string errorCode = null, faultMessage = null;
    using (var architectureExampleServiceAgent = new ArchitectureExampleServiceAgent("BasicHttpBinding_IArchitectureExampleService"))
    {
        try
        {
            var response = architectureExampleServiceAgent.AddBlogEntry(new AddBlogEntryRequest()
            {
                Title = null,
                Body = "Blog entry body"
            });
        }
        catch (FaultException<BusinessFault> fex)
        {
            errorCode = fex.Detail.ErrorCode;
            faultMessage = fex.Detail.Message;
        }
    }
    Assert.IsNotNull(errorCode);
    Assert.IsNotNull(faultMessage);
}

Add request validation

Adding validation to the request handlers works in much the same way: you add request validators and call them at the beginning of the Execute operation of the request handlers.
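A minimal sketch of what such a request validator and its use could look like; the AddBlogEntryRequestValidator name and the way it is instantiated are assumptions (in practice you would probably resolve it through the container):

// Hypothetical request validator, following the same FluentValidation pattern as the entity validators.
public class AddBlogEntryRequestValidator : AbstractValidator<AddBlogEntryRequest>
{
    public AddBlogEntryRequestValidator()
    {
        RuleFor(request => request.Title).NotNull().NotEmpty().Length(1, 200);
        RuleFor(request => request.Body).NotNull().NotEmpty().Length(1, 2000);
    }
}

// Called at the beginning of the request handler's Execute operation:
public AddBlogEntryResponse Execute(AddBlogEntryRequest request)
{
    new AddBlogEntryRequestValidator().ValidateAndThrow(request);

    var blogEntry = new BlogEntry(request.Title, request.Body);
    _blogEntryRepository.Add(blogEntry);
    _unitOfWork.Commit();
    return new AddBlogEntryResponse(blogEntry.Id);
}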

Next time I’ll add logging!
