Getting Started with the RavenDB Index Replication Bundle

Posted on August 28, 2012 by efarr

Being able to push Raven data out to a SQL-based data repository for reporting will be a huge boost to RavenDB adoption. The Index Replication Bundle does a great job of solving that problem, but I found the documentation to be a little weak.

I created a minimal program that implemented it for myself and decided to post it to GitHub so someone else looking to experiment with the Index Replication Bundle can get started more smoothly. This Stack Overflow question was helpful along the way.

This sample has a simple Raven store with a collection of trivial documents. Those documents are directly reflected out to SQL Server for reporting purposes. Once you have all of the necessary plumbing worked out, making the mapping between Raven and SQL Server more sophisticated is just a matter of making the indices more complicated.

Install The Plugin

Create a “Plugins” directory under the directory where Raven.Server.exe resides. Copy Raven.Bundles.IndexReplication.dll there…

Configure the Connection String

Add a connection string to Raven.Server.exe.config…

<connectionStrings>
     <add
       name="Reports"
       providerName="System.Data.SqlClient"
       connectionString="Data Source=(local);Initial Catalog=Sample;Integrated Security=SSPI;"/>
</connectionStrings>

Create a database in SQL Server called “Sample”. The rest of the steps are implemented in the sample program.

Create Table

A bit of code to create the table in SQL Server…

// Requires: using System.Data.SqlClient;
using (var connection = new SqlConnection(_sqlConnectionStringTextBox.Text))
using (var createTable = new SqlCommand(@"
    CREATE TABLE [dbo].[Dogs](
        [Id] nvarchar(64) NULL,
        [Name] nchar(255) NULL,
        [Breed] nchar(255) NULL) ON [PRIMARY]", connection))
{
    connection.Open();
    // Create the table that the index will be replicated into.
    createTable.ExecuteNonQuery();
} // Disposing the connection closes it, so no explicit Close() is needed.

Create Index

Next, we create a Raven index…

// The class name Dogs_List yields the index name "Dogs/List" (underscores become slashes).
public class Dogs_List : AbstractIndexCreationTask<Dog>
{
    public Dogs_List()
    {
        Map = dogs => from d in dogs select new { Breed = d.Breed, Name = d.Name };
    }
}
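
The index assumes a Dog document class that is not shown here. A minimal sketch of what such a class might look like, with the same projection run over an in-memory list to show the shape of each entry the map produces. The class layout, the DogRow type, and the sample data are assumptions for illustration, not taken from the sample repository:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document class; property names match the fields the
// index projects, but the real sample's class may differ.
public class Dog
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Breed { get; set; }
}

// Hypothetical stand-in for the anonymous type the index's Map emits.
public class DogRow
{
    public string Breed { get; set; }
    public string Name { get; set; }
}

public static class MapPreview
{
    // Same projection as the index's Map, applied to plain objects.
    public static List<DogRow> Project(IEnumerable<Dog> dogs)
    {
        return (from d in dogs
                select new DogRow { Breed = d.Breed, Name = d.Name }).ToList();
    }

    public static void Main()
    {
        var dogs = new List<Dog>
        {
            new Dog { Id = "dogs/1", Name = "Rex", Breed = "Beagle" },
            new Dog { Id = "dogs/2", Name = "Fido", Breed = "Poodle" }
        };
        foreach (var row in Project(dogs))
            Console.WriteLine(row.Breed + ": " + row.Name);
    }
}
```

Each projected entry carries only Breed and Name, which are the two fields the replication document later maps to SQL columns.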

Configure Replication

Applying the replication configuration described in the docs was not clear to me. It is simple once you see it, but not necessarily obvious…

var replicationDocument = new Raven.Bundles.IndexReplication.Data.IndexReplicationDestination
    {
        Id = "Raven/IndexReplication/Dogs/List",
        ColumnsMapping =
            {
                { "Name", "Name" },
                { "Breed", "Breed" }
            },
        ConnectionStringName = "Reports",
        PrimaryKeyColumnName = "Id",
        TableName = "Dogs"
    };

using (var store = new DocumentStore { Url = _ravenUrlTextBox.Text })
{
    store.Initialize();
    store.DatabaseCommands.EnsureDatabaseExists(RavenDatabaseName);
    using (var session = store.OpenSession(RavenDatabaseName))
    {
        session.Store(replicationDocument);
        session.SaveChanges();
    }
}

Add Items to Raven

Now, once you add items to Raven, they should appear in the Dogs table in SQL Server.

If you don’t see them, try turning debug logging on.

Help Turn the “Jersey Shore” into “Silicon Shore”

Posted on May 4, 2012 by efarr

Want to work with the latest technology on a self-directed team? Think you need to be in Silicon Valley to be part of a company setting the standard in cloud and mobile computing? Read on.

Marathon Data Systems, on the beautiful Jersey Shore, through our various vertical-focused brands, serves the people who provide the services that we all count on: HVAC, plumbing, lawn care, pest control, maid services, carpet cleaning, and more. We are in the process of building a mobility solution that will set the industry standard for field service worker and salesperson productivity.

Cross-Platform Mobile

Our new cross-platform mobile client is HTML5-powered and will initially run on iOS and Android devices. The technician or salesperson is continually operating on the latest information (no more periodic synchronization) and yet can be fully functional while out of network coverage.

The new mobile client is built in CoffeeScript on the Sencha Touch 2 platform. The combination of Sencha Touch and CoffeeScript makes building offline-capable, single-page applications actually pretty fun. Sencha takes care of the tedium of rendering just the right HTML to the screen for each device, while we focus on building great user experiences.

Since WebSockets are not yet a reality (and pull-based applications are lame), we’re using SignalR to simulate server push from ASP.NET MVC (think Node.js scalability in ASP.NET MVC).

Message-Driven CQRS Back-End

On the back-end we are building out a scalable, event-driven CQRS system built around NServiceBus, following the principles of Domain-Driven Design (DDD).

In Domain-Driven Design, we recognize that the heart of the software is the domain-specific behavior that allows its users to solve their problems. We establish a vocabulary that spans across technical and nontechnical people. We iteratively tweak and adjust our domain model to better map to the problem space. We focus on building an effective model of our domain, while minimizing entanglement with infrastructure concerns.

NServiceBus (NSB) allows us to focus on the semantic meaning of our commands and events and what the system’s behavior should be. NSB handles interaction with queues, message routing, retries, and other plumbing-level concerns.

Data persistence is achieved through the joy that is RavenDB. If you think ORMs like NHibernate are good, just wait until you have built a C# application with RavenDB.

RavenDB is a high-performance, second generation document database, native to C#. Queries are made through LINQ. Joins and transforms are handled through map/reduce functions also written in LINQ. The impedance mismatch between object-oriented code and a relational database is not merely bridged, but is eliminated.

Agile Team

Agile can mean almost anything you want it to mean, but at Marathon, it means…

A belief in the core values of the agile manifesto:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

The Scrum process, where a self-organizing team works together to plan and deliver iteratively and incrementally.

The Extreme Programming (XP) practices of test-driven development, pair-programming, continuous integration, and collective code ownership.

A commitment to software craftsmanship. Being a professional means a commitment to quality work and continual improvement. Great software comes less from particular technologies that come and go than from fundamentals like the SOLID principles and object-oriented design principles and patterns that stand the test of time.

This job may be for you if…

You are good, but you expect to keep getting better.
You can do it yourself, but you’d much rather work together with a team of dedicated developers.
You love cool technologies, but you care more about users loving and using your software.
You love coding in C#.
If you are not currently test-driven, you want to learn to be.
If you are not currently pair-programming, you are willing to give it a try.

If that sounds like you, please send your resume to jobs@marathondata.com. Come join us as we create great software to serve the people who serve the world!

Review of Advanced Distributed Systems Design using SOA & DDD on Video

Posted on April 13, 2012 by efarr

As my team was getting ready to embark on a significant new project built around asynchronous messaging and NServiceBus, I would have really liked to send the entire team to Udi Dahan’s five-day Advanced Distributed Systems Design using SOA & DDD course. Distributed, service-oriented systems have many advantages over traditional centralized solutions, but the change in thinking that the development team must go through is daunting. Sending the entire team off for a week would have been a challenge, however. As an alternative, I purchased the course videos and we worked through them together over the course of three weeks. This is my review of that course (on video).

Why should you even care?

Before getting into the review, why would you want to build a distributed, service-oriented system? In a word, scale.

Scaling the software: Asynchronous messaging and eventual consistency allow the system to scale out to redundant, cheap hardware. As Brewer’s CAP theorem shows, centralized systems that rely on two-phase commits for all updates sacrifice reliability/availability in favor of an immediate consistency that isn’t always necessary.

Scaling the development team: A well-factored service-oriented system avoids the problem of a single monolithic system that becomes increasingly difficult to enhance and maintain, requires scaling up a single development team, and resists scaling out to multiple teams.

Review

The production quality leaves a lot to be desired. There is an annoying amount of background noise through all of the videos. The video is shot with a single camera that pans inconsistently between Udi, the whiteboard, and the projector screen. Honestly, you have to really want the content to make it through all 40 hours. However, you will be well rewarded.

The best way to describe the course is to think of it as a five-day walk through all of the factors that led Udi to develop NServiceBus. I’ve followed Udi since way before NServiceBus. I listened to the Ask Udi podcast and attempted to grasp the essence of SOA. No matter how hard I tried, I was never able to implement the tenets of SOA with the traditional Microsoft tools and WCF. It wasn’t until I built a system with NServiceBus that I started to make the mental transition. Every time something I was trying to do was difficult and I was feeling that NServiceBus was overly constraining, I eventually realized it’s not you, it’s me. When you are doing it right, NServiceBus feels natural. When you violate good design for distributed, service-oriented systems, NServiceBus nudges you back by becoming difficult, causing you to go looking for a work-around, and eventually finding a post by Udi that gently explains why what you are trying to do would eventually bite you in the rear end and what you should be doing instead. This course lays the groundwork that will help you avoid the wrong turns and false starts you are bound to make otherwise if you are moving from the familiar world of centralized, monolithic systems.

He spends a significant portion of the course methodically and deliberately dislodging ideas that most of us have held for so long that we no longer question them. He also works to temper and ground some of the techniques that have caught on with a small but vocal minority of .NET developers, such as CQRS and event sourcing.

I recommend this course if any of these apply to you:

I don’t know how to model autonomous services. Everything is related to everything else. My domain is too hard to tease apart into independent components. (Hint: this is hard at first, but Udi will take you through enough examples to get you headed in the right direction. Oh… and your domain is not too hard to do this.)
I love Domain-Driven Design, but I don’t know how to model the domain and application layers. I recognize that I have fallen into the anemic domain model and all of my logic is in procedural application layer functions. (Hint: There is a place for simple CRUD and the rest cannot be modeled without sagas.)
I like the idea of CQRS, but I don’t know how to approach it, or I tried and it didn’t work out as well as I expected. (Hint: Event sourcing is not the silver bullet of CQRS.)
I am using NServiceBus, but I don’t understand sagas; so, I’m not using them. (Hint: you’re doing it wrong.)

All of these were true for me at some level. Getting these issues addressed made the course extremely valuable to me.

Conclusion

The entire team feels the course was a great investment and is entering this new project with anticipation and confidence that we will avoid many mistakes that we would have otherwise made if we had not taken the virtual course. I believe we will save the money we spent on the course and the time away from coding many times over on this project and beyond. If you are building a moderately complex collaborative system, you owe it to yourself to check out this course. If you can attend a live offering, I recommend that. But if you cannot, then the videos are well worth the investment as the next best thing.

Lunch with Uncle Bob

Posted on January 8, 2012 by efarr

Ever since I stumbled across the original C++ Report articles that have become known as the SOLID principles, I have been a disciple of Robert Martin (aka Uncle Bob). He is a leader within the agile and software craftsmanship movements. He has as good a sense of what makes good software as anyone currently writing and teaching. If he thinks it’s worth writing, then it’s worth reading.

I’m not usually a fan of video for learning. I like the random access referenceability of books and I like the on-the-go accessibility of audio recordings. I find video to be the worst of both worlds: I cannot flip to a particular page or go at my own pace, nor can I consume it while driving or mowing the lawn. Further, there are so many good conference videos available for free that I find it difficult to justify paying for video content.

However, when Uncle Bob began releasing the Clean Code video series, I thought I’d at least check them out. I found them to be so good, that I’m now having my development team watch them together over lunch hours. Uncle Bob does a great job teaching (and preaching) the techniques that lead to clean, maintainable software. As we watch the sessions together, we are creating a common baseline of understanding that we can all refer to as we work together.

I highly recommend them for any software development team that is looking to get better (and if your team is not working to get better, it is on its way to obsolescence). If you have trouble justifying the cost ($12 per viewer per video) to management, have them take a look at this excellent explanation of the value of software craftsmanship and professionalism.

Outstanding Summary of Domain Driven Design

Posted on November 30, 2011 by efarr

I just came across the best magazine-article-length summary of DDD that I’ve ever seen. If you’ve heard of DDD, but are not ready to commit to reading the whole Blue Book, check out Dan Haywood’s An Introduction to Domain Driven Design. It is clear, concise, and remarkably comprehensive for all of its brevity. It also makes for a great refresher.

DDD Anti-Pattern #4: Allowing Implementation Decisions to Drive the Domain Model

Posted on September 24, 2011 by efarr

This is the last (for now) in my series of lessons learned building a complex product from the ground up following the principles of Domain Driven Design.

The Field Service domain is all about getting people to Locations to perform Activities. The Activities and the Location are defined by a Work Order. The person performing the Activities (and possibly the time) are defined by an Appointment. You can think of the Work Order as the what and where and the Appointment as the who and when.

The model is simple as long as there is one Work Order and one Appointment. In fact, you’d be tempted to combine them into one entity. However, things get more complicated when you have one Work Order that requires multiple Appointments to complete it. It might be two technicians at one time or the same technician on two different visits or multiple technicians over multiple visits.

In these cases, we are splitting the Activities of the Work Order over multiple Appointments. Some of the Activities are associated with one Appointment and some with another.

The Work Order is complete when all Activities over all Appointments are complete. Not too complicated.

Now also imagine that a technician goes out on one Appointment but services more than one Work Order.

In simplest terms, we have a many-to-many between Appointments and Work Orders. You can imagine the twisted case where a given Appointment services two different Work Orders, and each of those Work Orders has other Appointments serving them.

We’ve all implemented many-to-many relationships in databases and in object models. However, this situation is a little different because of the role of the Activity. The linkage between the Appointment and the Work Order is through their association with a common Activity.

A given Activity is associated with one Work Order and one Appointment. We can now traverse that relationship to discover the relationship between Appointment and Work Order.

We simply have Work Orders with collections of Activities and Appointments with collections of Activities. With that, the complex relationship between Appointment and Work Order is completely modeled, with no redundant connections (like something directly linking Appointments and Work Orders). I thought this was pretty cool. And it was.

But here is where the problem comes in… In the real world, Appointments really are associated with Work Orders—and not just by the happenstance of common Activities. (One could argue this actually is true because you would only schedule an Appointment to service a Work Order if there was some Activity to perform, but this is not how anyone thinks about the relationship.)

With our model, to answer the question “Where does this Appointment take place?” we have to go to one of the Activities in our collection, navigate to its Work Order, and look at its Location property. This is not a big deal to do in code, and we can even put a property on the Appointment that hides this messiness and gives the illusion that an Appointment has a Location. But this is not the only difficulty.
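
A minimal sketch of the Activity-linked model and the convenience property, assuming simple in-memory classes. The class shapes, property names, and sample values are illustrative assumptions, not the product's actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WorkOrder
{
    public string Id { get; set; }
    public string Location { get; set; }
    public List<Activity> Activities { get; } = new List<Activity>();
}

public class Activity
{
    public string Description { get; set; }
    public WorkOrder WorkOrder { get; set; }      // exactly one Work Order
    public Appointment Appointment { get; set; }  // exactly one Appointment
}

public class Appointment
{
    public string Technician { get; set; }
    public List<Activity> Activities { get; } = new List<Activity>();

    // Work Orders are reachable only by traversing the Activities.
    public IEnumerable<WorkOrder> WorkOrders
    {
        get { return Activities.Select(a => a.WorkOrder).Distinct(); }
    }

    // Hides the traversal and gives the illusion that an Appointment
    // has a Location of its own (taken from its first Work Order).
    public string Location
    {
        get { return WorkOrders.Select(w => w.Location).FirstOrDefault(); }
    }
}

public static class Program
{
    public static void Main()
    {
        var order = new WorkOrder { Id = "0123", Location = "12 Main St" };
        var visit = new Appointment { Technician = "Pat" };
        var activity = new Activity
        {
            Description = "Replace filter",
            WorkOrder = order,
            Appointment = visit
        };
        order.Activities.Add(activity);
        visit.Activities.Add(activity);

        Console.WriteLine(visit.Location);
    }
}
```

Nothing on Appointment refers to WorkOrder directly; every answer about the relationship has to be derived from the Activities collection, which is the indirection described above.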

Try looking in our database to see what Work Order a given Appointment is associated with. You have to go to the Activity table and find all the Activities associated with this Appointment, then look at the Work Order column of those Activities to find the ID(s) of the Work Order(s). One day I tried to create a view that shows an Appointment and its first Work Order (there was only ever one Work Order per Appointment in practice). I gave up after 20 minutes.

Now imagine explaining this model to a programmer trying to integrate with our system. It’s possible, but there is much more confusion than you’d like.

We had a technically tight and elegant implementation, but it ended up obscuring one of the most fundamental relationships in our domain.

Remedy

As much as we liked the normalization of our current implementation, the team decided that it would be better to model the relationship between Appointment and Work Order explicitly. If we had focused more on the domain and less on the cleverness of the implementation, we could have avoided this rework of the design.

11 Reasons You Want Mobility Experience Before Building a Mobile HTML5 Application

Posted on September 22, 2011 by efarr

Two forces have converged: 1) Mobility has gone from an optional differentiator to an expected component of any software offering, and 2) HTML5 has been crowned as the solution that will solve the cross-platform problem that Java, Flash, and Silverlight failed to solve before it.

This collision of forces has turned HTML5 into a buzzword with a life of its own. In fact, it appears to be on its way to becoming as detached from reality as the all-time-champion of promising technology turned meaningless buzzword: SOA.

Don’t get me wrong, I believe HTML5 is the best current answer to cross-platform mobile software. Before recommending to my executive management that we build our cross-platform offering in JavaScript, using the family of features loosely known as HTML5, I looked at pure native, cross-platform native with Mono, and frameworks like Titanium. I made that recommendation before HTML5 became the cool thing to do, and I haven’t regretted it for a second.

A sure sign that a technology has reached fad level is when articles start to appear pointing out that said new technology will not, in fact, usher in world peace. Such an article was the popular 11 hard truths about HTML5.

The title was a little ominous and I began to read it with some trepidation, as we were still a couple of months away from being ready to ship our HTML5 client application. However, as I read, I was comforted by the fact that although all 11 truths were valid challenges, our team had faced and dealt with each of them.

The article lays out eleven challenges when building an HTML5 application:

Security is a nightmare.
Local data storage is limited.
Local data can be manipulated.
Offline apps are a nightmare to sync.
The cloud owes you nothing.
Forced upgrades aren’t for everyone.
Web Workers offer no prioritization.
Format incompatibilities abound.
Implementations are browser-dependent.
Hardware idiosyncrasies bring new challenges.
Politics as usual.

It’s a scary list, and they are all true. My team was able to handle and mitigate each of these challenges largely because we had years of experience building native mobile applications and desktop Web applications and much of what we learned there applied in the HTML5 world.

If you don’t have solid answers for each of these challenges, you really ought to get someone on the team who has confidence in dealing with each of them.

You can check my current availability here.

DDD Anti-Pattern #3: Not Taking Bounded Contexts Seriously

Posted on August 31, 2011 by efarr

My team had the challenge of building a field service automation application that would run on Windows Mobile devices. Availability of the Compact Framework meant that most code that would run on the server would also run on the device. This meant that we could build one domain object model and deploy it on the server and on the client. But in Jurassic Park fashion, we were so preoccupied with the fact that we could, we didn’t spend enough time asking if we should.

The Advantage We Got

We used the Repository pattern and built our domain model out of plain old CLR objects (POCOs). This meant that we could have the same rich domain model both on the client where the user is interacting through a UI and on the server where changes from the field and from host system integrations are being processed. This allowed us to keep our code DRY. There was no repetition among the entities, behaviors, relationships between entities, and tests. We never had to worry about model inconsistencies between the client and the server. It seemed like a big win, and in some ways it was.

The Price We Paid

As we built out the product, we saw that the object model’s usage patterns were not the same on the client as on the server. We ended up with behavior that only applied on the client or on the server.

In an effort to keep from putting code into our domain that would not be relevant on both the client and server, we started putting logic into little services that work with the domain. Most of this logic was on the client side and got applied through use of the Event Aggregator pattern. This worked, but tended to impoverish the domain model itself.

Fixing It

The problem largely fixed itself when Windows Mobile became irrelevant almost overnight. We rebuilt the client in JavaScript that runs on most modern browsers. This gave us a do-over opportunity and removed the option of sharing model code between the server and client.

If I had the Windows Mobile situation to do over again, I would be more careful to define a common structure that the client and server could share, but compose behaviors relevant to the client and server within each of those contexts.

DDD Anti-Pattern #2: Not Getting the Whole Team Educated on DDD Early Enough

Posted on August 23, 2011 by efarr

The two challenges that drew me to Agentek in late 2008 were interrelated in the same way that the proverbial chicken and egg are. We had to build a complex, composite, occasionally connected, enterprise mobility application to replace the prior practice of custom, one-off solutions. At the same time, we had to bring the existing development group up to speed on current software techniques, practices, patterns, and processes.

I couldn’t hold off on designing and building the new product until everyone had gotten the DDD religion. In reality, I don’t think I would have done that even if I had the luxury of that kind of time. Concepts like DDD are best learned by doing them with someone who has done it before. Reading books and hearing presentations can get you excited about them, but only doing them helps you actually learn them.

As I prepared for the first meetings with the domain experts (folks from across the organization who had implemented multiple custom solutions and had a good sense of the domain of our new product), I sent out a copy of Domain Driven Design Quickly. I wanted to give them as much of an idea of what I was after as possible up front, but it mostly came down to me explaining it as we were doing it.

We had some great sessions. The white board photos we made in those early days formed a remarkably useful and resilient core of the domain model that is still reflected in the product today. We hashed out much of the [soon to be] ubiquitous language, and we all had a vision of what the new product would be.

Shortly after this, we hired a big DDD advocate (I’m still amazed at our good fortune in finding him. They are rare now, even more so in 2009). With Jarrel and me on the development team, we spread the DDD mindset organically as we built out the product. This seemed like a good and pragmatic approach.

In late 2010/early 2011, the entire development organization went through the blue book, chapter by chapter, discussing the pros and cons of Evans’ approach, where we’ve been effective, and where we haven’t.

What I learned was loud and clear: We would have had a better product and stayed out of the technological weeds much better if we had formally gone through the DDD material in 2009 instead of 2011. I underestimated how much more effective the team would have been if we had taken the time to build a common foundation of the DDD concepts. It didn’t mean we had to hold anything up. I just should have made it a priority earlier. I won’t make that mistake again.

DDD Anti-Pattern #1: Not Accounting for Commands and Queries as Separate Concerns

Posted on August 12, 2011 by efarr

As a long-time object-oriented programmer, I was immediately drawn to the DDD concept of Persistence Ignorance. My initial idea of DDD was to simply express the domain concepts in a C# object model, then on one side, map those objects to a persistent store with a technology like NHibernate and on the other side, build a UI without any business logic in it.

Figure 1: The Naïve Transition to DDD

This was a big improvement over the traditional techniques on the Microsoft stack. Databases are inadequate for expressing domain models, and business logic in UIs or databases makes unit testing nearly impossible.

How the Naïve DDD Approach Fails

What I didn’t see was that this naïve solution traded one set of problems for another. The new solution looks like Figure 2 below.

Figure 2: The Naïve DDD Approach

This looks OK until you try to build anything remotely complex with this model. The first bit of friction we ran into involved populating some of the screens on the UI. Many screens don’t map smoothly to our rich object model. We get by with technologies like LINQ and lazy loading, but it still feels awkward to spin up a rich object model to simply copy properties to a view.

Things really turn south when we start to have multiple interfaces to the application. Imagine we have our UI and add a SOAP-based API to allow other programs to drive the application.

Figure 3: The Naïve DDD Approach with Multiple Applications

Now the CRUD layer really starts causing trouble. First, notice that because our domain model is defined in an assembly (or package or library), it gets deployed to each separate application that needs it. We have a layered architecture, however, all but the database layer get replicated to multiple applications.

CRUD-Based Models and Meaning

Imagine that a user makes an update to the comments on order 0123 through the UI at the same time that an external program calls GetWorkOrder(“0123”), updates the order description, then passes the updated order into SaveWorkOrder. Now both components are instantiating order 0123, copying the updated property values and saving it back to the database.

We lose the intent of each of those two actors. We are left to sort out any conflicts based simply on timing and the intelligence of our ORM tool.
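
The scenario can be sketched with a hypothetical in-memory store standing in for the database and ORM. This assumes plain last-writer-wins saves of whole objects; a real ORM may add dirty tracking or optimistic concurrency checks, and all type names here are invented for illustration:

```csharp
using System;
using System.Collections.Generic;

public class WorkOrderRecord
{
    public string Id;
    public string Comments;
    public string Description;

    public WorkOrderRecord Copy()
    {
        return (WorkOrderRecord)MemberwiseClone();
    }
}

// Stand-in for the database plus ORM: whole objects in, whole objects out.
public class CrudStore
{
    private readonly Dictionary<string, WorkOrderRecord> _rows =
        new Dictionary<string, WorkOrderRecord>();

    public WorkOrderRecord GetWorkOrder(string id) { return _rows[id].Copy(); }

    public void SaveWorkOrder(WorkOrderRecord order) { _rows[order.Id] = order.Copy(); }
}

public static class LostUpdateDemo
{
    public static WorkOrderRecord Run()
    {
        var store = new CrudStore();
        store.SaveWorkOrder(new WorkOrderRecord
        {
            Id = "0123", Comments = "none", Description = "old"
        });

        var uiCopy = store.GetWorkOrder("0123");   // user opens the order in the UI
        var apiCopy = store.GetWorkOrder("0123");  // external program loads it too

        uiCopy.Comments = "Customer prefers mornings";
        apiCopy.Description = "Replace compressor";

        store.SaveWorkOrder(uiCopy);
        store.SaveWorkOrder(apiCopy);  // saves its stale Comments over the UI's change

        return store.GetWorkOrder("0123");
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine(result.Comments + " / " + result.Description);
        // prints "none / Replace compressor"
    }
}
```

Neither save call says which field the caller meant to change, so the store has no way to tell that the second writer never intended to touch the comments.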

You could avoid the replication in Figure 3 by creating an application server, but you would still have the loss of semantically meaningful information that comes with a CRUD-based object model.

Request/Response: Good for Some Things Not for Others

Our Figure 3 architecture is inherently synchronous request/response. Consider the GetWorkOrder Web service method. Our external system calls that method and gets back the specified work order. This is what we want. The calling code cannot continue until the request comes back.

On the other hand, imagine any method that makes any kind of change to the system (like SaveWorkOrder). We would like to be able to queue these changes and allow the system to process them as compute capacity is available. We also don’t want to have to write code to handle the case where the Web service is temporarily not available.

The Old Way Had Some Things Right

Notice that the bad old traditional system does not exhibit some of these problems:

Figure 4: The Old Smart UI Approach

What the traditional system did right was account for updates being different from queries. Reads used SQL queries or stored procedures and views to get exactly what the screen needs—no more, no less. The write side used SQL inserts and updates to make just the changes necessary to accommodate the user’s actions in the UI. You couldn’t test it and you wouldn’t want to be the one to have to maintain it or enhance it, but it performed well and got the job done.

It did, however, have all the same request/response limitations.

Getting it Right with DDD

Fixing this problem meant an overhaul of the architecture, but the result was well worth the effort. In a nutshell, we redesigned the system to follow the Command Query Responsibility Segregation (CQRS) pattern.

Now all changes to the system are expressed as commands. Anytime the system changes, one or more events are published. Everything is commands in and events out. For reads, we added an Open Data Protocol feed that allows any application to query the data.

Figure 5: CQRS Design

Each of the applications and integration points is now sending commands to a central processing module (which may be dispatching to several machines). The commands express the precise intent of the sender of the command. So, UpdateWorkOrder becomes SetWorkOrderComment or SetWorkOrderDescription. Sending the command allows semantic meaning of the action to follow all the way through. We now log both the commands and the resulting events, creating a detailed audit trail of what happened within our system. This also allows us to better avoid merge conflicts because changes are more targeted with the specific commands.
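
A rough sketch of intent-revealing commands, assuming a simple in-process, synchronous processor. The command names come from the text; the event type, the processor class, and its logs are hypothetical, and none of this is the NServiceBus API:

```csharp
using System;
using System.Collections.Generic;

// Commands name the sender's intent instead of carrying a whole object.
public class SetWorkOrderComment { public string WorkOrderId; public string Comment; }
public class SetWorkOrderDescription { public string WorkOrderId; public string Description; }

// Hypothetical event published after each change.
public class WorkOrderChanged { public string WorkOrderId; public string Field; public string NewValue; }

public class WorkOrderState { public string Comments; public string Description; }

public class CommandProcessor
{
    private readonly Dictionary<string, WorkOrderState> _orders =
        new Dictionary<string, WorkOrderState>();

    // Logging commands and events together forms the audit trail.
    public readonly List<object> CommandLog = new List<object>();
    public readonly List<WorkOrderChanged> EventLog = new List<WorkOrderChanged>();

    public void Handle(SetWorkOrderComment cmd)
    {
        CommandLog.Add(cmd);
        Find(cmd.WorkOrderId).Comments = cmd.Comment;  // touches only this field
        Publish(cmd.WorkOrderId, "Comments", cmd.Comment);
    }

    public void Handle(SetWorkOrderDescription cmd)
    {
        CommandLog.Add(cmd);
        Find(cmd.WorkOrderId).Description = cmd.Description;
        Publish(cmd.WorkOrderId, "Description", cmd.Description);
    }

    // Returns the order's state, creating an empty one on first use.
    public WorkOrderState Find(string id)
    {
        WorkOrderState state;
        if (!_orders.TryGetValue(id, out state))
        {
            state = new WorkOrderState();
            _orders[id] = state;
        }
        return state;
    }

    private void Publish(string id, string field, string value)
    {
        EventLog.Add(new WorkOrderChanged { WorkOrderId = id, Field = field, NewValue = value });
    }
}

public static class CommandDemo
{
    public static void Main()
    {
        var processor = new CommandProcessor();
        processor.Handle(new SetWorkOrderComment { WorkOrderId = "0123", Comment = "Customer prefers mornings" });
        processor.Handle(new SetWorkOrderDescription { WorkOrderId = "0123", Description = "Replace compressor" });

        var order = processor.Find("0123");
        Console.WriteLine(order.Comments + " / " + order.Description);
        // prints "Customer prefers mornings / Replace compressor"
    }
}
```

Because each command changes one field, the two concurrent edits from the earlier scenario no longer overwrite each other, and the logs record who asked for what.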

Asynchronous Messaging versus Request-Response

Another huge win for this design improvement is in how it can scale. Building out the “write” side of the system with NServiceBus means that we can ditch all of the home-grown queuing, retry, and pub/sub logic between components. Now, all of our writes are async, forcing us to learn to deal without getting return codes or statuses from our calls. It took some getting used to, but is liberating in the end.

Our load testing shows that the system is now resilient to peak loads. The message queues swell temporarily, but everything continues to function.

The Farr Side

Eric Farr – Technical Leadership in SaaS, Mobile, and Predictive Analytics
