Categories: MongoDB, NoSQL, Performance | Posted by nurih on 5/19/2014 4:00 PM

What's the first thing you hear about NoSQL databases? That they lose your data? That there's no transactions? No joins? No hope for "real" applications?

Well, you *should* be wondering whether a certain type of database is the right one for your job. But if you do, you should be wondering that about "traditional" databases as well!

In the spirit of exploration, let's take a look at a common challenge:

You are a bank.

You have customers with accounts.

Customer A wants to pay B.

You want to allow that only if A can cover the amount being transferred.

Let's look at the problem without any particular database engine in mind. What would you do? How would you ensure that the transfer is done "properly"? Would you prevent a "transaction" from taking place unless A can cover the amount?

There are several options:

  1. Prevent any change to A's account while the transfer is taking place. That boils down to locking.
  2. Apply the change, and allow A's balance to go below zero. Charge person A some interest on the negative balance. Not friendly, but certainly a choice.
  3. Don't do either.

Options 1 and 2 are difficult to attain in the NoSQL world. Mongo won't save you headaches here either.

Option 3 looks a bit harsh. But here's where this can go: a ledger. See, an account doesn't need to be represented by a single row in a table of all accounts with only the current balance on it. More often than not, accounting systems use ledgers. And entries in ledgers – as it turns out – don't actually get updated. Once a ledger entry is written, it is not removed or altered. A transfer is represented by an entry in the ledger stating an amount withdrawn from A's account and an entry stating an addition of said amount to B's account. For the sake of space, the transfer can even be recorded as a single entry. Think {Timestamp, FromAccountId, ToAccountId, Amount}.
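
To make that concrete, here's a minimal sketch in the mongo shell. The collection name ledger and the field names are my own choices for illustration, not anything prescribed:

// Record a transfer of 100 from A to B as a single, never-updated ledger entry.
db.ledger.insert({
    ts: new Date(),
    fromAccountId: "A",
    toAccountId: "B",
    amount: 100
});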

The original question – "how do you enforce the non-negative balance rule?" – then boils down to:

  1. Insert entry in ledger
  2. Run validation of recent entries
  3. Insert a reverse entry to roll back the transaction if validation fails.

What is validation? Sum up the transactions that A's account has (all deposits and debits), and ensure the balance is not negative. For the sake of efficiency, one can roll up transactions and "close the book" with a pseudo entry stating the balance as of midnight or so. This lets you avoid doing math on the fly over too many transactions; you simply sum from the latest "approved balance" marker to date. But that's an optimization, and premature optimization is the root of (some? most?) evil.
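
As a sketch only (2.6-era shell syntax, field names following the example entry above, amounts obviously illustrative), the validation step might look like this:

// Compute A's balance from the ledger: credits into A minus debits out of A.
var summary = db.ledger.aggregate([
    { $match: { $or: [ { fromAccountId: "A" }, { toAccountId: "A" } ] } },
    { $group: { _id: null, balance: { $sum: {
        $cond: [ { $eq: [ "$toAccountId", "A" ] }, "$amount", { $multiply: [ "$amount", -1 ] } ]
    } } } }
]).toArray();

var balance = summary.length ? summary[0].balance : 0;
if (balance < 0) {
    // Validation failed: roll back by compensating, not by deleting - reverse the original transfer.
    db.ledger.insert({ ts: new Date(), fromAccountId: "B", toAccountId: "A", amount: 100 });
}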

Back to some nagging questions though: "But mongo is only eventually consistent!" Well, yes, kind of. It's not actually true that Mongo has no transactions. It would be more accurate to say that Mongo's transaction scope is a single document in a single collection.

A write to a Mongo document happens completely or not at all. So although it is true that you can't update more than one document "at the same time" under a "transaction" umbrella as an atomic update, it is NOT true that there is no isolation. Two concurrent updates competing for the same document are completely coherent: the writes are serialized, and they will not scribble on the same document at the same time. In our case – having chosen a ledger approach – we're not even trying to "update" a document; we're simply adding a document to a collection. So there goes the "no transactions" issue.

Now let's turn our attention to consistency. What you should know about mongo is that at any given moment only one member of a replica set is writable. This means that the writable instance in a set of replicated instances always has "the truth". There could be replication lag such that a reader going to one of the replicas still sees an "old" state of a collection or document. But in our ledger case, things fall nicely into place: run your validation against the writable instance. It is guaranteed to see the ledger either with (after) or without (before) the new entry. No funky in-between states. Again, writing to the ledger *adds* a document, so there's no inconsistent document state to be had either way.

Next, we might worry about data loss. Here, mongo offers several write concerns. A write concern in Mongo is a mode that marshals how uptight you want the db engine to be about actually persisting a document write before it reports to the application that it is "done". The most volatile is to say you don't care. In that case, mongo just accepts your write command and says "thanks" with no guarantee of persistence. If the server loses power at the wrong moment, it may have said "ok" but not actually written the data to disk. That's kind of bad. Don't do that with data you care about. It may be good for votes in a poll about how cute a furry animal is, but not so good for business.

There are several other write concerns, varying from flushing the write to the disk of the writable instance, to confirming the write on several members of the replica set, a majority of the replica set, or all of the members of the replica set. The first choice is the quickest, as no network coordination is required beyond the main writable instance. The others impose extra network and time cost. Depending on your tolerance for latency and read lag, you get to choose what works for you.
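
For example (2.6-era shell syntax; tune the values to your own durability/latency taste), the ledger insert can demand a journal flush and majority acknowledgment before success is reported:

db.ledger.insert(
    { ts: new Date(), fromAccountId: "A", toAccountId: "B", amount: 100 },
    { writeConcern: { w: "majority", j: true } }
);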

It's really important to understand that no data loss occurs once a document is flushed to an instance. The record is on disk at that point. From that point on, backup strategies and disaster recovery are your worry, not loss of power to the writable machine. This is no different from a relational database at that point.

Where does this leave us? Oh, yes. Eventual consistency. By now, we ensured that the "source of truth" instance has the correct data, persisted and coherent. But because of lag, the app may have gone to the writable instance, performed the update and then gone to a replica and looked at the ledger there before the transaction replicated. Here are 2 options to deal with this.

Similar to write concerns, mongo supports read preferences. An app may choose to read only from the writable instance. This is not an awesome choice to make for every read, because it burdens that one instance and doesn't make use of the other, read-only servers. But this choice can be made on a query-by-query basis. So for the app that our person A is using, we can have person A issue the transfer command to B, and then, if that same app immediately asks "are we there yet?", we query that same writable instance. But B and anyone else in the world can just chill and read from a read-only instance. They have no basis to expect that the ledger has just been written to, so as far as they know, the transaction hasn't happened until they see it appear later. We can further relax the demand by creating application UI that reacts to a write command with "thank you, we will post it shortly" instead of "thank you, we just did everything and here's the new balance". This is a very powerful thing. UI design for highly scalable systems can't insist that all databases be locked just to paint an "all done" on screen. People understand. They were trained by many online businesses already that placing an order does not mean the product is already outside your door waiting (yes, I know, large retailers are working on it... but we're not there yet).
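
In the shell, that per-query choice looks roughly like the following (drivers expose an equivalent read-preference option):

// Person A's own session asks the primary, so the just-written entry is visible immediately.
db.ledger.find({ fromAccountId: "A" }).readPref("primary");

// Everybody else can read from a secondary and tolerate a little replication lag.
db.ledger.find({ toAccountId: "B" }).readPref("secondaryPreferred");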

The second thing we can do is add some artificial delay to a transaction's visibility on the ledger. The way that works is simply adding some logic such that the query against the ledger never shows customers a transaction that is newer than, say, 15 minutes and whose validation flag is not yet set (a sketch follows the list below).

This buys us time 2 ways:

  1. Replication can catch up to all instances by then, and validation rules can run and determine if this transaction should be "negated" with a compensating transaction.
  2. In case we do need to "roll back" the transaction, the backend system can place the timestamp of the compensating transaction at the exact same time as, or 1ms after, the original one. Effectively, once A or B visits their ledger, both transactions would be visible and the overall balance "as of now" would reflect no change. The two transactions (attempted / reverted) would both be visible, since we do actually account for the attempt.
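
The delayed-visibility query mentioned above might be sketched like so, assuming a validated flag on entries that have already passed validation:

var cutoff = new Date(Date.now() - 15 * 60 * 1000);
db.ledger.find({
    $or: [
        { ts: { $lte: cutoff } },   // old enough that validation and replication have caught up
        { validated: true }         // or already explicitly marked as validated
    ]
});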

Hold on a second. There's a hole in the story: what if several transfers from A to some accounts are registered, and 2 independent validators attempt to compute the balance concurrently? Is there a chance that both would conclude non-sufficient-funds even though rolling back transaction 100 would free up enough for transaction 117 (some random later transaction)? Yes, there is that chance. But the integrity of the business rule is not compromised, since the prime rule is: don't dispense money you don't have. To minimize or eliminate this scenario, we can also assign a single validation process per origin account. This may seem non-scalable, but it can easily be done as a "sharded" distribution. Say we have 11 validation threads (or processing nodes, etc.). We divide the account number space such that each validator is exclusively responsible for a certain range of account numbers. Sounds cunningly similar to Mongo's sharding strategy, doesn't it? Each validator then works in isolation. More capacity needed? Chop the account space into more chunks.

So where are we now with the nagging questions?

  • "No joins": Huh? What are those for?
  • "No transactions": You mean no cross-collection and no cross-document transactions? Granted - but don't always need them either.
  • "No hope for real applications": well...

There are more issues and edge cases to slog through, I'm sure. But hopefully this gives you some ideas of how to solve common problems without distributed locking and relational databases. But then again, you can choose relational databases if they suit your problem.

Categories: MVC, Web, Code Development, General | Posted by nurih on 4/11/2013 4:57 AM

ASP.NET MVC comes with nice features to aid model validation. Unfortunately, you are still stuck writing boilerplate code on all the data entry actions. The boilerplate code looks something like:

public ActionResult DoSomething(Foo value)
{
    if (!ModelState.IsValid)
    {
        return View();
    }

    ... // do actual work
    return View("AllGoodThanks");
}

 

The common desired behavior is that when the submitted model is invalid the view is immediately returned so the user can fix erroneous entries. But since the flow is such that a value needs to be returned, you can't just refactor this into a common method.

What to do? Let's implement DRY (don't repeat yourself – duh, I just did) based on an ActionFilterAttribute:

public class ValidateModelAttribute : ActionFilterAttribute
{
    public override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        if (filterContext.Controller.ViewData.ModelState.IsValid)
        {
            return;
        }

        filterContext.Result = new ViewResult
        {
            ViewName = filterContext.ActionDescriptor.ActionName,
            ViewData = filterContext.Controller.ViewData,
            TempData = filterContext.Controller.TempData
        };
    }
}

 

This custom attribute uses the same mechanism the Controller would have used and relies on model attributes to signal data fitness.

A straightforward behavior returning the user to the same form (view) is sufficient in most cases:

[ValidateModel]
public ActionResult DoSomething(Foo value)
{
    ... // do actual work
    return View("AllGoodThanks");
}

 

The total lines of code saved grow as you add more and more actions (as my projects tend to gain momentum), and become quite significant.

Categories: MongoDB, NoSQL, General | Posted by nurih on 2/19/2013 7:03 AM

MongoDB's ObjectId() has some nice sequential properties. One of the interesting ones is the fact that the most significant 4 bytes are a timestamp with seconds granularity.

Suppose you want to query your collection for items created on or after a certain date. Since the timestamp portion can be constructed (seconds since the epoch), and the rest can be manufactured (zeros are fine), we can now write a function to generate what the ObjectId at that moment would be, or be just higher or lower than:

 

var past =  new Date((new Date()).getTime() - (90 * 24 * 60 * 60 * 1000));
var stamp = ObjectId(Math.floor(past.getTime() / 1000).toString(16) + "0000000000000000");

The stamp variable now holds an ObjectId representing the floor value of any ObjectId generated at that moment 90 days ago, to seconds granularity. Using the stamp value, we can then write a query for objects created on or after that time.
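
For example (the collection name here is hypothetical):

// Everything created in the last 90 days:
db.things.find({ _id: { $gte: stamp } });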

While this value may not be suitable for exact reporting (as the rounding may exclude or include some values because of the limited granularity), it is well suited to tasks like finding records inserted at or around that time – retiring older records, for example.

Categories: MongoDB, NoSQL, Performance | Posted by nurih on 2/15/2013 4:38 AM

MongoDB's engine can log quite a bit of useful detail. Whether due to a high transaction rate or a verbose logging level, the log can get quite large.

While setting the log mode to append helps you retain the old / existing log, mongo does not currently have a facility to rotate the log at prescribed times or when a size limit is reached. In other words, the log will grow indefinitely.

There are 2 ways to have the engine release the current file and start a new one:

  1. Send the mongod process a SIGUSR1 signal
  2. Issue a command to mongod via a client connection

The first option, available on Unix variants, is issued like so:

killall -SIGUSR1 mongod

This would force log rotation on all instances of mongod on that machine.

The second option requires a connection to mongo. The mongo shell is capable of running in non-interactive mode in 2 ways: using a command-line --eval expression, or running a named JavaScript file. Let's pick the --eval method, since we only have one command to send. Since the logRotate command needs to be issued against the admin database, we specify that on the connection string directly:

mongo localhost/admin --eval "db.runCommand({logRotate:1})"

The result of either of these methods is that mongod will take its current file, say /var/log/mongo/mongod.log, and rename it to be suffixed with a date/time stamp, such as /var/log/mongo/mongod.log.2012-01-31T12-34-56 if invoked January 31st at 12:34:56.

The next sticking point is that you may want to compress down that file, and clean out older log files. There are some tools out there, Logrotate being one, but I decided to write a small shell script:

#!/bin/bash

### log rotate
mongo localhost/admin --eval "db.runCommand({logRotate:1})"


### compress newly rotated

for f in /var/log/mongo/mongod.log.????-??-??T??-??-??;
do
7za a "$f.z" "$f"
rm -f "$f"
done


### remove files older than x days
find /var/log/mongo/mongod.log.????-??-??T??-??-??.z -ctime +14 -delete

You might like a different compression tool; 7z works for me. The script names the archives with a .z suffix, so the cleanup step looks for that.

Notice how the find command is issued against the file change time (-ctime), and configured here to delete files older than 14 days. Your rotation and deletion policy may require otherwise. You can run this as often as you wish to keep files small and granular, or less frequently to get logs covering extended periods; it is sometimes useful to spelunk a single larger file instead of scrolling through several hourly files to track a single event.

This solution takes care of time-triggered rotation and does not sense file size in any way. But it should be easy enough to modify the script to only rotate if the current mongod.log is larger than some predefined size.
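
A possible sketch of such a guard (stat -c%s is the GNU form; adjust the command and the threshold for your platform):

LOG=/var/log/mongo/mongod.log
MAX_BYTES=$((100 * 1024 * 1024))

### only rotate when the current log exceeds ~100MB
if [ "$(stat -c%s "$LOG" 2>/dev/null || echo 0)" -gt "$MAX_BYTES" ]; then
    mongo localhost/admin --eval "db.runCommand({logRotate:1})"
fi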

Happy admin!

Posted by nurih on 12/31/2012 10:17 AM

The new Windows Azure Portal looks great, but has moved things around a bit. This post serves as note to self and others:

How do I set a custom domain name for blob/ table/ queue?

  1. Go to the new portal https://manage.windowsazure.com/
  2. Click the "Storage" item on the left (icon reminiscent of a table or spreadsheet)
  3. Click on one of your storage items for which you want to create a custom domain
  4. Click the "configure" tab (you are in "dashboard" by default)
  5. Click the "manage domain" icon on the bottom action bar (look all the way at the bottom between "manage keys" and "delete")
  6. Enter the full domain name you want to have point to the storage "bob.mydomain.com" (assuming you own mydomain.com)
  7. Set up a CNAME record in your DNS server for the domain you own as instructed
  8. Validate the CNAME entry (it may need a bit of time to propagate, so let it)

Steps 6-8 are described here: http://www.windowsazure.com/en-us/develop/net/common-tasks/custom-dns/
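
For reference, the resulting DNS record ends up looking something like this (the storage account name is made up):

bob.mydomain.com.    CNAME    mystorageaccount.blob.core.windows.net.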

Categories: Code Development, Web, General, Performance, SOA | Posted by nurih on 3/7/2012 9:29 AM

As many of us deploy our shiny web services and expose them to the world (or just our apps), we invariably encounter these pesky maintenance windows. During these times, a database, other web services, or other IO-dependent resources are unavailable.

Wouldn't it be nice to tell the caller of your web API that the operation is currently unavailable? It can get pretty ugly if we don't solve this. If we simply bring down the whole endpoint, connecting clients will experience a pile-up of timed-out connection attempts. If we leave it up, every operation attempted would experience its own slow, excruciating failure, with the same IO timeout pile-up, this time on your server, often bringing the server to its knees with too many doomed connection requests queued up.

 

My game plan shaped up to:

  1. Each service operation shall return a standard response, exposing some status flag
  2. A configuration controls whether services are to be marked as unavailable
  3. A WCF extension will take care of returning the standard response with the proper flag when so configured, but let the regular response return under normal conditions.

The requirement that each operation returns a standard response may seem peculiar. You may have created operations like:

 
string GetUserName(string id);
 
DateTime GetUserBirthdate(string id);

The thing is, when operations fail, you have no way to signal the caller except for smelly nulls or thrown exceptions. Although a SOAP fault exception can do the trick, I find it distasteful to throw a client-fault exception because exceptions are more costly, and validation of request data finds client faults often enough. For that and other reasons, I use code that looks like the following:

[DataContract(Namespace = "...")]
public class ServiceResponse
{
    [DataMember]
    public string Error { get; set; }
 
    [DataMember]
    public ResponseStatus Status { get; set; }
}

Where the status is an enumeration:

[DataContract(Namespace = "...")]
[Flags]
public enum ResponseStatus
{
 
    [EnumMember]
    None = 0,
    /// <summary>
    /// Operation completed without failure
    /// </summary>
    [EnumMember]
    Success = 1,
    /// <summary>
    /// General failure
    /// </summary>
    [EnumMember]
    Failure = 2,
    /// <summary>
    /// Client request not valid or not acceptable
    /// </summary>
    [EnumMember]
    ClientFault = 4,
    /// <summary>
    /// Server failed processing request
    /// </summary>
    [EnumMember]
    ServerFault = 8,
    /// <summary>
    /// The underlying service is not available, down for maintenance or otherwise marked as non-available.
    /// </summary>
    [EnumMember]
    BackendFault = 16,
 
 
    /// <summary>
    /// Convenience value for client fault failure comparison
    /// </summary>
    ClientFailure = Failure + ClientFault,
 
    /// <summary>
    /// Convenience value for server fault failure comparison
    /// </summary>
    ServerFailure = Failure + ServerFault,
 
    /// <summary>
    /// Convenience value for backend failure comparison.
    /// </summary>
    BackendFailure = Failure + BackendFault
}

One may also abstract the ServiceResponse to an interface, allowing any response object to implement the interface rather than inherit the base response. For this post, let's just go with the base class.

Now every operation would return an object derived from ServiceResponse. Rather than a fragmented GetName, GetBirthdate, etc. – a chatty interface anyway – we would expose:

 
public class GetUserResponse: ServiceResponse
{
    [DataMember]
    public string Name { get; set; }

    [DataMember]
    public DateTime Birthdate { get; set; }
    
    // whatever else a user profile has..
}
 
// then the operation signature becomes
[ServiceContract]
public interface IMyService
{
    [OperationContract]
    GetUserResponse GetUser(string id);
    // and other operations
}

Now that we have that out of the way, you get the payoff: we can define a fail-fast attribute to decorate operations we know rely on some back end which may be turned off on us. We'll utilize the IOperationBehavior extension point of WCF, allowing us to specify behavior on an operation-by-operation basis.

I've created an attribute implementing the IOperationBehavior. It replaces the operation invoker with my own implementation when ApplyDispatchBehavior is called. All other IOperationBehavior methods remain blank.

public class FailFastOperationAttribute : Attribute, IOperationBehavior
{
    public void Validate(OperationDescription operationDescription) { }
 
    public void ApplyDispatchBehavior(OperationDescription operationDescription, DispatchOperation dispatchOperation)
    {
        var returnType = operationDescription.SyncMethod.ReturnType;
        dispatchOperation.Invoker = new FailFastOperationInvoker(dispatchOperation.Invoker,returnType);
    }
 
    public void ApplyClientBehavior(OperationDescription operationDescription, ClientOperation clientOperation) { }
 
    public void AddBindingParameters(OperationDescription operationDescription, BindingParameterCollection bindingParameters) { }
}

The finishing piece is to implement the operation invoker. It will check a special configuration, and based on that would either invoke the underlying operation as the stock implementation would have, or construct a new response with the failed flags set.

public class FailFastOperationInvoker : IOperationInvoker
{
    private readonly IOperationInvoker _operationInvoker;
 
    private readonly Type _returnType;
 
    public FailFastOperationInvoker(IOperationInvoker operationInvoker, Type returnType)
    {
        _operationInvoker = operationInvoker;
        _returnType = returnType;
    }
 
    #region IOperationInvoker Members
 
    public object[] AllocateInputs()
    {
        return _operationInvoker.AllocateInputs();
    }
 
    public object Invoke(object instance, object[] inputs, out object[] outputs)
    {
        object result;
        if (Config.ShouldFailFast())
        {
            outputs = new object[0];
            // construct response of the type the specific method expects to return
            result = Activator.CreateInstance(_returnType);
            // mark the response as fail fast failure
            var response = (ServiceResponse)result;
            response.Error = "Not available";
            response.Status = ResponseStatus.Failure | ResponseStatus.BackendFault;
        }
        else
        {
            result = _operationInvoker.Invoke(instance, inputs, out outputs);
        }
        return result;
    }
 
    public IAsyncResult InvokeBegin(object instance, object[] inputs, AsyncCallback callback, object state)
    {
        return _operationInvoker.InvokeBegin(instance, inputs, callback, state);
    }
 
    public object InvokeEnd(object instance, out object[] outputs, IAsyncResult result)
    {
        return _operationInvoker.InvokeEnd(instance, out outputs, result);
    }
 
    public bool IsSynchronous
    {
        get { return _operationInvoker.IsSynchronous; }
    }
 
    #endregion
}

A method for determining whether the API should be up or down hides behind the Config.ShouldFailFast() call. Read an app setting, check a file, do what you like to make that determination.
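
As a minimal sketch (the Config class and the "FailFast" appSetting key are placeholders of my own, not part of WCF, and this needs a reference to System.Configuration), it could be as simple as:

public static class Config
{
    public static bool ShouldFailFast()
    {
        // Flip this appSetting (or swap in a file check / feature flag) to take operations offline.
        var setting = System.Configuration.ConfigurationManager.AppSettings["FailFast"];
        bool failFast;
        return bool.TryParse(setting, out failFast) && failFast;
    }
}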

The next thing is manufacturing an instance of a response object. Here we need to create the same type as, or a type assignable to, the one the formal method returns. Note that the type needs a parameterless constructor for this to work. Since all my service DTOs are plain POCOs, this is rarely a restriction.

 

With this code in place, all we need to do is decorate specific methods as [FailFastOperation] and bingo!

Categories: Code Development, General | Posted by nurih on 1/25/2012 11:50 AM

As documented here on MSDN the PropertyItem object does not have a public constructor.

What to do, then, when you want to add a property to an image using the Image.SetPropertyItem(..) method?

This post suggests you create a bank of all the property items you want, hold it in memory, and clone from it.

A commenter on that blog suggested using reflection: get the non-public parameterless constructor and invoke it. A notable downside of this approach is its reliance on the internal implementation of the object. True. I'll risk it though.

In my implementation, I added a helper method which simply generates the PropertyItem using System.Activator like so:

public static PropertyItem CreatePropertyItem(int id, int length, short exifType, byte[] buffer)
{
    var instance = (PropertyItem)Activator.CreateInstance(typeof(PropertyItem), true);
    instance.Id = id;
    instance.Len = length;
    instance.Type = exifType;
    instance.Value = buffer;
 
    return instance;
}

Pretty clean and simple. Under the covers, Activator uses some reflection to create the instance, but also benefits from caching and tuning written by not-me. I like not-me code because it means I don't have to write it.
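
For completeness, here's a hypothetical usage. The ExifHelper class name, the file paths, and the choice of the ImageDescription tag (id 0x010E, EXIF ASCII type 2) are mine, not from the original post; the namespaces involved are System.Drawing, System.Drawing.Imaging and System.Text:

using (var image = Image.FromFile(@"c:\temp\photo.jpg"))
{
    // EXIF ASCII values are conventionally null-terminated.
    byte[] buffer = Encoding.ASCII.GetBytes("Edited copy\0");
    PropertyItem item = ExifHelper.CreatePropertyItem(0x010E, buffer.Length, 2, buffer);
    image.SetPropertyItem(item);
    image.Save(@"c:\temp\photo-tagged.jpg");
}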

Since one of my upcoming talks at http://socalcodecamp.com is on the subject of reflection, this all falls neatly into place.

Categories: JSON, Code Development, MVC, Web | Posted by nurih on 1/16/2012 5:38 PM

In a previous posting, I discussed replacing the stock MVC serializer used for the JsonResult exposed when you use the controller method Json(..) instead of View.

This was all fine and dandy. But how – you may wonder – can I call actions that take complex parameters? Do I need a special binder? Should I write my own?

The answer is mostly "no". As Phil Hack blogged, the ASP.NET MVC framework already contains value providers that take care of that for you. All you need to do is ensure that your Javascript calls your action using some very specific header and mime types.

Here is a function that may help you

   1: <script type="text/javascript">
   2:        var postToAction = function (sender) {
   3:            var json = $('textarea#' + sender.data).val();
   4:            $("textarea#myResponse").val('working...');
   5:             $.ajax({
   6:                 url: document.location.href,
   7:                 type: 'POST',
   8:                 dataType: 'json',
   9:                 data: json,
  10:                 contentType: 'application/json; charset=utf-8',
  11:                 success: function (data) {
  12:                     var replyText = JSON.stringify(data);
  13:                     $("textarea#myResponse").val(replyText);
  14:                 }
  15:             });
  16:         };
  17:     </script>

The special sauce here is the dataType and contentType together with the POST method. The rest is pretty much how jQuery wants to be called.

On line 6 you will note that I'm POSTing to the same URL as the browser is on – you may want to fiddle with that.

We call this function by hooking up a button, something like:

   1: $("button#myButtonId").click('myTextAreaWithJsonContent', postToAction);

In your own implementation you will no doubt compose some object in JavaScript, or create a JSON object. In the code above, I'm simply using a JSON-formatted string: line 3 gets the value from a textarea element. If you compose a complex object and want to send it, you will need to convert it into a JSON string – JSON.stringify(x), available natively in modern browsers, would do the trick. You can see it used here on line 12, for the inbound response, which I simply stringify and use to hydrate a response textarea. But this is only demo code – you will wire up the success (and failure function, right?) to your own logic.

But back to the core issue: yes, MVC supports submitting JSON and hydrating my controller action parameters, but how do I persuade it to use my custom-chosen, super-duper serializer rather than the stock one?

The answer is: Create a custom ValueProviderFactory and instantiate your favorite serializer in there. A bit of inspection on the MVC source code on CodePlex.com reveals the stock implementation.

Here's a modified version, which isolates the serialization in a clear way:

   1: public class MyJsonValueProviderFactory : ValueProviderFactory
   2: {
   3:     public override IValueProvider GetValueProvider(ControllerContext controllerContext)
   4:     {
   5:         if (controllerContext == null)
   6:         {
   7:             throw new ArgumentNullException("controllerContext");
   8:         }
   9:  
  10:         if (!controllerContext.HttpContext.Request.ContentType.StartsWith(
  11:                 "application/json", StringComparison.OrdinalIgnoreCase))
  12:         {
  13:             return null;
  14:         }
  15:  
  16:  
  17:         object value = Deserialize(controllerContext.RequestContext.HttpContext.Request.InputStream);
  18:  
  19:         if (value == null)
  20:         {
  21:             return null;
  22:         }
  23:  
  24:         var bag = new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase);
  25:  
  26:         PopulateBag(bag, string.Empty, value);
  27:  
  28:         return new DictionaryValueProvider<object>(bag, CultureInfo.CurrentCulture);
  29:     }
  30:  
  31:     private static object Deserialize(Stream stream)
  32:     {
  33:         string str = new StreamReader(stream).ReadToEnd();
  34:  
  35:         if (string.IsNullOrEmpty(str))
  36:         {
  37:             return null;
  38:         }
  39:  
  40:         var serializer = new JavaScriptSerializer(new MySpecialTypeResolver());
  41:  
  42:         return serializer.DeserializeObject(str);
  43:     }
  44:  
  45:     private static void PopulateBag(Dictionary<string, object> bag, string prefix, object source)
  46:     {
  47:         var dictionary = source as IDictionary<string, object>;
  48:         if (dictionary != null)
  49:         {
  50:             foreach (var entry in dictionary)
  51:             {
  52:                 PopulateBag(bag, CreatePropertyPrefix(prefix, entry.Key), entry.Value);
  53:             }
  54:         }
  55:         else
  56:         {
  57:             var list = source as IList;
  58:             if (list != null)
  59:             {
  60:                 for (int i = 0; i < list.Count; i++)
  61:                 {
  62:                     PopulateBag(bag, CreateArrayItemPrefix(prefix, i), list[i]);
  63:                 }
  64:             }
  65:             else
  66:             {
  67:                 bag[prefix] = source;
  68:             }
  69:         }
  70:     }
  71:  
  72:     private static string CreatePropertyPrefix(string prefix, string propertyName)
  73:     {
  74:         if (!string.IsNullOrEmpty(prefix))
  75:         {
  76:             return (prefix + "." + propertyName);
  77:         }
  78:         return propertyName;
  79:     }
  80:  
  81:     private static string CreateArrayItemPrefix(string prefix, int index)
  82:     {
  83:         return (prefix + "[" + index.ToString(CultureInfo.InvariantCulture) + "]");
  84:     }
  85: }

It all really boils down to the same ceremony as the default implementation, except on line 40 and 42 we now get to use our own special serializer. Woot!

To use this instead of the built in one, you will modify your global.asax.cs Application_Start() to include something like

   1: var existing = ValueProviderFactories.Factories.FirstOrDefault(f => f is JsonValueProviderFactory);
   2: if (existing != null)
   3: {
   4:     ValueProviderFactories.Factories.Remove(existing);
   5: }
   6: ValueProviderFactories.Factories.Add(new MyJsonValueProviderFactory());

where the built in one gets removed and my custom one gets added. Pretty straightforward.

With this technique and the one described in my previous post, you are ready to use the full power of MVC as an API, supporting nice strongly typed parameters for the back-end developer and fully customizable JSON in and out of methods. No real need for other frameworks or technologies for serving jQuery needs.

Depending on your methods, you may even get away with one set of actions serving both form posts and jQuery invocation.

Happy coding!

Categories: Code Development, MongoDB, NoSQL | Posted by nurih on 1/11/2012 6:28 PM

Oh, I remember it like it was yesterday: dot-com, no DBA, late nights, a pale developer walking into my office, head hung low, mumbling something about restoring a backup because he ran an update query with no WHERE clause... the good old days.

Zoom forward a decade.

A developer is cranking away on some MongoDB query generation in code. Not wanting to make any rookie mistakes, she uses the Query.EQ builder, which would surely create the correct syntax. Better than her own string.Format() at least!

Or so she thought.

The issue is a small peculiarity with the Query methods. The formal signature takes a string and a BsonValue:

public static QueryComplete EQ(
    string name,
    BsonValue value
)

So some call path would get to a method which creates the query. Let's say:

public string CreateFooQuery(string value)
{
    var query = Query.EQ("Prop1", value);
    return query.ToString();
}

 

Did you spot the issue? I didn't. She didn't either.

The issue is that the Query.EQ method would boil down a value "xyz" to the MongoDB query {"Prop1":"xyz"} just fine. It would also boil down an empty string "" to the query {"Prop1":""} without flinching. But if you supply null – and yes, strings can be null – then the query becomes {}. Yes, that's right, the null query. Yes, the one that matches EVERY document!

Now take that query as a read query – you got all documents returned. Use it for an update – might as well have some good backup strategy!
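
The moral, in code (my own defensive sketch, not something the driver does for you): refuse to build the query at all when the value is missing.

public string CreateFooQuery(string value)
{
    // Guard: a null value would silently produce the match-everything query {}.
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    var query = Query.EQ("Prop1", value);
    return query.ToString();
}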

Note to self: write more unit tests. Ensure that every code path has been exercised, and that the expected values are returned.

Sigh!

Categories: Code Development, MVC, Web, JSON, Serialization | Posted by nurih on 1/11/2012 6:10 PM

For various reasons you may find that the default JsonResult returned by invoking the controller method such as

return Json(data);

is unsuitable for the consumer. The main issue most often encountered is that this method uses the JsonResult, which in turn uses the JavaScriptSerializer with no access to a JavaScriptTypeResolver.

This means that you cannot provide that serializer a parameter specifying your own custom type resolver.

Other issues, such as maximum recursion depth and maximum length, can simply be configured in web.config. See the Configuring JSON Serialization section on MSDN.

Back to the core problem though.

To override the JsonResult, we would need to do 2 things:

1) Create our own custom JsonResult implementation

2) Tell the controller to use ours rather than the stock one.

A new JsonResult is needed because the base one hard codes the construction of the JavaScriptSerializer.

So here we go. Some CTRL+C, CTRL+V later from the open source MVC on Codeplex gives us

   1: public class TypedJsonResult : JsonResult
   2: {
   3:     public override void ExecuteResult(ControllerContext context)
   4:     {
   5:         if (context == null)
   6:         {
   7:             throw new ArgumentNullException("context");
   8:         }
   9:  
  10:         if ((JsonRequestBehavior == JsonRequestBehavior.DenyGet)
  11:             && string.Equals(context.HttpContext.Request.HttpMethod, "GET", StringComparison.OrdinalIgnoreCase))
  12:         {
  13:             throw new InvalidOperationException("JsonRequest GetNotAllowed");
  14:         }
  15:  
  16:         var response = context.HttpContext.Response;
  17:  
  18:         response.ContentType = !string.IsNullOrEmpty(ContentType) ? ContentType : "application/json";
  19:  
  20:         if (ContentEncoding != null)
  21:         {
  22:             response.ContentEncoding = ContentEncoding;
  23:         }
  24:  
  25:         if (Data != null)
  26:         {
  27:             var serializer = new JavaScriptSerializer(new BiasedTypeResolver());
  28:             response.Write(serializer.Serialize(Data));
  29:         }
  30:     }
  31: }

You will note on line 27 that we're still using the JavaScriptSerializer, but this time we're controlling its construction and decided to give it our own type resolver. More on that type resolver in a bit.

Next, we want to give our controller(s) an easy way to choose our TypedJsonResult rather than the stock one. Luckily, the controller boils the call Json(data) – and several other overloads – down to a call to one virtual method, which we may simply override, like so:

   1: protected override  JsonResult Json(object data, string contentType, Encoding contentEncoding, JsonRequestBehavior behavior)
   2: {
   3:     return new TypedJsonResult { Data = data, ContentType = contentType, ContentEncoding = contentEncoding, JsonRequestBehavior = behavior };
   4: }

That's it! On line 3 you will notice we return our custom TypedJsonResult, whereas the stock implementation would have returned the JsonResult.

If this new behavior is desired everywhere, then you would probably want to place this override in a base controller of your own, and have all your controllers inherit it.

Another power this bestows on you is, of course, using some other serializer altogether, such as Json.NET or whatever you fancy.

Now back to my own type resolver. You could, after all, use the SimpleTypeResolver built into the framework, which works quite well. However, it introduces fairly long type names – frowned upon by my clients consuming this JSON on other platforms – and it doesn't enable me to map my own type names to a type of my choice. Enter the BiasedTypeResolver.

   1: private static readonly Dictionary<string, Type> _KnownTypes;
   2:  
   3: static BiasedTypeResolver()
   4: {
   5:     _KnownTypes = new Dictionary<string, Type> ();
   6:     var appDomain = AppDomain.CurrentDomain;
   7:     foreach (var assembly in appDomain.GetAssemblies().Where(a => a.GetName().Name.EndsWith(".ValueObjects")))
   8:     {
   9:         foreach (var type in assembly.GetTypes().Where(t => !t.IsInterface && !t.IsAbstract))
  10:         {
  11:             _KnownTypes[type.Name] = type;
  12:         }
  13:     }
  14: }
  15:  
  16: public override Type ResolveType(string id)
  17: {
  18:     var result = Type.GetType(id);
  19:     if (result != null || _KnownTypes.TryGetValue(id, out result))
  20:     {
  21:         return result;
  22:     }
  23:     throw new ArgumentException("Unable to resolve [" + id + "]", "id");
  24: }
  25:  
  26: public override string ResolveTypeId(Type type)
  27: {
  28:     if (type == null)
  29:     {
  30:         throw new ArgumentNullException("type");
  31:     }
  32:  
  33:     return type.Name;
  34: }

This resolver spelunks specific assemblies only (those named [whatever].ValueObjects which are my naming convention for POCO public objects) and catalogs them into a dictionary by short type name.

It doesn't know how to resolve 2 types of the same name if they only differ by namespace, but then again I'd ask "how come you would define 2 classes of the same exact name in your solution?" You can define type names to whatever suits your needs.

The resolver's responsibility is twofold: given a string, return the System.Type that corresponds to it; given a type, return a name for it. The former is used during deserialization, the latter when serializing.

Now you may not need or like this particular type resolver implementation. But now that you can inject your own, the possibilities are limitless. These can range from configuration-based type resolution to versioning decisions based on revision upgrades, etc.

Note also that upon deserialization, the type resolver is only called when a type discriminator exists in the JSON stream. That is, a complex type that doesn't contain "__type":"foo" will be deserialized by the JavaScriptSerializer by matching target member names rather than consulting the resolver. This is nice because the JSON can contain strategically placed type discriminators for polymorphic reasons on some members, but be left terse and bare otherwise.

Hopefully this helps you, or me-of-the-future, when the gray cells get filled with the next great thing.

Happy Coding!