Friday, November 1, 2013

Azure and the cloud in the real world


So, I haven’t blogged in a really long time. Quite frankly, when you’re working with LOB apps for years it gets kind of stale. You can only figure out so many techniques to do the same set of things and most of those I get from other people’s blogs.
I’m working on a new project right now that required some new approaches, some new architecture, some new technologies. I’m developing a site that’s going to need to scale rapidly and potentially massively, so I took my first serious look at Microsoft Azure.
I probably only know enough to be dangerous at this point, but so far I’ve had a pretty pleasant experience with Azure. I think the tutorials provided by Microsoft and the articles I see on the internets do a really good job of explaining the “HOWTO” type of content. What seemed vague to me at first, and I suspect to many others, is the “why/where” type of question: why should I use Azure Table Storage, when does it make sense, when should I prefer Azure SQL Database, and so on.

My answer was: both, and at the same time. The site I’m developing will have a search engine side of things as well as a social media aspect to it. If you think developing a social media site of any kind lends itself well to a SQL database approach, then by all means stick with that approach, but you’re likely a stronger SQL person than I am. My head starts to swim a little when I consider the schema changes required by the fluidity of a social media paradigm over time. Not to mention the sheer weight of wrapping my head around a simple “like” button that could apply to what would be rows in 1000 different tables. Think about all of the different “kinds” of things one can “like” or “+1” and you’ll probably come to a similar conclusion: SQL is doable, but man would it take a long time to get right, and it would bog down in the myriad challenges you’re bound to face implementing new features.
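
Just to make that concrete, here’s a rough sketch of what a generic “like” could look like in Table Storage; the entity and key scheme here are hypothetical, not the actual design from my project:

    using System;
    using Microsoft.WindowsAzure.Storage.Table;

    // One entity type covers likes on posts, photos, comments, whatever comes next.
    public class LikeEntity : TableEntity
    {
        public LikeEntity() { }

        public LikeEntity(string targetKind, string targetId, string userId)
        {
            PartitionKey = targetKind + "_" + targetId;  // e.g. "photo_12345"
            RowKey = userId;                             // one like per user per target
            LikedOnUtc = DateTime.UtcNow;
        }

        public DateTime LikedOnUtc { get; set; }
    }

    public static class LikeStore
    {
        public static void Save(CloudTable table, LikeEntity like)
        {
            // InsertOrReplace keeps the operation idempotent per user/target.
            table.Execute(TableOperation.InsertOrReplace(like));
        }
    }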

I do think Azure SQL Database makes great sense for the things I want to search on, versus Azure Table Storage. In prototyping the Table Storage side of things, it’s just not designed for performant ad hoc searching. What it does do really well, however, is abstract away the concerns of an ever-shifting schema. Oh, there are still concerns, especially when it comes to changing your object models and TableEntities, but nothing compared to updating a SQL database.
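
To put that in code terms: Table Storage queries are really only fast when you filter on PartitionKey (and ideally RowKey); filter on anything else and you’re scanning. A minimal sketch, reusing the hypothetical LikeEntity from above:

    using System;
    using System.Collections.Generic;
    using Microsoft.WindowsAzure.Storage.Table;

    public static class LikeQueries
    {
        // Fast: the partition is addressed directly.
        public static IEnumerable<LikeEntity> ForTarget(CloudTable table, string targetKey)
        {
            var query = new TableQuery<LikeEntity>().Where(
                TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, targetKey));
            return table.ExecuteQuery(query);
        }

        // Works, but slow: filtering on an ordinary property scans across partitions.
        public static IEnumerable<LikeEntity> Since(CloudTable table, DateTimeOffset cutoff)
        {
            var query = new TableQuery<LikeEntity>().Where(
                TableQuery.GenerateFilterConditionForDate("LikedOnUtc", QueryComparisons.GreaterThan, cutoff));
            return table.ExecuteQuery(query);
        }
    }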

Working with Azure in general made me rethink my architecture approach completely. Not for the better, I can assure you, but toward an architecture that suits my needs for rapid development of a website, with the promise of some help down the road to make it a more distributed, proper architecture. One of the things Azure liberates me from in the short term is the concern of how my application will scale due to my architecture choices. At least as a start-up, scaling the application simply by spinning up another instance of the entire website is a simple and valid approach.

So, I created a simple approach: my entire workflow would be based on an abused UnitOfWork pattern. Those familiar with it typically know it as a database-centric design pattern, but the actual pattern is generic of course. I expanded my UnitOfWork to contain everything I need to know from the moment a user interacts with the website, throughout the full round-trip of the interaction. This website is in C#/ASP.NET MVC; a UOW starts in the controller action and that object is maintained through the lifecycle of the page request, right up into the view if needed (although usually I do create a proper POCO view model with simple primitive properties).

So what’s this thing look like?


    public interface IUnitOfWork : IDisposable, ICommunicationUnit
    {
        object LookupId { get; set; }

        ILoggingHandler LoggingHandler { get; set; }

        // The MVC controller handling the current request (the piece that will
        // have to go once the service tier moves into worker roles).
        Controller Controller { get; set; }

        WebUserViewModel CurrentWebUser { get; set; }

        // The single data context shared for the entire request.
        IFabnuDataContext DataContext { get; set; }

        bool IsStartValid { get; set; }

        // Messages/warnings collected across the tiers and surfaced back to MVC.
        void SaveMessagesToTempData();
        void LoadMessagesFromTempData();
        void CopyWarningsToModelState();

        ActionResult OverrideResult { get; set; }
    }

    public interface IUnitOfWork<TViewModel> : IUnitOfWork
    {
        // The composite view model carried through the whole round-trip.
        TViewModel ViewModel { get; set; }

        bool Validate(AbstractValidator<TViewModel> validator);
    }
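
For a sense of how this gets used, here’s a rough, hypothetical controller action; UnitOfWorkFactory, ProfileService, and ProfileViewModel are illustrative stand-ins, not the actual code:

    using System.Web.Mvc;

    public class ProfileController : Controller
    {
        public ActionResult Edit(string id)
        {
            // The UOW is created at the top of the action and carried through
            // the service tier and back out to the view.
            using (IUnitOfWork<ProfileViewModel> uow = UnitOfWorkFactory.Start<ProfileViewModel>(this, id))
            {
                if (!uow.IsStartValid)
                    return uow.OverrideResult;          // e.g. bounce to a login/error result

                new ProfileService().LoadProfile(uow);  // services only ever see the UOW
                uow.CopyWarningsToModelState();

                return View(uow.ViewModel);
            }
        }
    }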

So, I know what you’re thinking: that’s crazy talk. And you’re right, partially. Consider this, however. Yes, I’m carrying the controller in its entirety end to end and, for now, the entire web framework is baked in all the way down to the repository and vice-versa. There will be no easy separation of code, there will be overloaded objects with far too many responsibilities, and so on. I’m already starting to see some of that creep in, but I’m willing to let it stand for today.

What I am doing, however, is separating my code into logical layers just as I would if this were a true n-tier approach. Once this scales properly into the Azure world I won’t have access to a controller anywhere in the codebase, but I will have access to all of the Azure-based tiers. What that means is that all of my code designed to access Azure Table Storage, Azure SQL Database, Azure Blobs, etc. will still work perfectly well once I move chunks of my logical “service tier” into the cloud as worker roles. Yes, I will have to remove any code dealing with controllers directly, but that’s been minimal so far and is an acceptable loss for the time being.

The reason this approach would never have made sense for anything but a small website, of course, is that I couldn’t reliably scale this architecture before. Azure lets me simply duplicate my website as if it were an overblown web role while I take the time to really harness the code to Azure, splitting each service class into its own worker role and the controllers into their own web roles that can be scaled more effectively in terms of both cost and code concerns.

Right now I piggy-back an ICommunicationUnit interface between my UOW and my view models. Because there is only a single unit of work, a single DB connection, and a single controller/HTTP request instance throughout the entire request, I’m able to take a lot of shortcuts that would otherwise be impossible. My data context has access to the HTTP cache and can update the cache when you save something to the database, reducing the overhead and complexity of keeping the cache from going stale. I can also save a portion of my ViewModel properties off to Table Storage while saving other properties off to the SQL database. This may sound incredibly error prone, and I think it absolutely would be, except that I’ve implemented a simple XML file -> T4 template code-generation DSL that lets me modify a few XML files to dictate which properties of the composite view model are saved to SQL, which are saved to Azure storage, and which are just in-process view model properties used to communicate through the entire application.
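
I won’t reproduce the T4 template here, but the shape of what it spits out is roughly this; the property names and the XML fragment are made up purely to show the idea:

    //   <property name="DisplayName" persistTo="TableStorage" />
    //   <property name="Email"       persistTo="Sql" />
    //   <property name="IsDirty"     persistTo="InProcess" />

    using Microsoft.WindowsAzure.Storage.Table;

    // Hand-written stand-in for what the T4 template generates from that XML.
    public partial class UserProfileViewModel
    {
        public string DisplayName { get; set; }  // routed to Table Storage
        public string Email { get; set; }        // routed to Azure SQL
        public bool IsDirty { get; set; }        // in-process only, never persisted
    }

    public partial class UserProfileEntity : TableEntity
    {
        // Only properties flagged for Table Storage end up on the entity.
        public string DisplayName { get; set; }
    }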

I collect all of my errors and warnings in a simple domain-specific list of message POCOs containing some basic info like the level of the message: msg|warning|error. Warnings in my project are things that end up in the MVC ModelState and get propagated back to the user, such as “Password field requires 6 letters”, while errors contain exception messages and stack traces. At the conclusion of the unit of work, when I’m about to return a view back to the consumer, I simply iterate the message list built up through all of the tiers and shove any errors into Table Storage with a PartitionKey of today’s date and a RowKey of a generated GUID. Inside the error table entity I store the UserAgentString, the Url and UrlReferrer the user was on when the exception occurred, and any other useful info available to me with the entire web stack at my fingertips. Errors/exceptions are a great thing to put in Table Storage; what do I care if the schema changes over time?
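
A minimal sketch of what that error entity and the write could look like with the current storage SDK; the entity shape and table name are my best guess at the pattern, not the literal code:

    using System;
    using Microsoft.WindowsAzure.Storage.Table;

    public class ErrorEntity : TableEntity
    {
        public ErrorEntity() { }

        public ErrorEntity(Exception ex, string url, string urlReferrer, string userAgent)
        {
            PartitionKey = DateTime.UtcNow.ToString("yyyy-MM-dd");  // today's date
            RowKey = Guid.NewGuid().ToString();                     // generated GUID
            Message = ex.Message;
            StackTrace = ex.StackTrace;
            Url = url;
            UrlReferrer = urlReferrer;
            UserAgentString = userAgent;
        }

        public string Message { get; set; }
        public string StackTrace { get; set; }
        public string Url { get; set; }
        public string UrlReferrer { get; set; }
        public string UserAgentString { get; set; }
    }

    public static class ErrorLog
    {
        // Called at the end of the unit of work for any error-level messages.
        public static void Write(CloudTable errorTable, ErrorEntity error)
        {
            errorTable.Execute(TableOperation.Insert(error));
        }
    }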

Other things that lend themselves well to Table Storage are user profiles: info whose schema changes over time. I add a new property to my XML file, flag it as “TableStorage”, my ViewModel and TableEntity get updated with the new property, and I let AutoMapper handle the rest. My models change; my code to load/save to entity storage does not.
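
The AutoMapper piece is deliberately boring. Something along these lines, using the static Mapper API that’s current as of this writing; the type names are the same hypothetical ones from the sketch above:

    using AutoMapper;

    public static class ProfileMapping
    {
        // One-time setup, e.g. in Application_Start.
        public static void Configure()
        {
            Mapper.CreateMap<UserProfileViewModel, UserProfileEntity>();
            Mapper.CreateMap<UserProfileEntity, UserProfileViewModel>();
        }

        // The generic load/save code maps blindly; it never needs to know which
        // properties exist, so adding one to the XML costs nothing here.
        public static UserProfileEntity ToEntity(UserProfileViewModel viewModel, string userId)
        {
            var entity = Mapper.Map<UserProfileEntity>(viewModel);
            entity.PartitionKey = "profile";   // hypothetical key scheme
            entity.RowKey = userId;
            return entity;
        }
    }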

I’m very keen to see how this all plays out, as this application truly needs to scale. I realize this article is a bit nebulous, but I thought I’d throw out my n00b approach to Azure and what’s available on the stack right now. Working with the code about a solid month into development now, to me it’s panning out nicely. I have yet to start gophering off into distinct web roles, but I’m not scared. The lion’s share of the code is talking to the Azure tiers themselves; the web stack code isn’t going to be difficult to segregate. The Azure SQL Database and Table Storage will always be available to all of my tiers moving forward, and my UOW will certainly change over time, but all in a centralized, manageable core. I can pick and choose moving forward which codebases need to be optimized and scaled. The first likely candidate would be the authentication and authorization code blocks, as they’re prone to get hit the hardest and most often.

Once I get to the point where I have that tucked away in a service tier truly communicating via REST with the rest of the architecture, I’ll be happy to report whether I was right, or… wrong… Another thing I’m yearning to see play out is moving the entire ICommunicationUnit paradigm onto Azure Queues. No more worrying about how to communicate from controller to service to data and back out. I’m not saying there aren't better approaches, certainly, but the ability to connect to a queue system and pick a message back up later based on logged-in user, or area of code, or basically any criteria you want: powerful stuff.
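
I haven’t built that piece yet, but the plumbing would presumably look something like this; the queue name and the idea of pushing a serialized communication unit through it are my speculation:

    using Microsoft.WindowsAzure.Storage.Queue;

    public static class CommunicationQueue
    {
        // Hand a serialized communication unit off to a queue from one tier...
        public static void Send(CloudQueueClient client, string serializedUnit)
        {
            CloudQueue queue = client.GetQueueReference("communication-units");
            queue.CreateIfNotExists();
            queue.AddMessage(new CloudQueueMessage(serializedUnit));
        }

        // ...and pick it back up later from a worker role.
        public static string Receive(CloudQueueClient client)
        {
            CloudQueue queue = client.GetQueueReference("communication-units");
            CloudQueueMessage message = queue.GetMessage();
            if (message == null)
                return null;

            queue.DeleteMessage(message);  // remove it once handled
            return message.AsString;
        }
    }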