Visual Studio SP1 and The Metification of REST

Metification – verb

  1. The act of adding metadata to a web service in order to facilitate tooling and discovery.
  2. The act of adding complexity to a web service in order to achieve tight coupling.

Pick one.

Service Pack 1 for Visual Studio 2008 has just arrived with new features, including version 1.0 of ADO.NET Data Services (a.k.a. Astoria). From the description (highlighting is mine):

ADO.NET Data Services … consists of a combination of patterns and libraries that enables any data store to be exposed as a flexible data service, naturally integrating with the Web, that can be consumed by Web clients within a corporate network or across the Internet. ADO.NET Data Services uses URIs to point to pieces of data and simple, well-known formats to represent that data, such as JSON and ATOM/APP. This results in data being exposed to Web clients as a REST-style resource collection, addressable with URIs that agents can interact with using standard HTTP verbs such as GET, POST, or DELETE.

Compared to the traditional SOAP approach, REST is a different model for exposing functionality over a web service. Instead of defining messages and exposing operations that act on those messages, you expose resources and act on them using common HTTP verbs. I’ve lately been thinking of SOAP-based web services as “verb oriented” (exposing GetOrder and UpdateCustomer), while REST-style web services are “noun oriented” (exposing Orders and Customers). Both models have advantages and disadvantages, but I’ve felt that REST partners well with rich Internet applications that need to retrieve a variety of resources using the same filtering and paging parameters. Creating a heap of GetThisByThat operations is tedious.

Nouns and verbs aren’t the only difference between REST and SOAP. One of the primary strengths of REST is its inherent simplicity. The simplicity not only facilitates broad interoperability, but also encourages acceptance of REST among many who feel overwhelmed by the complexities of WS-*. REST requires no special tools: all you need is the ability to send an HTTP request and read the response. WS-*, on the other hand, is great when you need a digitally signed message including double-secret user credentials routed through an asynchronous and distributed, two-phase commit transaction with an extended buyer protection. Not everyone needs that flexibility, but you still pay the price for it when using the tooling and the API, and when configuring the service.

Although we could continue talking about differences between REST and SOAP, I want to talk about metadata and Astoria.

Metification

REST proponents, as a rule of thumb, shun metadata – but not all forms of metadata. Metadata in prose or written documentation is fine. Metadata in a self-describing response format is fine. However, many see metadata for tooling as pure evil. Part of the complexity in WS-* lies in the quirky and convoluted folds of metadata formats like WSDL and XML Schema. REST has seen some attempts at standardized metadata (WADL, WSDL 2.0, XSD), but for the most part it still resists them.

I like metadata. Maybe I’ve been in the .NET ecosystem so long that I expect tooling, but I still remember the first time I tried to write a program for the Flickr web service (which is technically just POX). I was shocked when I couldn’t find a WSDL file. Then I was surprised at how easy it was to craft the correct URL for an HTTP request, and shred apart the XML response to find photographs. It was so easy that ... well, it was just too easy. It reminded me of writing data access code from scratch. Data access code is so predictable and repetitive that we have tools, frameworks, and code generators to take care of the job. But those tools, frameworks, and code generators rely on metadata defined by a database schema, so their job is relatively straightforward. REST is a bit different, unless you are working with Astoria on the server and a CLR client.

Let’s say you have some DTOs for employees, orders, and other objects you want to send over the wire. You’ll need to decorate them with enough information for the service to understand the primary key.

[DataServiceKey("ID")]
public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
}

[DataServiceKey("ID")]
public class Order
{
    // …
}

Next, define a class with public IQueryable<T> properties for each “entity set” (Employees and Orders). IQueryable<T> is easy to conjure up, and the class below represents a read-only data source with some fake in-memory data. If you need create, update, and delete functionality, the class will need to implement IUpdateable, too. Sean Wildermuth has a three-part blog post series about IUpdateable that he wrote when implementing IUpdateable for the NHibernate LINQ project.

public class AcmeData
{
    public IQueryable<Employee> Employees
    {
        get
        {
            return new List<Employee>
            {
                new Employee() /* ... */,
                new Employee() /* ... */,
                new Employee() /* ... */
                // ...
            }.AsQueryable();
        }
    }

    public IQueryable<Order> Orders
    {
        // ...
    }
    // ...
}

Then you need an .svc file…

<%@ ServiceHost Language="C#"
    Factory="System.Data.Services.DataServiceHostFactory, System.Data.Services, Version=3.5.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
    Service="AcmeDataService" %>

… and you’ll also need a code-behind file for the .svc (the ADO.NET Data Service template sets all of this up for you; you just add some configuration):

public class AcmeDataService : DataService<AcmeData>
{
    public static void InitializeService(IDataServiceConfiguration config)
    {
        config.SetEntitySetAccessRule("Employees", EntitySetRights.AllRead);
        // ... more rules
    }
}

At this point you can start testing the service using a web browser and looking at, for example, http://localhost/AcmeDataService.svc/Employees. What is more interesting is looking at http://localhost/AcmeDataService.svc/$metadata, because there you’ll find service metadata, which is where the magic starts.

To consume the service, right-click on a project in Visual Studio and select “Add Service Reference…”. Yes – the same “Add Service Reference” command you might have seen in the hit motion picture “SOAP and WSDL – an XML Love Story”. This feature blurs the lines between REST and WS-*. Enter the root URL of the service and Visual Studio will generate a proxy – but not the type of proxy you receive when using SOAP-based web services. This proxy derives from the DataServiceContext class, and you can use it like so:

var employees = new AcmeData(serviceRoot)
.Employees
.Where(e => e.Name == "Scott")
.OrderBy(e => e.Name)
.Skip(2)
.Take(2)
.ToList();

DataServiceContext does a little bit of magic to turn the LINQ query into the following HTTP request. It’s LINQ to REST:

GET /AcmeDataService.svc/Employees()?$filter=Name%20eq%20'Scott'&$orderby=Name&$skip=2&$top=2 HTTP/1.1
User-Agent: Microsoft ADO.NET Data Services
Accept: application/atom+xml,application/xml

The data service will respond with some XML that the data context uses to create objects that look just like the server side DTOs.
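DataServiceContext builds that request for you, but it can help to see how little is involved in the query options themselves. The sketch below is a hypothetical helper, AstoriaQuery.Build, which is not part of ADO.NET Data Services; it simply appends $filter, $orderby, $skip, and $top to an entity set path. To keep the output identical to the request above it only escapes spaces, whereas a real client would percent-encode every reserved character.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds an ADO.NET Data Services style query string by hand.
// This is NOT part of System.Data.Services.Client; it only shows what the
// generated request line contains.
public static class AstoriaQuery
{
    public static string Build(string entitySetPath, string filter,
                               string orderBy, int skip, int top)
    {
        var options = new List<string>();

        // Simplification: only spaces are escaped, so the quotes in the
        // filter survive exactly as in the request shown above.
        if (!string.IsNullOrEmpty(filter))
            options.Add("$filter=" + filter.Replace(" ", "%20"));
        if (!string.IsNullOrEmpty(orderBy))
            options.Add("$orderby=" + orderBy.Replace(" ", "%20"));
        if (skip > 0)
            options.Add("$skip=" + skip);
        if (top > 0)
            options.Add("$top=" + top);

        return options.Count == 0
            ? entitySetPath
            : entitySetPath + "?" + string.Join("&", options);
    }
}
```

For example, AstoriaQuery.Build("/AcmeDataService.svc/Employees()", "Name eq 'Scott'", "Name", 2, 2) produces the same path and query string as the request line above.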

I’m sure some are horrified at this metification of REST, but for scenarios when you need to talk between two CLR appdomains (think ASP.NET and Silverlight), this approach gives you the advantages of thinking about nouns in a RESTful model without writing all the glue code to wire up endpoints and parse XML. Beauty!

posted by scott with 5 Comments

Optimizing LINQ Queries

I’ve been asked a few times about how to optimize LINQ code. The first step in optimizing LINQ code is to take some measurements and make sure you really have a problem. 


It turns out that optimizing LINQ code isn’t that different from optimizing regular C# code. You need to form a hypothesis, make changes, and measure, measure, measure every step of the way. Measurement is important, because sometimes the changes you need to make are not intuitive.

Here is a specific example using LINQ to Objects.

Let’s say we have 100,000 of these in memory:

public class CensusRecord
{
    public string District { get; set; }
    public long Males { get; set; }
    public long Females { get; set; }
}

We need a query that will give us back a list of districts ordered by their male / female population ratio, and include the ratio in the query result. A first attempt might look like this:

var query =
    from r in _censusRecords
    orderby (double)r.Males / (double)r.Females descending
    select new
    {
        District = r.District,
        Ratio = (double)r.Males / (double)r.Females
    };

query = query.ToList();

It’s tempting to look at the query and think - “If we only calculate the ratio once, we can make the query faster and more readable! A win-win!”. We do this by introducing a new range variable with the let clause:

var query =
    from r in _censusRecords
    let ratio = (double)r.Males / (double)r.Females
    orderby ratio descending
    select new
    {
        District = r.District,
        Ratio = ratio
    };

query = query.ToList();

If you measure the execution time of each query on 100,000 objects, however, you’ll find the second query is about 14% slower than the first query, despite the fact that we are only calculating the ratio once. Surprising! See why we need to take measurements?
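Here is one way to take those measurements: a minimal Stopwatch harness, assuming 100,000 randomly generated records (the seed, the value ranges, and the CensusBenchmark name are assumptions for illustration). It times each query once and returns the districts in result order, so you can confirm both queries agree before comparing timings. A single run includes JIT and cache warm-up, so treat the printed numbers as rough and repeat the runs before drawing conclusions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class CensusRecord
{
    public string District { get; set; }
    public long Males { get; set; }
    public long Females { get; set; }
}

// Hypothetical harness: times the original query and the "let" query on the same data.
public static class CensusBenchmark
{
    public static List<CensusRecord> CreateRecords(int count, int seed)
    {
        var random = new Random(seed); // fixed seed => identical data on every run
        var records = new List<CensusRecord>(count);
        for (int i = 0; i < count; i++)
        {
            records.Add(new CensusRecord
            {
                District = "District " + i,
                Males = random.Next(1000, 100000),
                Females = random.Next(1000, 100000) // never zero, so the ratio is defined
            });
        }
        return records;
    }

    // Runs one query, prints its elapsed time, and returns the districts in result order.
    private static string[] Measure<T>(string label, Func<List<T>> runQuery,
                                       Func<T, string> district)
    {
        var watch = Stopwatch.StartNew();
        List<T> result = runQuery();
        watch.Stop();
        Console.WriteLine($"{label}: {watch.Elapsed.TotalMilliseconds:F1} ms");
        return result.Select(district).ToArray();
    }

    public static string[][] CompareQueries(List<CensusRecord> records)
    {
        string[] original = Measure("orderby + select", () =>
            (from r in records
             orderby (double)r.Males / (double)r.Females descending
             select new
             {
                 District = r.District,
                 Ratio = (double)r.Males / (double)r.Females
             }).ToList(),
            x => x.District);

        string[] withLet = Measure("let", () =>
            (from r in records
             let ratio = (double)r.Males / (double)r.Females
             orderby ratio descending
             select new
             {
                 District = r.District,
                 Ratio = ratio
             }).ToList(),
            x => x.District);

        return new[] { original, withLet };
    }
}
```

Only once the two returned sequences match does comparing the printed timings tell you anything useful.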

Look At Time and Space

The key to this specific issue is understanding how the C# compiler introduces the range variable ratio into the query processing. We know that C# translates declarative queries into a series of method calls. Imagine the method calls forming a pipeline for pumping objects. The first query we wrote would translate into the following:

var query =
    _censusRecords.OrderByDescending(r => (double)r.Males /
                                          (double)r.Females)
                  .Select(r => new { District = r.District,
                                     Ratio = (double)r.Males /
                                             (double)r.Females });

The second query, the one with the let clause, is asking LINQ to pass an additional piece of state through the object pipeline. In other words, we need to pump both a CensusRecord object and a double value (the ratio) into the OrderByDescending and Select methods. There is no magic involved: the only way to get both pieces of data through the pipeline is to instantiate a new object that will carry both pieces of data. When C# is done translating the second query, the result looks roughly like this (the compiler picks its own member names for the intermediate object):

var query =
_censusRecords.Select(r => new { Record = r,
Ratio = (double)r.Males /
(double)r.Females })
.OrderByDescending(r => r.Ratio)
.Select(r => new { District = r.Record.District,
Ratio = r.Ratio });


The above query requires two projections, which means 200,000 object instantiations. CLR Profiler says the let version of the query uses 60% more memory.
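CLR Profiler is one way to see the extra allocations. On newer runtimes you can also get a rough count in code; the sketch below assumes .NET Core 3.0 or later, where GC.GetAllocatedBytesForCurrentThread is available, and the AllocationProbe name and data generator are made up for illustration. It runs each query once to warm up, then measures a second run. Expect the let version to report more bytes because of the extra intermediate object per record, but don't expect the exact 60% figure: the ratio depends on the runtime and its LINQ implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CensusRecord
{
    public string District { get; set; }
    public long Males { get; set; }
    public long Females { get; set; }
}

// Hypothetical probe: counts managed bytes allocated on this thread while a query runs.
// Assumes GC.GetAllocatedBytesForCurrentThread (.NET Core 3.0 and later).
public static class AllocationProbe
{
    public static List<CensusRecord> CreateRecords(int count)
    {
        var random = new Random(7); // fixed seed for repeatable data
        var records = new List<CensusRecord>(count);
        for (int i = 0; i < count; i++)
        {
            records.Add(new CensusRecord
            {
                District = "D" + i,
                Males = random.Next(1000, 100000),
                Females = random.Next(1000, 100000)
            });
        }
        return records;
    }

    public static long BytesAllocated(Action runQuery)
    {
        runQuery(); // warm-up run, so one-time JIT and type-loading costs are not counted
        long before = GC.GetAllocatedBytesForCurrentThread();
        runQuery();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Returns { bytes for the original query, bytes for the let query }.
    public static long[] CompareQueries(List<CensusRecord> records)
    {
        long original = BytesAllocated(() =>
            (from r in records
             orderby (double)r.Males / (double)r.Females descending
             select new { r.District, Ratio = (double)r.Males / (double)r.Females }).ToList());

        long withLet = BytesAllocated(() =>
            (from r in records
             let ratio = (double)r.Males / (double)r.Females
             orderby ratio descending
             select new { r.District, Ratio = ratio }).ToList());

        return new[] { original, withLet };
    }
}
```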

Now we have a better idea why performance decreased, and we can try a different optimization. We’ll write the query using method calls instead of a declarative syntax, and do a projection into the type we need first, and then order the objects.

var query =
_censusRecords.Select(r => new { District = r.District,
Ratio = (double)r.Males /
(double)r.Females })
.OrderByDescending(r => r.Ratio);

This query will perform about 6% faster than the first query in the post, but consistently (and mysteriously) uses 5% more memory. Ah, tradeoffs.

Moral Of The Story?

The moral of the story is not to rewrite all your LINQ queries to save 5 milliseconds here and there. The first priority is always to build working, maintainable software. The moral of the story is that LINQ, like any technology, requires analysis and measurements to make optimization gains because the path to better performance isn’t always obvious. Also remember that a query “optimized” for LINQ to Objects might make things worse when the same query uses a different provider, like LINQ to SQL.

posted by scott with 7 Comments

Using an ORM? Think Objects!

I recently had some time on airplanes to read through Bitter EJB, POJOs in Action, and Better, Faster, Lighter Java. All three books were good, but the last one was my favorite, and was recommended to me by Ian Cooper. No, I’m not planning on trading in assemblies for jar files just yet. I read the books to get some insight and perspectives into specific trends in the Java ecosystem.

It’s impossible to summarize the books in one paragraph, but I’ll try anyway:

Some Java developers shun the EJB framework so they can focus on objects. Simple objects. Testable objects. Malleable objects. Plain old Java objects that solve business problems without being encumbered by infrastructure and technology concerns.

That’s the gist of the three books in 35 words. The books also talk about patterns, anti-patterns, domain driven design, lightweight frameworks, processes, and generally how to write software. You’d be surprised how much content is applicable to .NET. In fact, when reading through the books I began to think of .NET and Java as two parallel universes whose deviations could be explained by the accidental killing of one butterfly during a time traveling safari.

The focus of this post is one particular deviation that really stood out.

From Objects To ORMs

The Java developers who focus on objects eventually have to deal with other concerns like persistence. Their object focus naturally leads some of them to try object-relational mapping frameworks. ORMs like Hibernate not only provide these developers with productivity gains, but do so in a relatively transparent and non-intrusive manner. The two work well together right from the start as the developers understand the ORMs, and the ORMs seem to understand the developers.

From DataSets to ORMs

.NET includes DataSets, DataTables, and DataViews. There is an IDE with a Data menu, and a GUI toolbox with a Data tab full of Data controls and DataSources. It’s easy to stereotype mainstream .NET development as data-centric. When you introduce an ORM to a .NET developer who has never seen one, the typical questions are along the lines of:

How do I manage my identity values after an INSERT?

... and ...

Does this thing work with stored procedures?

Perfectly reasonable questions given the data-centric atmosphere of .NET, but you can almost feel the tension in these questions. And that is the deviation that stood out to me. On the airplane, I read about Java developers who focused on objects and went in search of ORMs. In .NET land, I’m seeing the ORMs going in search of the developer who is focused on data. The ORMs in particular are LINQ to SQL (currently shipping in Visual Studio) and the Entity Framework (shipping in SP1). Anyone expecting something like “ADO.NET 3.5” is in for a surprise. Persistent entities and DataSets are two different creatures, and require two different mind sets.

Will .NET Developers Focus On Objects Now?

It’s possible, but the tools make it difficult. The Entity Framework, for instance, presents developers with cognitive dissonance at several points. The documentation will tell you the goal of EF is to create a rich, conceptual object model, but the press releases proclaim that the Entity Framework simplifies data-centric development.  There will not be any plain old CLR objects (POCOs) in EF, and the object-focused implicit lazy-loading that comes standard in most ORMs isn’t available (you can read any property on this entity, um, except that one – you’ll have to load it first).

LINQ to SQL is different. LINQ to SQL is objects all the way down. You can use plain old CLR objects with LINQ to SQL if you dig beyond the surface. However, the surface is a shiny designer that looks just like the typed DataSet designer. LINQ to SQL also needs some additional mapping flexibility to truly separate the object model from the underlying database schema – hopefully we’ll see this in the next version.

What To Do?

If you are a .NET developer who is starting to use an ORM (any ORM), you owe it to yourself and your project to reset your defaults and think differently about the new paradigm. Forget what you know about DataSets and learn about the unit of work pattern. Forget what you know about data readers and learn how an ORM identity map works. Think objects first, data second. If you can’t think of data second, an ORM might not be the technology for you.
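To make the identity map idea concrete, here is a deliberately tiny sketch. IdentityMap<TKey, TEntity>, Customer, and the loader delegate are hypothetical stand-ins (no particular ORM works exactly like this): within one unit of work, the map hands back the same object instance for the same key, and only calls the loader (playing the role of a database query) the first time it sees that key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical identity map: one object instance per primary key per unit of work.
public class IdentityMap<TKey, TEntity> where TEntity : class
{
    private readonly Dictionary<TKey, TEntity> _loaded = new Dictionary<TKey, TEntity>();
    private readonly Func<TKey, TEntity> _loadFromDatabase; // stand-in for a real query

    public IdentityMap(Func<TKey, TEntity> loadFromDatabase)
    {
        _loadFromDatabase = loadFromDatabase;
    }

    // How many times the loader ran, i.e. how many "database" round trips we made.
    public int DatabaseHits { get; private set; }

    public TEntity Get(TKey key)
    {
        if (_loaded.TryGetValue(key, out TEntity existing))
            return existing; // already materialized: return the very same instance

        TEntity entity = _loadFromDatabase(key);
        DatabaseHits++;
        _loaded.Add(key, entity);
        return entity;
    }
}

public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
}
```

This is why, in ORMs that use an identity map, two lookups for the same row inside one unit of work give you one object rather than two copies whose state can drift apart, which is quite different from filling two DataTables.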

posted by scott with 10 Comments

LINQ Deep Dive at D.C. ALT.NET Next Week

Matt Podwysocki invited me to speak at the D.C. alt.net meeting next Thursday evening (July 24th). The topic is LINQ. Matt specifically requested a code-heavy presentation, so expect two slides followed by plenty of hot lambda and Expression<T> action.

Hopefully, Matt doesn’t black out the neighborhood like he did at the nearby RockNUG meeting this week. The White House is two blocks away and the people inside get a little jumpy about blackouts.

 

DateTime:
7/24/2008 - 7PM-9PM

Location:
Cynergy Systems Inc.
1600 K St NW
Suite 300
Washington, DC 20006

posted by scott with 7 Comments

Keeping LINQ Code Healthy

In the BI space I’ve seen a lot of SQL queries succumb to complexity. A data extraction query adds some joins, then some filters, then some nested SELECT statements, and it becomes an unhealthy mess in short order. It’s unfortunate, but standard SQL just isn’t a language geared for refactoring towards simplification (although UDFs and CTEs in T-SQL have helped).

I’ve really enjoyed writing LINQ queries this year, and I’ve found them easy to keep pretty.

For example, suppose you need to parse some values out of the following XML:

<ROOT>
  <data>
    <record>
      <field name="Country">Afghanistan</field>
      <field name="Year">1993</field>
      <field name="Value">16870000</field>
      <!-- ... -->
    </record>
    <!-- ... -->
  </data>
</ROOT>

A first crack might look like the following:

var entries =
    from r in doc.Descendants("record")
    select new
    {
        Country = r.Elements("field")
                   .Where(f => f.Attribute("name").Value == "Country")
                   .First().Value,
        Year = r.Elements("field")
                .Where(f => f.Attribute("name").Value == "Year")
                .First().Value,
        Value = double.Parse(
                r.Elements("field")
                 .Where(f => f.Attribute("name").Value == "Value")
                 .First().Value)
    };

The above is just a mass of method calls and string literals. But, add in a quick helper or extension method…

public static XElement Field(this XElement element, string name)
{
return element.Elements("field")
.Where(f => f.Attribute("name").Value == name)
.First();
}

… and you can quickly turn the query around into something readable.

var entries =
from r in doc.Descendants("record")
select new
{
Country = r.Field("Country").Value,
Year = r.Field("Year").Value,
Value = double.Parse(r.Field("Value").Value)
};
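Putting the helper and the query together gives something you can run against the sample document. This is a sketch: the CountryData class, the one-record sample string, and the tuple return type are assumptions added so the example is self-contained, and the extension method has to live in a static class. It also passes CultureInfo.InvariantCulture to double.Parse, which the query above omits, so the parse doesn't depend on the machine's regional settings.

```csharp
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class XElementExtensions
{
    // The helper from above: the <field> child whose name attribute matches.
    public static XElement Field(this XElement element, string name)
    {
        return element.Elements("field")
                      .Where(f => f.Attribute("name").Value == name)
                      .First();
    }
}

public static class CountryData
{
    // One record in the same shape as the XML above (hypothetical sample string).
    public const string SampleXml =
        "<ROOT><data><record>" +
        "<field name=\"Country\">Afghanistan</field>" +
        "<field name=\"Year\">1993</field>" +
        "<field name=\"Value\">16870000</field>" +
        "</record></data></ROOT>";

    public static (string Country, string Year, double Value)[] Parse(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        var entries =
            from r in doc.Descendants("record")
            select (Country: r.Field("Country").Value,
                    Year: r.Field("Year").Value,
                    Value: double.Parse(r.Field("Value").Value, CultureInfo.InvariantCulture));
        return entries.ToArray();
    }
}
```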

If only SQL code were just as easy to break apart!

posted by scott with 2 Comments

Restku

Haiku is a popular poetic form that has evolved over centuries. Restku is Haiku with a twist.

crystal pixels
get brighter
an abundance of excitement

The twist is that the author of a Restku is restricted to using a single verb from this list: get, post, put, and delete. Although traditional Restku insists on present tense usage of the four verbs, adventurous authors will mix in past tense, future tense, and on occasion, present perfect tense.

unexpected dialog
a “progress” bar
vista has posted the bad news

Although Restku was inspired by REST, a software architecture style, there is no reason an author can’t frame concepts from outside the world of information technology into a Restku.

weathered glove
humid skies
put on a childhood dream

Relax your mind with the mental stimulation of writing a Restku today, for tomorrow is still a mystery.

four hundred and four
electrical neurons
delete her memory
posted by scott with 2 Comments

Herding Code

Herding Code is a podcast about a variety of topics in technology and software development. It’s done roundtable style with myself, Scott Koon, Kevin Dente, and Jon Galloway. The conversations are a blast, and I hope informative, too.

Tune in to the feed here: http://feeds.feedburner.com/HerdingCode

posted by scott with 2 Comments

Swimming Upstream Is Hazardous

Salmon swim upstream, and look at what happens …

    


Every developer is familiar with the “work around”. These are the extra bits of code we write to overcome limitations in an API, platform, or framework.

But, sometimes those limitations are a feature. The designer of a framework might be guiding you in a specific direction. Take the Silverlight networking APIs as an example. The APIs provide only asynchronous communication options, yet I’ve seen a few people try to block on network operations with code like the following:

AutoResetEvent _event = new AutoResetEvent(false);
WebClient client = new WebClient();
client.DownloadStringCompleted +=
(s, ev) => { _message.Text = ev.Result; _event.Set(); };
client.DownloadStringAsync(new Uri("foo.xml", UriKind.Relative));
_event.WaitOne();

This code results in a deadlock, since the WebClient tries to raise the completed event on the main thread, but the main thread is blocked inside WaitOne and waiting for the completed event to fire. This deadlock is not only fatal to the Silverlight application, but can bring down the web browser, too. Even if this code didn't create a deadlock, do you really want your application to block over a slow network connection?

When you find yourself writing “work around” code, it’s worthwhile to review the situation. Are you really working around a limitation? Or are you working against the intended use of a framework? Working against the framework is rarely a good idea – there can be a lot of hungry bears waiting to catch you in the future.

posted by scott with 3 Comments

Pluralsight 2.0

Pluralsight has a new website, and the new site includes some online training options! See Fritz’s post for more details. Be sure to check out one of the newest classes - the LINQ Fundamentals course, too.
posted by scott with 2 Comments

Rob's Not So Lazy MVC Storefront

Rob ran into some lazy load problems in his MVC Storefront and later proclaimed:

"…if you set any Enumerable anything as a property, it's Count property will be accessed when you load the parent object. This negates using any deferred loading for any POCOs, period"

Rob thought this was a problem with .NET in general, but I was suspicious. Veeery suspicious. I downloaded Rob's latest bits and found some interesting behavior.

Based on the screen shot of the call stack that Rob posted, it appeared LINQ to SQL was doing some type conversions. If you poke around the classes mentioned in the call stack, you'll eventually wander into a GenerateConvertToType method that uses LCG to build dynamic methods. Just based on the opening conditional logic, I thought Rob might solve his problem by using LazyList<T> for his business object properties, too (whether or not he'd want to is a different question), so I modified his Category class for a few experiments to see what would really lazy load.

public class Category {

    // rob's original
    public IList<Product> Products { get; set; }

    // experimental
    public LazyList<Product> ProductsLazy { get; set; }
    public IQueryable<Product> ProductsQueryable { get; set; }
    public IEnumerable<Product> ProductsEnumerable { get; set; }

    // ...

This was in hopes that LINQ to SQL wouldn't feel compelled to do a conversion via List<T>. I just needed to tweak the query to set all four properties.

var result = from c in db.Categories
             join cn in culturedName on c.CategoryID equals cn.CategoryID
             let products = from p in GetProducts()
                            join cp in db.Categories_Products
                                on p.ID equals cp.ProductID
                            where cp.CategoryID == c.CategoryID
                            select p
             select new Category
             {
                 ID = c.CategoryID,
                 Name = cn.CategoryName,
                 ParentID = c.ParentID ?? 0,
                 Products = new LazyList<Product>(products),
                 ProductsQueryable = products,
                 ProductsEnumerable = products.AsEnumerable(),
                 ProductsLazy = new LazyList<Product>(products)
             };
return result;

This experiment failed in a stunning fashion, because none of the Product properties lazy loaded – they all eagerly populated themselves full of real product objects. Hmmm.

Slight Detour

Watching SQL Profiler, I started to wonder why there were soooo many queries running. Sure, the stuff wasn't lazy loading but the queries were flying by quicker than eggs at a Steve Ballmer talk. Yet, the code that was kicking off the whole process was just looking for a single category:

Category result = _repository.GetCategories()
                             .WithCategoryID(id)
                             .SingleOrDefault();

That problem turned out to be in Rob's WithCategoryID extension method.

public static IEnumerable<Category> WithCategoryID(
    this IEnumerable<Category> qry, int ID) {

    return from c in qry
           where c.ID == ID
           select c;
}

By taking an IEnumerable<T> parameter, the extension method was forcing the query to execute and then doing all the ID checks using LINQ to Objects. Just switching over to IQueryable<T> made the method a lot more efficient, and the number of queries came down tremendously.
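The difference comes down to which Where method the compiler binds to. The sketch below puts the two signatures side by side; the trimmed-down Category class and the WithCategoryIDQueryable name are hypothetical, and an in-memory AsQueryable() source stands in for LINQ to SQL. With an IEnumerable<T> parameter, the where clause compiles to a delegate for Enumerable.Where, so the underlying query runs unfiltered and every row is checked in memory; with IQueryable<T>, it compiles to an expression tree for Queryable.Where that a provider such as LINQ to SQL can translate into a WHERE clause.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down entity for illustration.
public class Category
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public static class CategoryFilters
{
    // Original shape: binds to Enumerable.Where (a compiled delegate),
    // so the source query executes first and filtering happens in memory.
    public static IEnumerable<Category> WithCategoryID(
        this IEnumerable<Category> qry, int ID)
    {
        return from c in qry
               where c.ID == ID
               select c;
    }

    // Fixed shape: binds to Queryable.Where (an expression tree),
    // so the filter stays part of the query the provider translates.
    public static IQueryable<Category> WithCategoryIDQueryable(
        this IQueryable<Category> qry, int ID)
    {
        return from c in qry
               where c.ID == ID
               select c;
    }
}
```

Checking the static type of the result, or the query's Expression property, is one quick way to spot where a query silently drops out of the provider and into LINQ to Objects.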

Correlating Problems

Back to the original problem, which was a bit of a mystery because I've been able to lazy load collections using IEnumerable<T> and IQueryable<T>. After some more fiddling, I began to suspect the query itself. The query uses a correlated subquery by virtue of the fact that the range variable c is used inside the query for products (c.CategoryID). I'm guessing that LINQ to SQL felt compelled to take care of all the work in one fell swoop. Instead of using a subquery, I presented LINQ to SQL with a method call that pushed the needed parameter (c.CategoryID) onto the stack, and made things slightly more readable in the process.

    var result = from c in db.Categories
                 join cn in culturedName
                     on c.CategoryID equals cn.CategoryID
                 let products = GetProducts(c.CategoryID)
                 select new Category
                 {
                     ID = c.CategoryID,
                     Name = cn.CategoryName,
                     ParentID = c.ParentID ?? 0,
                     Products = new LazyList<Product>(products),
                     ProductsQueryable = products,
                     ProductsEnumerable = products.AsEnumerable(),
                     ProductsLazy = new LazyList<Product>(products)
                 };
    return result;
}

public IQueryable<Product> GetProducts(int categoryID)
{
    var products = from p in GetProducts()
                   join cp in db.Categories_Products on p.ID equals cp.ProductID
                   where cp.CategoryID == categoryID
                   select p;
    return products;
}

And voila! Three of the properties (ProductsQueryable, ProductsEnumerable, ProductsLazy) would lazy load their Products from the database. Only the original IList<Product> property would eagerly fetch data. From what I can decipher in the grungy code, when LINQ to SQL sees it needs to assign to an IList<T>, and it doesn't have an IList<T>, it eagerly loads a new List<T> and copies those elements into the destination. At least, that's my theory.

Knowing what I know now, I could tell Rob to stick with IList<T> as his property type, but to make sure he has IList<T> on both sides of the assignment in his projection (and tuck the product query into a method call). In other words, use the following to create the LazyList<T>, and LINQ to SQL won't load up Products during some weird type conversion:

public class LazyList<T> : IList<T> {

    public static IList<T> Create(IQueryable<T> query)
    {
        return new LazyList<T>(query);
    }

    // ...
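Rob's full LazyList<T> isn't shown here, so the following is a hypothetical, minimal version for illustration only. It stores an IQueryable<T>, declares the Create factory with IList<T> as the return type as suggested above, and doesn't run the query until some member of the list is first used.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal LazyList<T>: wraps a query and executes it on first use.
public class LazyList<T> : IList<T>
{
    private readonly IQueryable<T> _query;
    private List<T> _inner; // null until the query has executed

    public LazyList(IQueryable<T> query)
    {
        _query = query;
    }

    // IList<T> on the declared return type, so assignments see IList<T> on both sides.
    public static IList<T> Create(IQueryable<T> query)
    {
        return new LazyList<T>(query);
    }

    public bool IsLoaded => _inner != null;

    // Runs the query the first time any member needs the data.
    private List<T> Inner => _inner ?? (_inner = _query.ToList());

    public T this[int index] { get => Inner[index]; set => Inner[index] = value; }
    public int Count => Inner.Count;
    public bool IsReadOnly => false;
    public void Add(T item) => Inner.Add(item);
    public void Clear() => Inner.Clear();
    public bool Contains(T item) => Inner.Contains(item);
    public void CopyTo(T[] array, int arrayIndex) => Inner.CopyTo(array, arrayIndex);
    public int IndexOf(T item) => Inner.IndexOf(item);
    public void Insert(int index, T item) => Inner.Insert(index, item);
    public bool Remove(T item) => Inner.Remove(item);
    public void RemoveAt(int index) => Inner.RemoveAt(index);
    public IEnumerator<T> GetEnumerator() => Inner.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Any member access, including Count, triggers the query, so this sketch only defers the work until a caller first touches the list; it can't help if something reads Count while the parent object is being built.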

Conclusion? Beware of mismatched types, particularly with IList<T>, and watch out for eager execution with correlated subqueries.

posted by scott with 3 Comments

Visual Designers Don’t Scale

Microsoft has a long history of being visual. They've made quite a bit of money implementing graphical user interfaces everywhere – from operating system products to database servers, and of course, developer products. What would Visual Studio be if it wasn't visual?

And oh how visual it is! Visual Studio includes a potpourri of visualization tools. There are class diagrams, form designers, data designers, server explorers, schema designers, and more. I want to classify all these visual tools into one of two categories. The first category includes all the visual tools that build user interfaces – the WinForms and WebForms designers, for instance. The second category includes everything else.

Visual tools that fall into the first category, the UI builders, are special because they never need to scale. Nobody is building a Windows app for 5,000 x 5,000 pixel screens. Nobody is building web forms with 5,000 textbox controls. At least I hope not. You can get a pretty good sense of when you are going to overwhelm a user just by looking at the designer screen.

Visual tools that fall into the second category have to cover a wide range of scenarios, and they need to scale. I stumbled across an 8-year-old technical report today entitled "Visual Scalability". The report defines visual scalability as the "capability of visualization tools to display large data sets". Although this report has demographic data in mind, you can also think of large data sets as databases with a large number of tables, or libraries with a large number of classes. These are the datasets that Visual Studio works with, and as the datasets grow, the tools fall down.

Here is an excerpt of a screenshot for an Analysis Services project I had to work with recently:

Here is an excerpt of an Entity Data model screenshot I fiddled with for a medical database:

These are just two samples where the visual tools don't scale and inflict pain. They are difficult to navigate, and impossible to search. The layout algorithms don't function well on these large datasets, and the number of mouse clicks required to make simple changes is astronomical. The best you can do is jump into the gnarly XML that hides behind the visual representation.

I'm wondering if the future will see a reversal in the number of visual tools trying to enter our development workflow. Perhaps textual representations, like DSLs in IronRuby, will be the trick.

posted by scott with 23 Comments

The Power of Programming With Attributes

Nothing can compare to the Real Power of programming with attributes. Why, just one pair of square brackets and woosh – my object can be serialized to XML. Woosh – my object can persist to a database table. Woosh – there goes my object over the wire in a digitally signed SOAP payload. One day I expect to see a new item template in Visual Studio – the "Add New All Powerful Attributed Class" template: *

[Table]
[DataObject]
[DataContract]
[Serializable]
[TwoKitchenSinks]
[CLSCompliant(true)]
[DefaultProperty("Name")]
[DefaultBindingProperty("Name")]
[DebuggerStepThroughAttribute]
[GuidAttribute("F0DD2CAA-2132-11DD-AC50-FE9355D89593")]
public class Person
{
    [Column]
    [DataMember]
    [XmlAttribute]
    [Browsable(true)]
    [ReadOnly(false)]
    [Category("Advanced")]
    [Description("The person's name")]
    public string Name { get; set; }

    // TODO: YOUR INSIGNIFICANT BIZ LOGIC GOES HERE...
}

Which raises the question – could there ever be a way to separate attributes from the class definition?**

* Put down the flamethrower and step away - I'm kidding.

**This part was a serious question.

posted by scott with 14 Comments

Two LINQ to SQL Myths

LINQ to SQL requires you to start with a database schema.

Not true – you can start with code and create the mappings later. In fact, you can write a plain-old CLR object (POCO) like this:

class Movie
{
    public int ID { get; set; }
    public string Title { get; set; }
    public DateTime ReleaseDate { get; set; }
}

… and later either create a mapping file (full of XML like <Table> and <Column>), or decorate the class with mapping attributes (like [Table] and [Column]). You can even use the mapping to create a fresh database schema via the CreateDatabase method of the DataContext class.
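For the attribute route, here is a minimal sketch of the same Movie class decorated with the LINQ to SQL mapping attributes from System.Data.Linq.Mapping, plus a typed DataContext whose CreateDatabase call builds the schema. It assumes a .NET Framework project with a reference to System.Data.Linq.dll and a SQL Server you can point a connection string at. The table name "Movies", the column options, and the MovieContext and MovieSchema names are my own illustrative choices, not anything LINQ to SQL requires.

```csharp
using System;
using System.Data.Linq;          // DataContext, Table<T>; needs a reference to System.Data.Linq.dll
using System.Data.Linq.Mapping;  // TableAttribute, ColumnAttribute

// The same plain CLR object, now decorated with mapping attributes.
// "Movies" and the column options are illustrative, not required.
[Table(Name = "Movies")]
public class Movie
{
    [Column(IsPrimaryKey = true, IsDbGenerated = true)]
    public int ID { get; set; }

    [Column(CanBeNull = false)]
    public string Title { get; set; }

    [Column]
    public DateTime ReleaseDate { get; set; }
}

// Hypothetical typed context: exposing Table<Movie> lets the attribute-based
// model discover the Movie mapping up front, so CreateDatabase knows to create it.
public class MovieContext : DataContext
{
    public MovieContext(string connectionString) : base(connectionString) { }

    public Table<Movie> Movies
    {
        get { return GetTable<Movie>(); }
    }
}

// Hypothetical helper: builds a fresh schema from the attribute mapping.
public static class MovieSchema
{
    public static void Create(string connectionString)
    {
        using (var db = new MovieContext(connectionString))
        {
            // Only create the database when it isn't already there.
            if (!db.DatabaseExists())
            {
                db.CreateDatabase();
            }
        }
    }
}
```

Swapping the attributes for an XML mapping file leaves the Movie class itself untouched, which is the point of the myth above.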

LINQ to SQL requires your classes to implement INotifyPropertyChanged and use EntitySet<T> for any associated collections.

Not true, although forgoing either does come with a price. INotifyPropertyChanged allows LINQ to SQL to track changes on your objects. If you don't implement this interface, LINQ to SQL can still discover changes for update scenarios, but it will take snapshots of all objects, which isn't free. Likewise, EntitySet provides deferred loading and association management for one-to-one and one-to-many relationships between entities. You can build this yourself, but since EntitySet<T> already implements IList<T>, you'll probably be reinventing the wheel. There is nothing about EntitySet<T> that ties the class to LINQ to SQL (other than living inside the System.Data.Linq namespace).
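If you do want change notifications without the designer, a hand-written version doesn't take much code. This is a minimal sketch rather than generated code: the Customer class is hypothetical, and it implements both INotifyPropertyChanging and INotifyPropertyChanged because the classes the LINQ to SQL designer generates implement both.

```csharp
using System.ComponentModel;

// Hypothetical entity that raises notifications before and after each change.
public class Customer : INotifyPropertyChanging, INotifyPropertyChanged
{
    private string _name;

    public event PropertyChangingEventHandler PropertyChanging;
    public event PropertyChangedEventHandler PropertyChanged;

    public string Name
    {
        get { return _name; }
        set
        {
            if (_name == value) return;   // same value: raise nothing
            OnPropertyChanging("Name");   // before the field changes
            _name = value;
            OnPropertyChanged("Name");    // after the field changes
        }
    }

    protected virtual void OnPropertyChanging(string propertyName)
    {
        var handler = PropertyChanging;
        if (handler != null) handler(this, new PropertyChangingEventArgs(propertyName));
    }

    protected virtual void OnPropertyChanged(string propertyName)
    {
        var handler = PropertyChanged;
        if (handler != null) handler(this, new PropertyChangedEventArgs(propertyName));
    }
}
```

Skipping the events when the value doesn't change keeps a change tracker from treating a no-op assignment as an edit.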

LINQ to SQL has limitations, and it's a v1 product, but don't think of it as strictly a drag-and-drop technology.

posted by scott with 13 Comments

Mr. President the Programmer

Daily Standup Transcription 06 May 2008 1300 Zulu
Time In 00:02:34.66

"… so, yesterday I continued the refactorafication of some classes. The job isn't easy, but I'm going to work hard and continue the collaborativity with my programming partner. Together, we will eliminate the evil of legacy code operating inside the code base.

I know it's been slow going, but we did misundestimerate the threat of static ... static … statictistical dependencies in the code.

Now, if you'll excuse me, I need to get back to work for the great customers of this company."

Time Out 00:02:54.29

posted by scott with 1 Comments

There Is Always Risk In Portability

After my last post, someone asked me if the "portable" repository pattern was really a good idea. He was referring to the fact that the LINQ queries in the MVC Storefront and Background Motion applications would sometimes execute against in-memory collections (for unit testing), while the rest of the time the queries would execute against a relational database. Isn't there a huge risk in developers not knowing if the software really works with the database?

I don't think of the repository as a "portability" layer, although since it is an abstraction layered on top of the data access code it can provide some nice indirections, like the ability to switch the persistence store. Is this risky? Sure, there is always some element of risk in portability. Just ask anyone who has written code with a portable UI toolkit, or in HTML for that matter. You don't know what is going to happen until the 1s and 0s hit the silicon.

But …

That's not the job of unit tests. Ideally, you'll have other tests, such as automated integration tests, to verify what happens when the "production" code runs.
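For anyone who hasn't seen this style of repository, one common shape exposes IQueryable<T>, so the same query expressions can run over an in-memory list in unit tests or over a LINQ to SQL table in production. This is a minimal sketch under that assumption; Address, IAddressRepository, and InMemoryAddressRepository are hypothetical names, not the actual MVC Storefront or Background Motion code.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Address
{
    public string PostalCode { get; set; }
    public string State { get; set; }
}

// Hypothetical contract: callers compose LINQ queries against IQueryable<T>
// without knowing which provider sits behind it.
public interface IAddressRepository
{
    IQueryable<Address> Addresses { get; }
}

// In-memory implementation for unit tests. AsQueryable() wraps the list,
// so queries written against the interface run through LINQ to Objects.
public class InMemoryAddressRepository : IAddressRepository
{
    private readonly List<Address> _addresses;

    public InMemoryAddressRepository(IEnumerable<Address> addresses)
    {
        _addresses = new List<Address>(addresses);
    }

    public IQueryable<Address> Addresses
    {
        get { return _addresses.AsQueryable(); }
    }
}

// A production implementation would instead return something like
// dataContext.GetTable<Address>(), handing the same queries to LINQ to SQL.
```

The swap is exactly where the risk lives: the interface guarantees the queries compile against both, not that both providers give them the same meaning.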

Before continuing, I must say that in the last post I neglected to tell you that the brainy Mindscape team and Andrew Peters are responsible for the Background Motion web site, and the code that powers the site. Make sure to visit the site and marvel at the beauty of New Zealand, then drop into the Mindscape blogs. Everyone - let's hear it for New Zealand!

What Is This Risk You Speak Of?

You can write a LINQ query that works fine against in-memory collections, but that can fail spectacularly when you swap in a remote LINQ provider. Here is an obvious example:

var result =
    from a in dc.Addresses
    where !String.IsNullOrEmpty(a.PostalCode)
    select a;

This query is happy to execute using LINQ to Objects, but it fails with an exception if LINQ to SQL is sitting behind the sequence (NotSupportedException: Method 'Boolean IsNullOrEmpty(System.String)' has no supported translation to SQL).
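One way around that exception is to spell the check out as plain comparisons, which LINQ to SQL should be able to translate into IS NOT NULL and <> '' tests; treat that as something to confirm in the generated SQL rather than a guarantee. The sketch below exercises the rewritten query with LINQ to Objects; Address and AddressQueries are hypothetical types.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Address
{
    public string PostalCode { get; set; }
    public string State { get; set; }
}

public static class AddressQueries
{
    // Same intent as !String.IsNullOrEmpty(a.PostalCode), written with
    // comparisons a remote provider is more likely to translate.
    public static IQueryable<Address> WithPostalCode(IQueryable<Address> addresses)
    {
        return from a in addresses
               where a.PostalCode != null && a.PostalCode != ""
               select a;
    }
}
```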

Those types of problems are easy to spot in automated integration testing because exceptions are relatively easy to track down. The real risk is in the queries that don't flame out in spectacular fashion, but execute successfully with slight variations. Here is one example:

var distinctPostalCodes =
    (from a in addresses
     orderby a.State ascending
     select new { a.PostalCode, a.State }
    ).Distinct();

This query wants to get a distinct list of zip codes and states for all our customers, ordered by state. It works perfectly with LINQ to Objects, and executes successfully in LINQ to SQL. There's just one tiny problem you might observe in the generated SQL:

SELECT DISTINCT [t0].[PostalCode], [t0].[State] FROM [dbo].[Address] AS [t0]

Notice the distinct (pardon the pun) lack of an ORDER BY clause. If the upper layers were expecting the results sorted by State then we have problems.

It turns out that LINQ to SQL throws out an inner OrderBy operator when the Distinct operator comes into play. There could be several reasons for this, but the most likely is that DISTINCT and ORDER BY have an uneasy relationship in ANSI SQL (it's not just MS SQL). You can read more about this on Jeff Smith's blog: SELECT DISTINCT and ORDER BY, and there is another good explanation here: Some Common Mis-conceptions about DISTINCT.

One also has to wonder if Distinct might reorder the results in its quest to remove duplicates - it's not explicitly documented that it doesn't. In this case, it's better to forgo the query comprehension syntax and make the pipeline of operators more explicit:

var distinctPostalCodes =
    addresses.Select(a => new { a.PostalCode, a.State })
             .Distinct()
             .OrderBy(a => a.State);

This forces LINQ to SQL to generate a safe query with the expected results.
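The explicit pipeline is also easy to wrap in a method that a unit test can run against an in-memory list. A minimal sketch with hypothetical Address and PostalCodeQueries types; keep in mind the in-memory run only pins down the query's intent, and it takes an integration test against the real database to catch a provider quietly dropping the ORDER BY.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Address
{
    public string PostalCode { get; set; }
    public string State { get; set; }
}

public static class PostalCodeQueries
{
    // Distinct first, OrderBy last: the sort is the outermost operator,
    // so there is no inner ordering for a provider to discard.
    public static List<string> DistinctPostalCodesByState(IQueryable<Address> addresses)
    {
        return addresses
            .Select(a => new { a.PostalCode, a.State })
            .Distinct()
            .OrderBy(a => a.State)
            .AsEnumerable()                              // formatting happens in memory
            .Select(a => a.State + " " + a.PostalCode)
            .ToList();
    }
}
```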

Is there risk? Sure – and it's not just in LINQ to SQL. Any multi-target technology runs the same risk. You just need an awareness of the risk and a safety net (in the form of tests) to mitigate it.

posted by scott with 1 Comments