Cats on Aoshima Island
Cats on Aoshima Island can actually help explain the “n + 1 problem.”

“In measurement,” wrote CMSWire’s Gerry McGovern last February, “you must constantly ask this question: Am I measuring what is important?”

If at some point the answer came back “No,” would you stop constantly asking? And what measures would you be willing — or able — to take next?

The One-to-Many Relationship

Six years ago, application performance monitoring (APM) system maker Dynatrace revealed the problem its automated APM agents encounter most often when measuring aspects of client/server and Web application performance. Database developers call it the “n + 1 problem” — if you use exactly that phrase, they’ll know precisely what you’re talking about. (Say no more.)

Put simply, the problem can be summarized like this: Suppose a one-to-many relationship exists in a database between an object and a class of something that object might possess. I might own, for example, n number of cats.

The number of fetches required to reveal the health status reports for all my cats, however many that may be, would be n + 1. That extra 1 is the fetch for the record that effectively tells the system, nope, you already fetched the last cat.
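The pattern above can be sketched in a few lines of Python, using the standard library’s sqlite3 module and a hypothetical owners/cats schema (the table and column names here are illustrative, not drawn from any real system). One query retrieves the list of cats, then one more query runs per cat — n + 1 in total:

```python
import sqlite3

# Hypothetical schema: one owner, many cats.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cats (id INTEGER PRIMARY KEY, owner_id INTEGER,
                       name TEXT, health TEXT);
    INSERT INTO owners VALUES (1, 'Alice');
    INSERT INTO cats VALUES (1, 1, 'Mimi', 'good'),
                            (2, 1, 'Tora', 'fair'),
                            (3, 1, 'Kuro', 'good');
""")

query_count = 0

def fetch(sql, *args):
    # Count every round trip to the database.
    global query_count
    query_count += 1
    return conn.execute(sql, args).fetchall()

# One query for the list of cats...
cat_ids = [row[0] for row in
           fetch("SELECT id FROM cats WHERE owner_id = ?", 1)]

# ...then one query per cat for its health report.
reports = [fetch("SELECT name, health FROM cats WHERE id = ?", cid)[0]
           for cid in cat_ids]

print(query_count)  # 4 queries for 3 cats: the "n + 1"
```

Three cats cost four queries; a thousand cats would cost a thousand and one, which is exactly the kind of fetch-count blowup an APM agent notices.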

In a typical APM environment, an administrator should be able to see a direct correlation between the number of associations and the number of database fetches required to retrieve those associations, and reach the conclusion, “Whoops, we suffer from the n + 1 problem.”

At that point, what does the administrator do: march from the network silo over to the development silo, and paste a complaint notice to the front door?

Before you think the point of this article will eventually be to blame network admins for not doing enough to improve customer experience, think again. For the longest time, there wasn’t really much one could do about the n + 1 problem after discovering one was afflicted with it.

SQL Queries, Middleware & More

From a purely academic perspective, for over a quarter-century, the solution to the n + 1 problem has appeared not only simple but immediate: The system of SQL queries could be rewritten to use what’s called an inner join. That way, the entire contents of the one-to-many relationship could be retrieved not with n + 1 queries, but with a single query.
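Continuing the earlier hypothetical owners/cats schema, the inner-join version looks like this — one round trip replaces all n + 1 of them:

```python
import sqlite3

# Same illustrative schema as before.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cats (id INTEGER PRIMARY KEY, owner_id INTEGER,
                       name TEXT, health TEXT);
    INSERT INTO owners VALUES (1, 'Alice');
    INSERT INTO cats VALUES (1, 1, 'Mimi', 'good'),
                            (2, 1, 'Tora', 'fair'),
                            (3, 1, 'Kuro', 'good');
""")

# A single INNER JOIN returns the owner alongside every cat,
# instead of one query for the list plus one query per cat.
rows = conn.execute("""
    SELECT owners.name, cats.name, cats.health
    FROM owners
    INNER JOIN cats ON cats.owner_id = owners.id
    WHERE owners.id = ?
    ORDER BY cats.id
""", (1,)).fetchall()

print(rows)  # all three cats, one query
```

The trade-off, as the rest of this section explains, is that the client has to be capable of consuming the joined result — which, historically, it often wasn’t.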

Nothing (not even academia itself) is purely academic. If the problem could have been resolved that simply, it already would have been — as early as 1990.

Since the client/server era of software, communications between a client application and a data warehouse have been made possible through middleware. Client applications (especially the Windows variety) were geared to run on PCs, and as a result, could not manage memory the way a sophisticated server could.

A client app needed to fill data entries from a record into a form one at a time. So the drivers that linked the app to the database (ODBC for Windows, JDBC for Java, any number of options for JavaScript) fetched individual records sequentially. Even if you did phrase the query in SQL, which theoretically described one-to-many relationships declaratively, it was impossible for an app to pass a sophisticated declarative query through a driver to a database and have the results be meaningful to the app.
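That record-at-a-time driver behavior can be mimicked with a cursor loop — again a sketch against the hypothetical cats table, with sqlite3 standing in for an ODBC- or JDBC-style driver. Note the final fetch: it returns nothing, and exists only to tell the client the result set is exhausted. That terminating fetch is the “+ 1”:

```python
import sqlite3

# Same illustrative schema as before.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cats (id INTEGER PRIMARY KEY, owner_id INTEGER,
                       name TEXT, health TEXT);
    INSERT INTO cats VALUES (1, 1, 'Mimi', 'good'),
                            (2, 1, 'Tora', 'fair'),
                            (3, 1, 'Kuro', 'good');
""")

cur = conn.execute("SELECT name, health FROM cats ORDER BY id")

# The driver hands the client exactly one record per fetch,
# the way a form-filling client app consumed data.
records = []
while True:
    row = cur.fetchone()
    if row is None:
        # The extra fetch that says "no more cats" -- the "+ 1".
        break
    records.append(row)

print(records)
```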

So the n + 1 problem was cemented into the infrastructure of distributed applications. You could measure it all you wanted, but there wasn’t much anyone could do about it — unless they wanted to completely rearchitect the entire database schema.

Rearchitecting Systems

Well, that’s what eventually happened, albeit just in the last few years. Hadoop brought forth a means of passing very sophisticated queries, via API calls, against streaming data that had not even been “normalized” yet. At last, a means of communicating joined data to very lightweight clients had presented itself.

Yet in the first iterations (and even the second and third) of customer-facing web apps — perhaps for backward compatibility’s sake, perhaps because the original apps’ programmers are long retired — the n + 1 problem has been baked into the new code all over again. Code that could be fixed isn’t being fixed, because fixing it is hard.

Once again, some of us are deploying APM systems to measure our new apps’ performance, only to watch them confirm once again what we knew to begin with: Our systems architectures are not designed with users in mind.

Focus on UX

“Are you, in fact, architecting your application in such a way that you can accommodate for being elastic — not with regard to the requirement of resources, but with regard to the user experience?” asks Aaron Rudger, senior director of Product Management at Dynatrace.

“It does take a focus on user experience today, and understanding, what is your baseline that you’re trying to develop to? And doing that in a way that’s informed by the reality of how you should be defining user experience. It’s not like a page load, which is very 2000s — not even a load event, which is the JavaScript approach.”

While a user may perceive pixels being delivered to a screen, said Rudger, as the principal indicator of whether an app is doing its job, it is no longer (and honestly, may never have been) the actual function of pixel rendering that is responsible for delivering good UX. Rather, it is all the underlying, dependent job functions which must take place before the job of pixel delivery can even begin.

“We as an industry still struggle with defining that,” he continued, “and helping our customers to create standards by which it’s defined. We talk internally about the whole notion that the quote/unquote ‘page load metric’ is dead. A lot of our customers still use it, because it’s relevant to the way they’re doing certain things with regard to their digital assets. But that is not state-of-the-art for sure. And frankly, where the art is has outpaced, in many respects, where the practice is, with regard to defining some of these metrics.”

For years, admins and network operators have been measuring the right things, but living with the results they couldn’t change. Now the network is changing out from under them, being replaced with the cloud — and now many of the right things have become the wrong things.

Either we learn to adapt, or we continue doing what we have been doing since the 1990s: watching our customers twiddle their thumbs, and pondering how we can help them do it faster.
