How can we tackle performance during the design, build, and run phases of a system, to handle issues before they happen?
We have all been confronted with critical systems not performing as they should. The subject is fascinating because it crosses nearly all the fields of I.T. system design, build, and run.
Defining performance: How should it perform?
“Not performing as it should” is our first clue. How should it perform? Has that been thoughtfully defined? As well defined as the colors of the entry fields, as the business rules, as that shiny button on the left?
Analysts often struggle with, and sometimes even avoid, defining non-functional requirements. Yet these are critical requirements, performance above all: it is often expensive to meet when specified in excess, and expensive not to meet when the user’s experience suffers, whether in sales, audience, and company image for a public website, or in employee motivation, engagement, and process efficiency for an in-house system. Only business drivers can serve as foundations for defining NFRs; by the time of technical design, it is often too late to recover them.
In my opinion, agile approaches have taught us very useful techniques for expressing user experience, as has the growing field of business service level definition. Focusing on the interaction between each user and the system is key to defining the experience and the main points of attention. Service level values on these main points complete the goal definition:
- Key interaction scenario
- Value of performance
- Cost of non-performance
If a system specification can provide these inputs, software and technical designers/architects have a very good starting point to work from and to communicate the choices they make.
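As a minimal illustration of what such an input could look like when captured as data both business and technical stakeholders can review (every field name and value here is an assumption of mine, not a standard):

```python
# A minimal sketch of a performance NFR tied to a key interaction scenario.
# All field names and values are illustrative assumptions, not a standard.
from dataclasses import dataclass

@dataclass
class InteractionGoal:
    scenario: str          # the key user/system interaction
    p95_latency_ms: int    # target service level on the main point of attention
    peak_rate_per_s: int   # expected load, derived from business drivers
    cost_of_miss: str      # business cost when the level is not met

checkout_goal = InteractionGoal(
    scenario="customer submits an order and sees confirmation",
    p95_latency_ms=800,
    peak_rate_per_s=50,
    cost_of_miss="abandoned carts, estimated 2% of sales per extra second",
)
```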
Using a model
An I.T. system is complex because it is made of many moving parts, each living a predictable but non-linear life.
As engineers, we try to tackle that by modeling the system, grouping parts into subsystems and subsystems into larger ones (engineers love nesting dolls). The system becomes simpler to describe: fewer moving parts, a hierarchical decomposition into understandable pieces. But each piece is now even less linear and less predictable. Or is it?
In my experience, at a certain level, subsystems can be bound to a known behavior (not purely linear, but following simple, well-defined models), and when the behavior is not completely bound, design decisions can be made at the few remaining “unstable” points to ensure the correct behavior of the whole system. At this level, indicators can be expressed in conjunction with the interaction scenarios.
The difficulty is usually to build the scenario (actually implement the user side of it) and to have enough experience to target the right level. This calls for an experienced architect with some background in testing. Here again, agile approaches have taught us a lot about defining scenarios, implementing their user side, and wiring them into continuous integration test cases, as sketched below.
By modeling the various subsystems and their interactions, we can estimate the flow of events and place the resource constraints that shape it from linear to non-linear to pure overflow. The model alone can already demonstrate limits: apply the performance goals under the load defined by the composition of the scenarios (occurring concurrently) and the known behavior or SLAs of the technical and contributing subsystems. It is very useful for pinpointing structural bottlenecks and concentrating on what to test.
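Here is a minimal sketch of what “implementing the user side” of a scenario can look like as a CI test case; the endpoint, payload, and threshold are all hypothetical:

```python
# A minimal sketch of a key interaction scenario driven from the user side,
# runnable as a CI test case. The URL, payload, and threshold are assumptions.
import statistics
import time

import requests

SCENARIO_URL = "https://example.test/checkout"   # hypothetical endpoint
P95_TARGET_MS = 800                              # from the business-driven NFR

def run_scenario(samples: int = 50) -> list[float]:
    """Replay the user interaction and record each response time in ms."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        requests.post(SCENARIO_URL, json={"cart_id": "demo"}, timeout=5)
        timings.append((time.perf_counter() - start) * 1000)
    return timings

def test_checkout_meets_service_level():
    timings = run_scenario()
    p95 = statistics.quantiles(timings, n=20)[-1]  # 95th percentile
    assert p95 <= P95_TARGET_MS, f"p95 {p95:.0f} ms exceeds {P95_TARGET_MS} ms"
```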
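To make the linear / non-linear / overflow shape concrete, here is a minimal sketch using the classic M/M/1 queueing formula; real subsystems rarely follow it exactly, so treat it as an illustration of how a simply bound model behaves, not a recipe:

```python
# A minimal sketch of the linear -> non-linear -> overflow shape, using the
# classic M/M/1 mean response time W = 1 / (mu - lambda). Real subsystems
# rarely follow M/M/1 exactly; this only illustrates the bound behavior.

def mean_response_time_ms(arrival_per_s: float, service_per_s: float) -> float:
    """Mean time spent in the subsystem, in ms; infinite past saturation."""
    if arrival_per_s >= service_per_s:
        return float("inf")  # pure overflow: the queue grows without bound
    return 1000.0 / (service_per_s - arrival_per_s)

# A subsystem able to serve 100 req/s under the composed scenario load:
for load in (10, 50, 90, 99):
    print(f"{load:>3} req/s -> {mean_response_time_ms(load, 100.0):7.1f} ms")
# Latency climbs gently at first (near-linear), then explodes as the load
# approaches 100 req/s (non-linear), and diverges at saturation (overflow).
```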
Testing and modeling
With the model come the testing points, and with the scenario the activation pattern. The key point here is to share the same values between the scenario and the model (the inputs), and between the model and the tests (the targeted outputs): enough to run the tests and keep the system in check during its construction, but also enough to monitor it when live. As we all know, live is… sometimes different, but the model and the scenario remain; only the inputs fluctuate with the business (the outputs can also fluctuate, when the impact is witnessed more closely, in both directions).
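One way to keep those values shared is a single record that the load generator, the model, the CI assertions, and the live alerting all read; a minimal sketch with illustrative names:

```python
# A minimal sketch of one shared record of inputs and targeted outputs,
# keeping scenario, model, test, and monitoring aligned. Names are assumed.
CHECKOUT = {
    # inputs: read by the scenario (load generator) and by the model
    "peak_rate_per_s": 50,
    "concurrent_users": 200,
    # targeted outputs: read by the model, the CI tests, and live alerting
    "p95_latency_ms": 800,
    "error_rate_max": 0.001,
}
# A change in the business figures then propagates everywhere from one place.
```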
One interesting approach with such a modeling effort is to keep checking various points, not only the end-to-end behavior. Changes in subsystems or in the underlying components can then be identified more easily when they impact the system. These “inner points” are also very useful for communicating with other actors.
In software, we tend to overestimate the role of the code itself in the global performance of the system (bad code hurts performance, but it is usually found quickly when good testing is in place, especially when testing is done regularly). For example, simply setting up an appliance for security reasons can easily contribute the main part of the system’s latency even though it is “transparent” for performance (latency is usually underestimated as a key contributor to user experience, and sometimes to resource overuse).
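A small latency-budget exercise makes the point; the numbers below are illustrative assumptions, not measurements:

```python
# A minimal latency-budget sketch with assumed numbers, showing how a
# "transparent" appliance can dominate end-to-end latency even when the
# application code is fast.
budget_ms = {
    "application code": 15,
    "database query": 25,
    "security appliance (TLS inspection)": 60,  # one extra hop per request
    "load balancer": 5,
    "network round trips": 20,
}

total = sum(budget_ms.values())
for part, ms in sorted(budget_ms.items(), key=lambda kv: -kv[1]):
    print(f"{part:<38} {ms:>4} ms  ({ms / total:5.1%})")
print(f"{'end-to-end':<38} {total:>4} ms")
# Here the appliance alone accounts for nearly half the perceived latency.
```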
Modeling a system is multi-layer work: modeling the behavior of all the components that form each subsystem. Testing is multi-layer work as well, because the software in a bubble is not all that will constitute the running system.
On a project some years ago, we modeled performance from the inception phase onward to help a client select the right solution. The model was used and improved over a two-year journey and proved very helpful in two key areas:
- Detecting variations in behavior (regressions in product releases, changes in the database configuration)
- When approaching production, detecting variations in the volumes of data and requests (load levels were estimated from business indicators; most were very close to the actual values, while some were changed by systems implemented upstream in the chain, sometimes sending 2 to 5 times more messages than anticipated)
New perspective: what if performance is built into the software?
In common heavy-load system designs, a technical architect often works “around” the software components: defining how many instances to set up, controlling the load on each active component by adjusting threads and queues, and adding monitoring to detect out-of-range behavior and anticipate system overload.
That is load regulation, and good subsystems have clear, regulated interfaces to reduce the potentially chaotic behavior induced by overloading a subsystem. But the more regulation points there are, the more latency rises: performance under load is exchanged for pure performance.
Once the model is in place and indicators are defined, that “art” of load regulation is not so complex, and it is generally expressed in simple formulas, as illustrated below.
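As an example of how simple these formulas can be, here is a sketch based on Little’s Law (the figures are assumptions of mine, not from the project mentioned above):

```python
# A minimal sketch of the kind of simple formula load regulation rests on:
# Little's Law (L = lambda * W), i.e. the concurrency a component must
# sustain equals its throughput times its latency. Numbers are assumptions.
import math

def required_concurrency(throughput_per_s: float, latency_s: float) -> int:
    """Little's Law: in-flight requests = arrival rate x time in system."""
    return math.ceil(throughput_per_s * latency_s)

# A service handling 200 req/s at 150 ms holds about 30 requests in flight;
# capping its thread pool near this value regulates the load it accepts.
print(required_concurrency(200, 0.150))  # -> 30
```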
What if we could build this into the software itself? Those formulas could be implemented as an aspect that measures live performance and dynamically adjusts the load it accepts, or slows requests down before the system chokes. It is basically what some load balancers do when set up with preventive health checks, but with a global (or per-service-line) approach.
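As a sketch of the idea (not how any particular product works), an aspect-like wrapper could measure live latency and shed load once the observed level drifts past the target; every threshold and name here is an assumption:

```python
# A minimal sketch of admission control woven around existing code, in the
# spirit of an aspect: measure live latency and shed load when the observed
# level drifts past the target. Thresholds and names are assumptions.
import time
from collections import deque
from functools import wraps

class Overloaded(Exception):
    """Raised instead of queuing work the system can no longer absorb."""

def regulate(p95_target_ms: float, window: int = 100):
    recent = deque(maxlen=window)  # sliding window of observed latencies

    def aspect(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if len(recent) == recent.maxlen:
                p95 = sorted(recent)[int(0.95 * len(recent))]
                if p95 > p95_target_ms:
                    recent.popleft()  # age the window so the gate can reopen
                    raise Overloaded(f"p95 {p95:.0f} ms > {p95_target_ms} ms")
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                recent.append((time.perf_counter() - start) * 1000)
        return wrapper
    return aspect

@regulate(p95_target_ms=800)
def handle_checkout(request):
    ...  # the business code stays unaware of the regulation around it
```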
Such solutions are beginning to emerge, and one of the most elegant is autoletics (formerly jInspired) with Sentris (and Signals). William Louth, in a 2012 article, describes an impressive view of what reflexivity can bring to software performance and QoS.
This could reduce the need to push all the constraints through heavy, anonymous components like proxies and appliances: build the aspects into the software itself and keep only the edge control in appliances. Lower cost, lower latency. DevOps at work?
This is another reason to model early and continuously, with the opportunity to be more granular and even improve the resolution of the scale at which measures and actions are located.
Claude Bamberger has been an information systems architect since his first job in 1994, realizing over nearly 20 years that it is a role one grows into more than one holds, mainly by enlarging the range of technologies known, the skills mastered, and the contexts experienced. Particularly interested in technologies and what they can mean for improving business results, Claude went from consulting in the early days of object-oriented development and distributed computing, to project, team, and I.T. department management for half a decade, before returning to consulting at Sogeti in 2008, after co-founding an innovative start-up in the Talent Management field.