# Smart Metrics Slides

This article summarizes my “Lessons Learned in Project Metrics: Are you Metrics Dumb or Smart?” presentation. It covers the following six topics:

1) Measurement Troubles
Metrics are like fire: they make good servants but poor masters. When controlled properly and used appropriately they are very useful, yet if they are used incorrectly or allowed to get out of control they cause a lot of damage.

Not all the things we can measure or observe are that helpful to us. In fact, without proper understanding, many observations can lead us to the wrong conclusions, such as:

• The sun rises up in the sky in the morning and then falls down again at night
• Planets revolve around the earth
• Stars come out at night
• Heavier objects fall faster than lighter objects

So we need to be careful that we understand what we are measuring and how the systems work. We also need to be careful to measure and collect the right attributes since there are so many things we could measure, most of which are less valuable than the really smart metrics we should be focusing on.

“There are so many possible measures in a software process that some random selection of metrics will not likely turn up something of value” – Watts Humphrey

Often the things we really want to measure are hard to discern or to measure objectively, such as:

• Spouse’s mood
• Team Commitment

This leads us to recognize that many of the most useful metrics are subjective and intangible, and that measuring them is a challenge.

“Not everything that can be counted counts, and not everything that counts can be counted” – Albert Einstein

2) The Hawthorne Effect

The “Hawthorne Effect” is the name given to the phenomenon of influencing what you measure. It gets its name from the Hawthorne Works, a Western Electric factory in Illinois where Elton Mayo and others conducted experiments on work productivity in the 1920s and 1930s.

They selected a group of workers and put them in a controlled environment where their productivity could be measured. They made the working environment brighter, re-measured worker productivity, and found it had increased. Upon installing more lights and making the workplace brighter still, they found worker productivity increased yet again. I am sure the electrical industry could see the marketing potential for installing more electric lights in factories. However, Mayo was a good scientist and, as a control, reset the lighting to the original levels and measured productivity. Once again it went up, seemingly unrelated to lighting levels. The simple act of measuring a group of people against something influences their behaviour.

There has been lots of speculation over the intervening years as to the exact cause of these increases. Is it the special treatment of separating out a group, the public way the measurements are taken, or where the data is displayed? Whatever the cause, it is widely accepted that you will influence what you measure. So the takeaway is: be careful what you measure, because the side effects may have adverse consequences for your project.

3) Design Factory Metrics

One of my favourite project management books has nothing to do with software development, or even project management really. It is “Managing the Design Factory” by Don Reinertsen, and it is packed full of valuable project truths from the product design and development world that apply equally well in the design-heavy world of software.

Don has many valuable lessons for planning and estimation, but what I want to draw on right now are his guidelines for valuable metrics. First, they should be simple; the ideal metrics are self-generating in the sense that they are created without extra effort in the normal course of business. Second, they should be relevant to the end goal of the project. Third, they should focus on leading indicators, which look to the future.

So, given these characteristics of good project metrics let’s see how many of today’s traditional project metrics measure up.

Desirable Characteristics:
•    The Hawthorne Effect is positive
•    Simple, self generating
•    Relevant to the end-goal

•    Lines of Code Written – poor, does not reward simplification, leads to code bloat

•    Function Points Delivered – poor, effort to generate, not relevant to the end-goal of project

•    Hours Worked – poor, leads to long hours, burn-out, defects, consumed budgets

The list goes on: budget consumed, conformance to plan, etc. Many of the project metrics that companies collect and publish violate these basic goals of “Design Factory” metrics and ignore the Hawthorne Effect. I am not saying that we should not measure things like budget consumed; obviously we have a responsibility to, but not overtly, not with big graphs on the wall and in-your-face collection. Instead, be more aware of the nuances of metrics collection and focus on smarter project metrics.

Smarter Project Metrics
OK, so if the previous metrics are not optimal, what constitutes a better set? Well, ones that recognize that you will influence what you measure and focus on simple factors that relate to project success in the future. Such as:
•    Features Accepted
•    User Satisfaction
•    Defect Cycle Times
Here are some examples:

A cumulative graph of features accepted. The background coloured bands show different functional areas and the blue line progress against these areas. Note we are tracking features accepted, not features developed, or features tested. The end goal of the project is to have the system accepted by the business and so this is what we need to track.

This graph is a Cumulative Flow Diagram that not only shows features accepted, but also work in progress. Queue size is a useful leading metric that can help us determine likely completion times.

Parking Lot Diagrams are a nice way of summarizing progress against a variety of goals in a single page executive summary. Colour coding helps highlight areas behind (red), done (green), in progress (yellow), and not yet started (white).

User satisfaction is subjective, but very important. Perhaps a regular (Good, OK, Bad) check-in can be arranged to check mood and more importantly detect issues before they become problems.

Likewise, sponsor confidence is a critical metric to monitor. Here a cumulative scoring measure is being used: Green scores +1, Yellow 0, and Red -1. The red line indicates a tolerance threshold of -2; if we ever reach this level, intervention is needed (perhaps a Steering Committee meeting is called to resolve the issue).

A previous company I worked at required a Sponsor Confidence score with every weekly status report. If you could not get in touch with your sponsor, you received a -1 score for lack of PM interaction. It was a great system to ensure frequent interaction and proved a good early warning indicator for issues that were brewing outside of the project but that I might not have heard about.
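The cumulative scoring described above is easy to automate. The following is a minimal Python sketch, assuming one rating per weekly status report; the `unreachable` label, the function names, and the sample ratings are all made up for illustration and are not taken from any real project.

```python
from itertools import accumulate

# Scores follow the scheme described above. "unreachable" is an assumed
# label for a week in which the sponsor could not be contacted.
SCORES = {"green": 1, "yellow": 0, "red": -1, "unreachable": -1}
THRESHOLD = -2  # tolerance level at which intervention is called for


def cumulative_confidence(ratings):
    """Return the running total of sponsor confidence scores."""
    return list(accumulate(SCORES[rating] for rating in ratings))


def first_breach(ratings, threshold=THRESHOLD):
    """Return the 1-based week in which the running total first reaches
    the threshold, or None if it never does."""
    for week, total in enumerate(cumulative_confidence(ratings), start=1):
        if total <= threshold:
            return week
    return None


# Hypothetical ratings from six weekly status reports.
weekly = ["green", "yellow", "red", "red", "unreachable", "red"]
print(cumulative_confidence(weekly))  # [1, 1, 0, -1, -2, -3]
print(first_breach(weekly))           # 5
```

Because the total is cumulative, a single bad week does not trigger intervention on its own, but a sustained run of red or missed check-ins does.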

Defect Cycle Time is useful. We want to reduce the time from defect detection to defect fix. This not only improves the business experience, but also reduces the amount of code written on top of faulty code, and ensures issues are fresher in developers' minds and faster to fix.
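If a defect tracker records when each defect was detected and when it was fixed, the cycle time falls out of the data with no extra collection effort. Here is a small sketch in Python; the defect log and the function names are hypothetical.

```python
from datetime import date
from statistics import mean

# Hypothetical defect log: (date detected, date fixed).
defects = [
    (date(2024, 3, 1), date(2024, 3, 4)),
    (date(2024, 3, 2), date(2024, 3, 3)),
    (date(2024, 3, 5), date(2024, 3, 13)),
]


def cycle_times_in_days(log):
    """Days from detection to fix for each defect in the log."""
    return [(fixed - detected).days for detected, fixed in log]


def average_cycle_time(log):
    """Mean detection-to-fix time in days."""
    return mean(cycle_times_in_days(log))


print(cycle_times_in_days(defects))  # [3, 1, 8]
print(average_cycle_time(defects))
```

Tracking the average per iteration, rather than per developer, keeps the focus on the trend for the whole team.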

4) Measuring Up
Measuring Up means raising the level of measurement one level higher than you might initially expect. Mary Poppendieck describes the benefits of measuring up really well and here’s the high level version.

Robert Austin in his book “Measuring and Managing Performance in Organizations” makes three key observations:

1.    “You get what you measure”

2.    “You get only what you measure, nothing else”

3.    “You tend to lose the things that you can’t measure: insight, collaboration, creativity”

The first correlates with the Hawthorne Effect. The second builds on this to say that if you do not measure things, they may not get done. The third point is that we need to be careful that our measures do not promote local optimization and suppress desirable behaviours like collaboration.

The example Mary cites of a company that really understands this is Nucor Steel. Having grown quickly since listing on the stock market in 1971 into a \$4B leader with a great record for employee retention, collaboration and labour relations, Nucor attributes much of its success to incentive-based pay tied to productivity.

The interesting thing is that plant managers are not paid based on how well their plant performs, but on how well their plant and other plants perform. This may sound unfair: how can they influence how other plants perform? Well, through collaboration, sharing ideas and cooperation. Likewise, departments are measured across multiple departments (to avoid silos), teams across teams, and individuals based on team results. This rewards collaboration and cooperation that would otherwise be difficult to measure and encourage.

In the software world defects could be traced back to individual developers, but they may well be the result of environmental challenges. So rolling them up to an entire team and getting the testers involved earlier to provide more timely and valuable feedback to developers may be a better way to go.

“Instead of making sure that people are measured within their span of control, it is more effective to measure people one level above their span of control. This is the best way to encourage teamwork, collaboration, and global, rather than local optimization” – Mary Poppendieck

We touched on this briefly earlier, “Design Factory” metrics should be leading vs. lagging, focussing on the future so that they can help us change direction and make better project choices. For an accountant, a perfect view of the past might be useful, but for a project manager, a perfect or even an imperfect view of the future is far more useful.

So we should pay less attention to Lagging Metrics, which report what has already passed (including actual values). Instead we should pay more attention to Leading Metrics, such as trends and the likely impacts of their projections.

Here we can see the trend for unresolved Change Requests is increasing month on month. Perhaps we are not allowing enough time to fully understand user expectations and are pushing too quickly into building features.

Here we are trending project risks and can see that the overall trend is down, towards risk reduction and avoidance. Risk Profile graphs like this can be useful to explain project progress in early phases of a project when trialling technology (via development spikes) and risk mitigation is a strong focus and there may not be a lot to demo to the business.

Cycle times are also a great leading metric for identifying bottlenecks in a process. In the example depicted above, the UI Designer is the bottleneck in the process, handling 30 stories an iteration, a lower number than anyone else on the team. Micromanaging people and measuring individual productivity is not effective, so how do we gain insights into these bottlenecks?

Cumulative Flow Diagrams can be used to track work in progress and identify bottlenecks without the need for micromanagement. (For a full explanation see here.)
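The data behind a Cumulative Flow Diagram can also be inspected directly. The sketch below assumes we record, at the end of each iteration, the cumulative number of stories that have completed each stage; the stage names, counts, and function names are invented for illustration. The gap between two adjacent lines is the work waiting for (or inside) the later stage, and a steadily widening gap suggests that stage is not keeping up.

```python
# Hypothetical cumulative counts of stories that have COMPLETED each stage,
# sampled at the end of four iterations.
STAGES = ["analysis", "ui_design", "development", "acceptance"]
completed = {
    "analysis":    [10, 20, 30, 40],
    "ui_design":   [8, 13, 17, 22],
    "development": [7, 12, 16, 20],
    "acceptance":  [5, 10, 14, 18],
}


def queued_work(completed, stages):
    """Work waiting for or inside each stage: stories finished by the
    upstream stage minus stories finished by this stage."""
    return {
        stage: [up - done for up, done in zip(completed[upstream], completed[stage])]
        for upstream, stage in zip(stages, stages[1:])
    }


def likely_bottleneck(completed, stages):
    """Stage whose queue grew the most between the first and last sample."""
    queues = queued_work(completed, stages)
    return max(queues, key=lambda stage: queues[stage][-1] - queues[stage][0])


print(queued_work(completed, STAGES)["ui_design"])  # [2, 7, 13, 18]
print(likely_bottleneck(completed, STAGES))         # ui_design
```

Note that this looks only at the flow of work through stages, not at any individual's output, which is what keeps it free of micromanagement.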

Little’s Law tells us that queue size is proportional to queue time: the more work waiting in a queue, the longer each item takes to get through it. So by measuring work in progress we can gain insights into completion times. However, we need to be careful how we measure and report queue size, since we do not want to influence it to grow larger; we want the opposite, a reduction in work in progress.
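Little’s Law is usually written as: average work in progress = average throughput × average time in the system. Rearranged, it gives a rough estimate of how long a new item will take to complete. The Python sketch below uses made-up figures, and it assumes a reasonably stable process, since the law describes long-run averages.

```python
def expected_cycle_time(avg_wip, avg_throughput):
    """Little's Law rearranged: average time in the system equals
    average work in progress divided by average throughput."""
    if avg_throughput <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / avg_throughput


# Hypothetical figures: 24 stories in progress, 8 accepted per iteration.
print(expected_cycle_time(24, 8))  # 3.0 iterations

# Halving work in progress at the same throughput halves the wait.
print(expected_cycle_time(12, 8))  # 1.5 iterations
```

This is why reducing work in progress, rather than starting more work, is the lever for shorter completion times.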