Excel spreadsheet for metrics: a Kanban experience report


This is the third in a series of blog posts about implementing Kanban on my current project. The first installment was about establishing the flow. The second described the Rally data extract scripts I wrote. In this post, I talk about my Excel spreadsheet that consumes the data.


All of the data exists on a “Stories” tab. I collect the formatted ID, the date the story first entered each state, and the lead developer and quality engineer. When a story is Accepted or Rejected, I run the story query script and input that data.

I also do some data normalization. For example, my story query script does not take weekends and holidays into account, but the spreadsheet does. Neither team currently does story size estimation, so I derive a story’s size by taking the total number of hours the story was being worked and mapping it onto the typical Fibonacci story point sequence using a bell curve. Values like holidays and the story point sequence live on a Lookups sheet.

With this data, I can calculate a number of interesting metrics. The data from the Stories worksheet is aggregated onto the Days worksheet, and Days onto both the Weeks and Days of Week worksheets. These populate charts like the Cumulative Flow Diagram (CFD). I combine the Accepted stories data from the CFD with some takt time calculations to project future delivery.
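To make the projection concrete, here is a minimal sketch of the arithmetic behind it, written as a shell snippet with made-up numbers; the real calculation lives in Excel formulas against the CFD data and the working-day calendar on the Lookups sheet.

#!/usr/bin/env bash
# Hypothetical milestone numbers -- the spreadsheet pulls these from the
# Accepted series of the CFD and from the holiday lookups.
TOTAL_STORIES=120      # stories committed for the milestone
ACCEPTED_SO_FAR=80     # accepted to date
WORKING_DAYS_LEFT=10   # weekends and holidays already excluded

REMAINING=$((TOTAL_STORIES - ACCEPTED_SO_FAR))
# Required acceptance rate, rounded up to whole stories per day
REQUIRED=$(( (REMAINING + WORKING_DAYS_LEFT - 1) / WORKING_DAYS_LEFT ))
echo "Need $REQUIRED accepted stories per day to hit the milestone"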

This calculation became invaluable during weeks three and four of the first milestone release. I was able to tell the team that they needed to get 4 stories accepted a day in order to reach the milestone goal. When we ran into a blocking issue, the team responded by swarming on the roadblock and collaborating to shepherd the logjam of stories through the system, getting us up to 5-6 accepted stories per day and back on track.

While many individuals worked many late nights, we only had one mandatory evening of work, on the last night of the project. Fueled by pizza, we solved our last integration challenge and were done by 8pm. To me, this made all the hours spent developing the spreadsheet and entering data worthwhile. I think if the team hadn’t rallied when they did, the last week before delivery would have been very painful indeed.

I’m also breaking down story data by dev lead, QA lead, tag and location. While I’m aware of the potential uses of individual performance data, so far there are no strong patterns to observe. I derive the dev lead and QA lead from the senior person making changes to the story in Rally. That heuristic is a workaround for the limitation that a story in Rally can only have a single owner: we change the owner of the story during its lifecycle on the board so we know who to talk to about the story at any given time.

Originally, the Tag field was going to be used to slice data by Rally tag, but I ended up using it to track work by component. I parse story titles to determine what component(s) they affected. Again, not ideal, but good enough for now.
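As a rough illustration of the kind of keyword matching I mean, here is a sketch expressed as a shell function; the real parsing is an Excel formula, and the component keywords below are hypothetical.

# Sketch only: map a story title to a component by keyword (hypothetical
# keywords; the actual mapping lives in the spreadsheet).
component_for_title() {
  case "$1" in
    *[Aa][Pp][Ii]*|*[Ss]ervice*)  echo "Backend API" ;;
    *[Ww]ireframe*|*[Ss]creen*)   echo "Front end" ;;
    *[Rr]eport*)                  echo "Reporting" ;;
    *)                            echo "Unclassified" ;;
  esac
}

component_for_title "Add reporting endpoint to the API"   # => Backend API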

I also look at other daily metrics. I monitor the number of story changes made per day, for example, to see how active the team is.

[Chart: story activity by day of week]

As you can see, stories are most active on Tuesdays. Early in the project, there was a spike on Thursdays as well, but over time, that’s dampened.

I also look at changes in key metrics over time. For example, I see wide fluctuations in the amount of development time, but much steadier validation times over the life of the project.

Next time, I’ll talk about Kanban flow process violations.


Rally data extract scripts: a Kanban experience report


This is the second in a series of blog posts about implementing Kanban on my current project. The first installment was about establishing the flow. This post talks about obtaining the data for the Kanban metrics. Later posts talk about the Excel spreadsheet that consumes the data.


My employer has standardized on Rally, so we are using it for work item management. However, I’ve found that its Kanban flow support is not as robust as I need. I wanted to create several additional charts based on derived values, so I manually input some data for each story into Excel for my formulas and charts. Of course, Rally supports custom applications written in JavaScript, but I didn’t relish the thought of performing statistical analysis in raw JavaScript, so I chose to continue using Excel for my metrics gathering.

After creating the first few charts, I started exploring the data and made more. At first, I would scour the revision history for the data I needed, but I quickly wanted some automation, so I wrote two shell scripts that query Rally’s APIs for the data. It currently takes me about 15 minutes a day of data entry to keep pace with the teams, which is short enough that I haven’t taken the next step of converting a derivative of those scripts into an Excel data source.

Here’s the output of my script, with some of the project-specific data scrubbed. I used Fake Name Generator to come up with poor Richard.

Fake Name Generator produces a lot more than names. Aside from the occasional character idea for a story I’m writing, I only use it for realistic test data.

$ ./storyQuery.sh 53142
US12345 -- Reorganize test packages to support ease of testing
Project: Backend
Release: BETA
Tags:

State changes
18641196378  2014-05-29  None         -------  Cedric Jorgenson
Beta         2014-05-29  None         -------  Cedric Jorgenson
Beta         2014-05-29  Ready        -------  Cedric Jorgenson
Beta         2014-05-29  Design       -------  Gary Bennett
Beta         2014-05-29  Development  -------  Peggy Bivens
Beta         2014-05-30  Validation   -------  Richard Chenier
Beta         2014-05-30  Validation   BLOCKED  Chuck Durfee
Beta         2014-05-30  Validation   -------  Richard Chenier
Beta         2014-05-30  Accepted     -------  Cedric Jorgenson

User Count: 2
Defects: 0 (NONE)
Blocked: 1 hours
Design: 0 hours
Development: 25 hours
Ready: 0 hours
Validation: 0 hours

As you can see, the script uses both the story details and Lookback APIs to get a concise history of the story. The story details API only provides rollup information; I need the Lookback API to get revision history.

In this case, Cedric is the product owner, Gary is a quality engineer, and Peggy and Richard are developers on the project. Richard didn’t implement the story, so he’s handling validation. On stories that are risky or important to the project, a quality engineer will also perform ad-hoc testing.

My script doesn’t do separate lookups to obtain names for person or release IDs, as you can see by the 18641196378 in the first line. For some enumerated value fields, you can ask Rally to “hydrate” a field, but the updating user and release are not among them. Though written in bash, my script uses a Perl associative array to inject the names into the output. While there’s a more complete call example later, that Perl call looks like this:

 | perl -pe '%users = (
 "10624400656","Cedric Jorgenson",
 "1191677143" ,"Chuck Durfee",
 "12294673246","Gary Bennett",
 "13318093404","Peggy Bivens",
 "13304263924","Richard Chenier",
 );
 foreach $key (keys %users) { s/$key/$users{$key}/g; }
 ' \

The -e option lets you execute Perl scripts inline. The -p option runs the script on each line in turn.
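For example, here is the same idea in isolation, substituting one of the IDs from the output above:

echo "Story accepted by 10624400656" \
  | perl -pe 's/10624400656/Cedric Jorgenson/'
# => Story accepted by Cedric Jorgenson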

I use a separate query script to look up a new person’s name and then update the story query script by hand, which I only need to do a handful of times a month. I fill in the release by hand as well, since setting up releases happens only a few times a year.
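That lookup script isn’t shown here, but a minimal sketch of the idea, assuming Rally’s WSAPI v2.0 user endpoint and the same httpie and json tools, might look like this:

# Sketch only: look up a Rally user's ObjectID by display name so the
# Perl associative array above can be updated. Endpoint and field names
# assume Rally WSAPI v2.0.
NAME="Richard Chenier"
http -b -a $AUTH -j GET \
  "https://rally1.rallydev.com/slm/webservice/v2.0/user?query=(DisplayName = \"$NAME\")&fetch=ObjectID,DisplayName" \
  | json -D / QueryResult/Results \
  | json -a ObjectID DisplayName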

Here’s how the story query script makes one of its Lookback API REST calls to get some of its data. I defined $KANBAN_FIELD and $MY_WORKSPACE earlier in the script; I parameterized $KANBAN_FIELD because the front-end team has a different workflow than the backend team, and hence a different custom field in Rally to store the state.

BODY=$(cat << EOF
{
  "find"     : { "FormattedID": "$STORY" },
  "fields"   : ["ObjectID", "_ValidFrom", "_ValidTo", "Release", "Blocked", "$KANBAN_FIELD", "_User"],
  "compress" : true
}
EOF
)

RESULTS=$(echo $BODY \
  | http -a $AUTH -j POST "https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/$MY_WORKSPACE/artifact/snapshot/query.json" \
  | json Results)

I’m using two CLI tools to help me: httpie and json. HTTPie is a Python tool that simplifies cURL-style REST calls. The -j option tells it to send and accept JSON, the -b option outputs only the response body, and -a provides my Rally credentials. I don’t store those credentials in plaintext in the script, of course.

If you’re interested in seeing all the fields during script development, send "fields": true in the POST body.
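For example, a debugging variant of the request body above that returns every field:

BODY=$(cat << EOF
{
  "find"     : { "FormattedID": "$STORY" },
  "fields"   : true,
  "compress" : true
}
EOF
)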

The json tool is used to extract data from the REST call results. Here, I’m filtering the output to just the inner Results object. You can see json’s capabilities more clearly in the story details REST call:

DETAILS=$(http -b -a $AUTH -j GET "$DETAILS_URL?query=(FormattedID = $STORY)&fetch=true&workspace=$WORKSPACE_URL")
DETAILS_RESULTS=$(echo $DETAILS | json -D / QueryResult/Results)

if [[ $RAW = true ]]; then
  printf "Raw Details\r\n"
  echo $DETAILS_RESULTS | json
fi

STORY_NAME=$(echo $DETAILS_RESULTS | json -a _refObjectName)
DEFECT_STATUS=$(echo $DETAILS_RESULTS | json -a DefectStatus)
DEFECT_COUNT=$(echo $DETAILS_RESULTS | json -D / -a Defects/Count)
PROJECT_NAME=$(echo $DETAILS_RESULTS | json -D / -a Project/_refObjectName)
RELEASE_NAME=$(echo $DETAILS_RESULTS | json -D / -a Release/_refObjectName)
TAGS=$(echo $DETAILS_RESULTS | json -D / -a Tags/_tagsNameArray | json -a Name \
  | tr '\n' ',' | sed -e "s/,$//;s/,/, /;")

The -D option on json sets the delimiter for the -a command, which causes json to parse each record of an array separately. To handle an array of arrays, you need to call json twice, as is done for $TAGS. I use tr to translate the newline characters into commas, and then sed to do some inline substitutions. I also make use of GNU awk (gawk) to do some aggregation, as you can see from this excerpt:

# Display blocked hours
echo $RESULTS \
  | json -d, -a _ValidFrom _ValidTo Blocked \
  | grep true \
  | sed -e "s/9999-01-01T00:00:00.000Z/${NOW}/g;
            s/[TZ:-]/ /g;" \
  | gawk -F, '{ d = (mktime($2) - mktime($1))
                printf ("%02d h\r\n", d/3600); }' \
  | gawk '{ cnt += $1 }
          END { printf "Blocked: %s hours\r\n", cnt ? cnt : 0 }'

Here, I use json to take the full JSON output of the REST call and strip out everything but the fields I specify. I then look for periods when the story is blocked (where the Blocked flag is true). I use sed to turn Rally’s “max datetime” value into $NOW, which I obtain earlier in the script. The first call to gawk converts each blocked period into hours, and the second sums those values and reports the grand total. If the story was never blocked, it shows 0.
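For reference, $NOW just needs to be in the same ISO 8601 shape as Rally’s timestamps so the same sed substitution applies to it. A sketch of how it might be set (my exact invocation may differ):

# Current UTC time in the same format as Rally's _ValidFrom/_ValidTo values,
# so open-ended blocked periods can be measured up to "now".
NOW=$(date -u +"%Y-%m-%dT%H:%M:%S.000Z")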

Armed with that summary, I then plug the data into my Excel spreadsheet, which is the topic of my next post. Then I’ll talk about process violations.


Establishing the Flow: a Kanban experience report


[Image: a simple Kanban board]

To my mind, Kanban the process is a good fit for managing software projects in a startup situation, like we have at my current employer. In his book The Lean Startup, Eric Ries talks about the importance of build-measure-learn feedback loops. Kanban supports the measure phase of this feedback loop through rich metrics.

I’d like to preface this blog series by saying that it’s only been eight weeks since Kanban was applied in earnest on this project, and while I’ve done this before, examples should not be taken as a definitive statement of best practice. In fact, there are a number of places where potential improvements are obvious. Note also that this project has a very aggressive delivery timeline and that date has limited the number and scope of improvements the team is willing to undertake.

On this project, we have about 10 people here in Colorado and another 20 contractors in eastern Europe. If we were using scrum, I’d be a scrum master to all those teams, as well as a “release train engineer” in SAFe parlance. I’m honestly not sure what the analogous role is called with Kanban teams, since many Kanban teams are self-managed. Because project manager has a specific meaning at my employer, the best title I’ve come up with is “project coordinator” or maybe “technical project manager” (although that title has baggage too).

When I was tasked with implementing Kanban for this project, I started by talking to the team about workflows they have enjoyed using in the past. I used the getKanban game to illustrate how a well-considered Kanban flow operates, and the team still talks about the game. We ended up adopting a similar flow to getKanban’s for our backend team.

  • In getKanban, the states are: Ready, Design Doing, Design Done, Development Doing, Development Done, Test, and Deployed.
  • For our backend team board, we chose Ready, Design, Development, Validation and Accepted.
  • The front-end team chose different states: Ready, Requirements, Wireframes, Data Contracts (where we identify API changes and groom stories for the backend team), Proof of Concept, Production Ready, Validation, and then Accepted.

There was a lot of confusion about the Proof of Concept and Production Ready columns, which I renamed after the first delivery milestone to Development and Deployment. I’m also finding that stories on the front-end board don’t spend significant time in Wireframes or Data Contracts, so I’m considering consolidating Requirements, Wireframes and Data Contracts into a “Design” step.

I spent time with both teams establishing some exit criteria for each step in the workflow. On the backend board, stories exit Ready when they are groomed, including acceptance criteria and test scenarios. During Design, the developer, often in partnership with a quality engineer, comes up with test suites and a high-level design approach, at least to the component and API level. Then, during Development, they create the implementation as well as any JUnit and FitNesse fixtures needed to exercise their code. After a code review, the story enters Validation where the functionality is exercised by FitNesse as well as ad-hoc testing. Then, the PO accepts the story. Should we decide at some point that the story is no longer desired, it gets moved to Rejected.

In practice, team members often break these exit criteria, and we went through a couple of weeks of blocking stories that didn’t pass muster. I found this was an effective way to get the team to pay attention to the exit criteria. With a team this large, it’s hard to get everyone together at once, and I discovered that I had underestimated the communications effort, both in terms of the Russian/English language barrier and the need for repetition.

We use git for source control, hosted on Atlassian Stash, which allows us to keep our code behind our firewall and to use LDAP for user access. We use pull requests for code reviews, because Stash offers reviewing capabilities similar to GitHub’s.

The eastern European team is used to Subversion and new to git, and that’s caused some confusion around branching strategies. When one team reported they were spending man-days handling merge conflicts, I suspected it was either because of the way they handle branching or how they have configured git; in my experience, after some explanation, merge issues of that magnitude arise much less frequently with git. The tiny size of our initial code base is also a contributing factor: there is a lot of contention for a few key files. I’ve challenged the team to dig into why those files and the code they contain are involved in so many of our features, to make sure we aren’t creating a God object or the like.
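A quick way to see that contention is to measure churn. Here is a sketch of the kind of check I have in mind (standard git options, not a command from our build):

# List the 10 files that appear in the most commits -- a rough churn
# measure that highlights candidate God objects.
git log --pretty=format: --name-only \
  | grep -v '^$' \
  | sort | uniq -c | sort -rn \
  | head -10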

The next post digs into the data collection aspect of a Kanban flow. Later posts talk about my Excel spreadsheet that consumes that data, what metrics I pull, and process violations.


Bowling Game kata and frames


I ran into an interesting scenario after revisiting the Bowling Game kata from memory. The exercise was a small reminder of the power of test-first development.

I worked through the kata as usual, but I was unable to recall each step, so I wrote the code fresh. I came to the following implementation, which passes the normal JUnit tests:

  
public int score() {
 int score = 0;
 for(int i = 0; i < 20; i++) {
   if (scores[i] == 10) {
     score += scores[i+1] + scores[i+2];
   }
   else if (scores[i] + scores[i+1] == 10) {
     score += scores[i+2];
   }
   score += scores[i];
 }
 return score;
}

I realized that I’d forgotten to support the notion of frames. Consider the logic for 10 pins: a strike only occurs when all 10 pins are knocked down on the first roll of a frame.

Unit tests are only as good as the use cases they cover. Because I hadn’t written a test that implemented the “strike” business rule, I had a faulty implementation.

Fortunately, it’s easy to resolve this situation with test-first development. Here’s the failing test I used to expose the design weakness:

 @Test
 public void spareWithTenPins() {
   g.roll(0);
   g.roll(10); // spare, not strike
   g.roll(2);
   g.roll(1);
   assertEquals(15, g.score());
 }
 

Next, I fixed the implementation to take attempts in pairs – that is, a “frame” – and the spareWithTenPins and other tests passed. Here’s the new score() implementation:

public int score() {
  int score = 0;
  for (int f = 0; f < 10; f++) {
    int i = f * 2;
    if (scores[i] == 10) {
      score += scores[i+1] + scores[i+2];
    }
    else if (scores[i] + scores[i+1] == 10) {
      score += scores[i+2];
    }
    score += scores[i] + scores[i+1];
  }
  return score;
}
 

It’s worth noting that this is not clean code – I’m not using variables with intention-revealing names, for example. I found that I was somewhat lax about the refactoring step when performing this kata today.


Lean Wastes and Software Delivery: Overproduction and Overprocessing



Overproduction

It’s an all too common story in software delivery shops. How many times have you heard about a delivery team that writes a feature for an application, only for its release to be delayed? This is overproduction: producing more than the next step needs. It may not seem like a problem, but it can cause real issues.

For example, let’s say that the team upgrades a code library they use to a new version, because it lets them get rid of some data validation they wrote themselves that is now handled by the library. It’s Halloween, and the product is released quarterly. In a workplace version of trick or treat, a critical defect is reported against the software version that’s deployed in production.

Not only does the delivery team have to reproduce the error in the deployed version, they need to also reproduce the error in the new version. They need to know if the new version exhibits the issue. If it does, they can discuss the possibility of just deploying the new version. If not, they know they will also have to address it in the new version before they can release it. Either way, it is more work to diagnose the defect than it would have been with a single version of the codebase.

To minimize the chances of this happening, the firm should work toward continuous delivery. Continuous delivery is a powerful technique that requires a number of disciplines to be in place. The most common stepping stone is continuous integration, which refers to building an application from source control automatically, usually any time a code file changes. Of course, this presupposes the team uses a source control system! And for those builds to be useful, some assurance of quality is needed, which comes from automated tests. Fortunately, firms can reap benefits from each one of these techniques in isolation, so it’s not a large upfront investment, but rather a number of small investments over time.

Some customers are sensitive to changes. For example, customers may train their employees on new product features every quarter. It would be nightmarish for them to continuously retrain their staff! For those situations, delivery teams need to practice accretion of features, where additive changes are strongly preferred and breaking changes are avoided at all costs. Many firms that excel at continuous delivery insist that the features they develop can be activated or deactivated with a configuration setting. That way, the pieces that make the feature work can be deployed without any customer knowledge until the feature is turned on. Or they may pursue a plug-in architecture, where features can be deployed as separate code modules when they are completed.
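To make the toggle idea concrete, here is a minimal sketch of a configuration-driven feature flag check, with a hypothetical config file and feature name; real implementations usually live in application code or a feature-flag service, not a shell script.

#!/usr/bin/env bash
# Sketch only: gate a feature behind a line like "new_checkout=true" in a
# hypothetical features.conf; the feature ships dark until the flag flips.
FEATURES_FILE="${FEATURES_FILE:-/etc/myapp/features.conf}"

feature_enabled() {
  grep -q "^$1=true$" "$FEATURES_FILE" 2>/dev/null
}

if feature_enabled "new_checkout"; then
  echo "new checkout flow active"
else
  echo "existing checkout flow active"
fi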

Let’s take another example: reports. It is not uncommon for a particular report to be commissioned for a purpose, and after that purpose has been fulfilled, it goes unused but is still produced. The typical dynamic I see is that Product actively pursues the addition of features, while Development leads the efforts to remove them.

Duplicate data is another common area for overproduction. I recently saw one firm where Team Bravo got Team Alpha to build a service that allowed it to extract data from the Alpha application’s database. On the surface, the use case made sense. In reality, though, Bravo was calling the service to extract gigabytes of data into the Bravo database, along with gigabytes of separate data from Team Charlie in order to make a unified report. It turned out that over 90% of the load on Alpha’s database was from Team Bravo, and that there were simple filtering changes to the extract service that would have greatly reduced the duplicate data. Many data analysts are probably shaking their heads as they read this.

An approach to addressing this is whole team thinking, not just at a feature team level, but extending to portfolio management as well. There are a number of frameworks that try to address scaling agile. I’ve found that the scaled agile framework (SAFe) is a good conceptual starting point, though I would advise teams to investigate other options before choosing an approach.

Overprocessing

Overprocessing is processing that doesn’t add value. In the world of software, this is known as “goldplating”: creating features or implementations that go beyond the requirements. I see this most often in places that don’t practice whole team when it comes to requirements. Whole team is a concept where the entire team is responsible for the software development lifecycle, and everyone gets input into all parts of the process. When Product, Development and Operations are divided, the requirements sometimes don’t fully reflect all the work that’s needed to produce a product. This opens the door for overprocessing.

One example I see is a misunderstanding of the value provided by each discipline’s practices, such as Development’s testing infrastructure. In places where Product has sole ownership of requirements, I’ve seen teams fighting for testing facilitation features, such as interfaces for verifying that a service has deployed correctly. When those features are left out, I then see Product complain because it’s difficult to troubleshoot the application in production. Operations can be a powerful ally in these discussions, but they aren’t often at the negotiating table. Operations can also help ensure that proper security or monitoring considerations are introduced early.

Sometimes, developers foresee that a simple implementation of a feature won’t handle anticipated future use cases. It can be hard to resist the temptation to code a more robust solution right away, even though that future use case may not materialize for months or years, if at all. This often happens during the phase in a developer’s career when they are learning design patterns and they try to apply them in places where they aren’t appropriate. Then, they learn about enterprise architecture patterns and the cycle repeats. Or they learn about a new technology. This impulse toward fresh thinking is perfectly normal and should be encouraged, but at the same time, the needs of the product need to outweigh incorporating new technology simply for newness’ sake.

I’ve seen Operations guilty of this too, for example in the realm of virtualization. When this technique of replacing physical machines with software images running in a machine emulator is new to a firm, there can be a drive to lower operational costs by virtualizing everything. There have been a couple of times now where I’ve seen Operations ignore Development and virtualize a performance-critical piece of hardware, only to undo it later. I’m thinking of a couple of database and version control servers here, both of which needed a lot of memory and did a lot of disk and network I/O.

In another case, I saw Product insist on tight security for an application. Development did not communicate the performance costs and responded with a service-oriented architecture that performed two-way certificate authentication on every service call. The result was unacceptably slow (for a real-world analogy, imagine having to unlock a door every time you wanted to walk into another room of your house), and it was also hard to onboard clients. The burden was akin to having to install a security certificate on every laptop that wanted to do a Google search, and Operations balked at the idea of managing all those certificates on machines they didn’t control. While that level of security might be appropriate for some applications, it was far more robust than the application in question needed.

When Development and Operations are not working together, sometimes problems are handled in an overly complicated manner. In one case, Operations had written front-end scripts and even patched Development programs in production without letting Development know. Predictably, this led to odd defects in production that couldn’t be reproduced in test environments, as well as increasing reluctance to put new versions into production — because it would mean development work by Operations to update their adaptations of the system! When Operations and Product work together, the need to cobble solutions like this together is greatly reduced and those rare instances tend to last only until a new version can be deployed.

I have also seen goldplating occur when there has been turnover in Product and the development staff have a lot of experience with the product. Senior developers sometimes develop the feeling that they know the customers better than the product team does, and they become convinced that customers would like some feature they dreamed up. Here, Product disciplines like the business model canvas, value mapping and other techniques should be applied to evaluate those ideas from Development and Operations against the market and target customer base.

Another example is when Product and teams spend lots of grooming time fleshing out stories, only to decide that the work item won’t be played — that the outcomes requested won’t be produced. Paragraphs and charts may be created when a sentence would do. And although the discussions may have lasting impact on the direction of the product, much of that brainpower talking about potential future issues like implementation details goes to waste. It’s important to groom epics, features, stories and tasks to the appropriate level of detail for their state in the delivery lifecycle.

In summary, having a good process for incorporating Product, Development and Operations into the feature discussion can alleviate overprocessing. A healthy discussion between these three groups implies a level of mutual trust, because distrustful parties will tend to discount each other’s input. A whole team approach can help build that trust faster.

An early step in that direction is the use of feature teams. Companies that try to preserve silos and hand off work between departments impede frequent feedback, which is a tenet of all agile processes. If this concept is new to your work environment, experiment with a single colocated team, with the firm’s goal being for the three groups to learn how to work together.

I hope you enjoyed this series!