The Excess Strategic WIP problem

Try pouring a bottle of milk into a glass with milk already in it. You have a choice: stop or tidy up the mess afterwards. That is my work-in-progress (WIP) analogy: if you try to do too much – no matter how much you want to do it – you will end up with a mess.

Agile folk are well versed in the problems created by too much WIP and how to deal with it – check out the Stockless Production video if you want to see. In the last six months I’ve been seeing a particular variant on this problem which I’ve come to label the Excess Strategic WIP problem.

In the latest report the manager told me how the team completed a great quarter with 3 priorities – set via OKRs. The senior management team were so impressed they asked for 19 priorities in the next period.

Right now I don’t have a “paint by numbers” solution; fixing this problem is more involved. I’m starting to understand it and I’ll make some suggestions later. Ultimately this is a failure of leadership to say “No”. That failure is itself rooted in a failure of leadership to appreciate what is happening on the ground and what those doing the work are up against: the number of strategic initiatives or the amount of business as usual.

In a team it is easy to spot excess WIP using a visual system like a whiteboard or card system. When you track work like this you see an awful lot of work sitting in the “in progress” column. Typically there is more work than people on the team and the work isn’t moving. Some of the work may be actively marked as blocked but more likely most of it is nominally being worked on.

It can’t all be worked on at the same time because despite having two eyes, two hands, and two sides of a brain humans can only really do one thing at a time – even new parents. I label this work “WHIP” – work hopefully in progress. While it is, in theory, being worked on, most of it is just sitting there waiting for a multi-tasking (i.e. time slicing) person to come back to it.

The good news is when you can see the WHIP you are half way to solving the problem: there are well known solutions. Accept less work, impose work in progress limits, sequence work by adding queues to the board, educate people to work on one thing until it is done, etc. Once you can see the work you have a feedback mechanism, you can take action and, thanks to the feedback mechanism, watch it reduce.

But Strategic WIP is more difficult. Strategic WIP is the stuff the organization decides is really important, the stuff the most senior leadership decides should be happening. The Excess Strategic WIP Problem occurs when those strategic priorities are greater than the organisation’s ability to deliver on them.

While some of this work may be transactional (“Build a Mega Widget”) much of it is transformational (“Adopt Agile working”, “Increase diversity”, “Tighten security”). Such things involve changing the way other work is done: changing the processes, changing the criteria, increasing awareness. The feedback cycle is long but to get started people need to devote time: attend training, arrange kick-off meetings, discuss approaches with consultants/coaches, etc.

In one case I saw this year the organization has, for years, been asked to do more than it is resourced to do. While a few months of such a mismatch might have been manageable the cumulative effect has been to create organizational debt and demoralised staff.

The second case I saw isn’t a result of under resourcing; if anything that organization has more – money, people and perhaps equipment – than they know what to do with. But because their management model resembles a sponge it is impossible to know when the organization has taken on too much. While some pockets were overworked I’m sure other pockets were idle.

In both these cases “lean” was a dirty word. Both organizations had been subject to lean programmes that had stripped out “waste” but, from what I could see, removing that waste had created both organizational debt and demoralised staff. Having removed “spare” staff those left were juggling. Staff didn’t have time for “agile” and their diaries had no space for daily essentials let alone another change programme.

The third case came to light when discussing OKRs. The company had a successful quarter where they focused on a few objectives. That success seems to have bred the next failure, when the organization requested many objectives.

Actually, this case brought it home: taking on too many, or being given too many, OKRs. I’ve heard of teams tackling too many OKRs so many times in the last year. (In fact, I should name this “Excess OKRs Problem”. This occurs whenever you have more OKRs than you can count on one hand. Don’t tell me your team is big enough to do so, check out Focus is not divisible so limit your OKRs.)

In a way, the Excess OKRs Problem is better than the Excess Strategic WIP Problem because you can see it. One can say “OK company, you have asked us to deliver 15 OKRs here, we need to sit down and talk about this.” Like visualising your work in progress, excess OKRs is a thing you can identify and address; it is the reason to talk.

(Note: unlike User Stories which should always be small, OKRs should never be small. OKRs should be big, meaningful and preferably strategic. No team can take on 16 meaningful OKRs, even 5 is too many.)

One of the problems with excess Strategic WIP is that it can be difficult to see, there are different teams involved and in different places. Conflicting priorities are hidden. People on the ground may see problems but the senior people – the people creating the WIP – are too far removed. Those people may not want to hear people saying “You are asking too much”. They may have too much riding on getting multiple work streams done. They may be deaf to the cry of pain when people say “too much.” They may have too big an ego to accept that what they are asking for is a problem. And it may be politically unacceptable.

“Political” is an important word here: several of the cases I’ve heard of, and some of the examples above, are Government agencies.

Because excess strategic WIP is difficult to see it is difficult to build a feedback loop and difficult to take action. How do you know when the problem is too much WHIP and when people are “crying wolf”? That question in itself implies a problem of trust and maybe a belief that “everyone is lazy, we need to push harder.”

By its nature “strategy” is big, which means that the feedback cycles are long and the problems of excess strategic WIP take time to play out. What is a WIP problem looks like another failed strategy.

While I would like to think OKRs can help with this situation – because they force teams and organizations to take stock of what they are working on – they may be making things worse because OKRs include an ambition agenda. Teams are encouraged to “shoot for the moon” and build “10x solutions”. There is good logic here: if one aims for a “10x solution” (i.e. a solution 10 times better than the status quo) and falls short the “failure” may still be “better” (e.g. “5x”) than if one had aimed for a “2x solution” and succeeded.

Ambition with OKRs should not be about doing more OKRs, rather ambition is within the OKRs, a challenge that makes you approach it differently. One can draw a line here between having one OKR which aims for “10x” and having 10 OKRs but I suspect the subtlety will be lost on someone asking for “more than you think you can do.”

So what is the solution?

I’m not sure there is a silver bullet but I would want to … make the problem visible, perhaps a portfolio level kanban board. I would want to build a feedback loop so I could measure change. I would want to show the strategic people the visualisation.

Management education has a part to play too. I can believe many senior managers would benefit from understanding what WIP is, the problems excess WIP can cause, the way excess WIP plays out on a day-to-day basis and how it affects people’s working lives. And perhaps most of all, address trust and the belief that “we just need to push harder.”

That might also mean some of the Lean waste lessons and OKR ambition lessons need to be revisited.


Subscribe to my blog newsletter and download Continuous Digital for free

The post The Excess Strategic WIP problem appeared first on Allan Kelly.

A new career in software development: advice for non-youngsters

Lately I have been encountering non-young people looking to switch careers, into software development. My suggestions have centered around the ageism culture and how they can take advantage of fashions in software ecosystems to improve their job prospects.

I start by telling them the good news: the demand for software developers outstrips supply, followed by the bad news that software development culture is ageist.

One consequence of the preponderance of the young is that people are heavily influenced by fads and fashions, which come and go over less than a decade.

The perception of technology progresses through the stages of fashionable, established and legacy (management-speak for unfashionable).

Non-youngsters can take advantage of fashion’s influence on job applicants by focusing on what is unfashionable: the more unfashionable the technology, the less likely that youngsters will apply, e.g., maintaining Cobol and Fortran code (both seriously unfashionable).

The benefits of applying to work with unfashionable technology include more than a smaller job applicant pool:

  • new technology (fashion is about the new) often experiences a period of rapid change, and keeping up with change requires time and effort. Does somebody with a family, or outside interests, really want to spend time keeping up with constant change at work? I suspect not,
  • systems depending on unfashionable technology have been around long enough to prove their worth, the sunk cost has been paid, and they will continue to be used until something a lot more cost-effective turns up, i.e., there is more job security compared to systems based on fashionable technology that has yet to prove its worth.

There is lots of unfashionable software technology out there. Software can be considered unfashionable simply because of the language in which it is written; some of the more well known of such languages include: Fortran, Cobol, Pascal, and Basic (in a multitude of forms), with less well known languages including MUMPS and almost any mainframe-related language.

Unless you want to be competing for a job with hordes of keen/cheaper youngsters, don’t touch Rust, Go, or anything being touted as the latest language.

Databases also have a fashion status. The unfashionable include: dBase, Clarion, and a whole host of 4GL systems.

Be careful with any database that is NoSQL related, it may be fashionable or an established product being marketed using the latest buzzwords.

Testing and QA have always been very unsexy areas to work in. These areas provide the opportunity for the mature applicants to shine by highlighting their stability and reliability; what company would want to entrust some young kid with deciding whether the software is ready to be released to paying customers?

More suggestions for non-young people looking to get into software development welcome.

Return of the sprint goal? (Infographic)

Most of the sprint goals I’ve ever seen are rubbish. Pretty much “do the random collection of stuff we’ve decided already.” Such goals are meaningless – save time by skipping them.

If a team adopts OKRs then I really hope they move towards more goal based working, in which case the sprint goal starts to have meaning again – although maybe the OKR replaces the sprint goal?

Either way, I see an opportunity to move away from backlog driven development (BLDD) and towards a more purposeful style of working. So it was interesting a few weeks ago when Gareth Davies from Parabol (online meeting exercises and such) sent me over this infographic and a link to a blog on Sprint goals. Food for thought.



The post Return of the sprint goal? (Infographic) appeared first on Allan Kelly.

How I write books (my new book)

Regular readers will know I write books – quite a few by now, it gets embarrassing.

Being an author is a great conversation starter, when people hear you’ve written a book or two they want to know more – everyone seems to have a dream of writing their own book. It also means that people seek me out to ask my advice about a book they are writing, or thinking about writing.

So, I’ve started to write down all the advice I give to people in a new book – How I write books.

I’m following my usual pattern so you can buy early versions on LeanPub now – I released it last week and it immediately sold a few copies. As usual at this stage everything is in a state of flux; spelling, punctuation, grammar and all that jazz will be fixed later. Of course, anyone buying now will get free updates as they become available all the way to the final version.

If you do buy, then please let me know what you think.



The post How I write books (my new book) appeared first on Allan Kelly.

Evaluating estimation performance

What is the best way to evaluate the accuracy of an estimation technique, given that the actual values are known?

Estimates are often given as point values, and accuracy scoring functions (for a sequence of estimates) have the form $S=\frac{1}{n}\sum_{i=1}^{n}S(E_i, A_i)$, where $n$ is the number of estimated values, $E_i$ the estimates, and $A_i$ the actual values; smaller $S$ is better.

Commonly used scoring functions include:

  • $S(E, A)=(E-A)^2$, known as squared error (SE)
  • $S(E, A)=|E-A|$, known as absolute error (AE)
  • $S(E, A)=|E-A|/A$, known as absolute percentage error (APE)
  • $S(E, A)=|E-A|/E$, known as relative error (RE)

APE and RE are special cases of: $S(E, A)=\left|1-(A/E)^{\beta}\right|$, with $\beta=-1$ and $\beta=1$ respectively.
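
Substituting the two values of beta into the general form recovers each function. This is a short worked check, assuming E and A are both positive:

```latex
\begin{align*}
\beta = 1:\quad  & \left|1-\frac{A}{E}\right| = \left|\frac{E-A}{E}\right| = \frac{|E-A|}{E} \quad\text{(RE)} \\
\beta = -1:\quad & \left|1-\left(\frac{A}{E}\right)^{-1}\right| = \left|1-\frac{E}{A}\right| = \left|\frac{A-E}{A}\right| = \frac{|E-A|}{A} \quad\text{(APE)}
\end{align*}
```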

Let’s compare three techniques for estimating the time needed to implement some tasks, using these four functions.

Assume that the mean time taken to implement previous project tasks is known, E_m. When asked to implement a new task, an optimist might estimate 20% lower than the mean, E_o=E_m*0.8, while a pessimist might estimate 20% higher than the mean, E_p=E_m*1.2. Data shows that the distribution of the number of tasks taking a given amount of time to implement is skewed, looking something like one of the lines in the plot below (code):

Two example distributions of number of tasks taking a given amount of time to implement.

We can simulate task implementation time by randomly drawing values from a distribution having this shape, e.g., zero-truncated Negative binomial or zero-truncated Weibull. The values of E_o and E_p are calculated from the mean, E_m, of the distribution used (see code for details). Below is each estimator’s score for each of the scoring functions (the best performing estimator for each scoring function is marked with *; 10,000 values were used to reduce small sample effects):

       SE     AE     APE    RE
E_o    2.73   1.29   0.51   0.56*
E_m    2.31*  1.23*  0.39   0.68
E_p    2.70   1.37   0.36*  0.86
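
The comparison above can be sketched in a few lines of Python. This is a minimal sketch and not the code behind the table: the Weibull scale and shape, the rounding up to whole time units (a stand-in for zero truncation), and all function names are assumptions, so the scores it prints will not necessarily match the table.

```python
import math
import random


# The four scoring functions listed earlier (e = estimate, a = actual).
def squared_error(e, a):
    return (e - a) ** 2


def absolute_error(e, a):
    return abs(e - a)


def absolute_percentage_error(e, a):
    return abs(e - a) / a


def relative_error(e, a):
    return abs(e - a) / e


SCORING = {
    "SE": squared_error,
    "AE": absolute_error,
    "APE": absolute_percentage_error,
    "RE": relative_error,
}


def mean_score(score, estimates, actuals):
    """Average a pointwise score over paired estimates and actuals."""
    return sum(score(e, a) for e, a in zip(estimates, actuals)) / len(actuals)


def simulate_actuals(n, seed=1):
    """Draw n skewed task times, each at least 1 (assumed parameters)."""
    rng = random.Random(seed)
    # Assumption: scale 3.0, shape 1.5; rounding up stands in for zero truncation.
    return [max(1, math.ceil(rng.weibullvariate(3.0, 1.5))) for _ in range(n)]


def compare(actuals):
    """Score the optimist, mean and pessimist constant estimators."""
    mean = sum(actuals) / len(actuals)
    estimators = {"E_o": 0.8 * mean, "E_m": mean, "E_p": 1.2 * mean}
    return {
        name: {
            label: mean_score(fn, [estimate] * len(actuals), actuals)
            for label, fn in SCORING.items()
        }
        for name, estimate in estimators.items()
    }


if __name__ == "__main__":
    for name, scores in compare(simulate_actuals(10_000)).items():
        print(name, {label: round(value, 2) for label, value in scores.items()})
```

Changing the assumed distribution parameters, or swapping in a different skewed distribution, is the quickest way to see how sensitive the ranking of the three estimators is to the choice of scoring function.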

Surprisingly, the identity of the best performing estimator (i.e., optimist, mean, or pessimist) depends on the scoring function used. What is going on?

The analysis of scoring functions is very new. A 2010 paper by Gneiting showed that it does not make sense to select the scoring function after the estimates have been made (he uses the term forecasts). The scoring function needs to be known in advance, to allow an estimator to tune their responses to minimise the value that will be calculated to evaluate performance.

The mathematics involves Bregman functions (new to me), which provide a measure of distance between two points, where the points are interpreted as probability distributions.

Which, if any, of these scoring functions should be used to evaluate the accuracy of software estimates?

In software estimation, perhaps the two most commonly used scoring functions are APE and RE. If management selects one or the other as the scoring function to rate developer estimation performance, what estimation technique should employees use to deliver the best performance?

Assuming that information is available on the actual time taken to implement previous project tasks, then we can work out the distribution of actual times. Assuming this distribution does not change, we can calculate APE and RE for various estimation techniques; picking the technique that produces the lowest score.

Let’s assume that the distribution of actual times is zero-truncated Negative binomial in one project and zero-truncated Weibull in another (purely for convenience of analysis, reality is likely to be more complicated). Management has chosen either APE or RE as the scoring function, and it is now up to team members to decide the estimation technique they are going to use, with the aim of optimising their estimation performance evaluation.

A developer seeking to minimise the effort invested in estimating could specify the same value for every estimate. Knowing the scoring function (top row) and the distribution of actual implementation times (first column), the minimum effort developer would always give the estimate that is a multiple of the known mean actual times using the multiplier value listed:

                   APE   RE
Negative binomial  1.4   0.5
Weibull            1.2   0.6

For instance, if management specifies APE and the previous task actuals have a Weibull distribution, then always estimate the value 1.2*E_m.
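
One way to search for such a multiplier is a simple grid search over constant estimates. The sketch below is an assumption about how this could be done, not the code used to produce the table: `best_multiplier` and the 0.1 grid step are hypothetical choices, and the result depends entirely on the actuals supplied.

```python
def absolute_percentage_error(e, a):
    return abs(e - a) / a


def relative_error(e, a):
    return abs(e - a) / e


def best_multiplier(score, actuals, steps=range(1, 31)):
    """Return the multiple of the mean actual (0.1 to 3.0 in 0.1 steps)
    that gives the lowest mean score when used as every estimate."""
    mean = sum(actuals) / len(actuals)

    def mean_score(multiplier):
        estimate = multiplier * mean
        return sum(score(estimate, a) for a in actuals) / len(actuals)

    return min((step / 10 for step in steps), key=mean_score)
```

In practice the actuals would come from the project's own history; a finer grid trades run time for precision.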

What mean multiplier should Esta Pert, an expert estimator, aim for? Esta’s estimates can be modelled by the equation Act*U(0.5, 2.0), i.e., the actual implementation time multiplied by a random value uniformly distributed between 0.5 and 2.0, i.e., Esta is an unbiased estimator. Esta’s table of multipliers is:

                   APE   RE
Negative binomial  1.0   0.7
Weibull            1.0   0.7

A company wanting to win contracts by underbidding the competition could evaluate Esta’s performance using the RE scoring function (to motivate her to estimate low), or they could use APE and multiply her answers by some fraction.

In many cases, developers are biased estimators, i.e., individuals consistently either under or over estimate. How does an implicit bias (i.e., something a person does unconsciously) change the multiplier they should consciously aim for (having analysed their own performance to learn their personal percentage bias)?

The following table shows the impact of particular under and over estimate factors on multipliers:

                 0.8 underestimate bias   1.2 overestimate bias
Score function          APE   RE            APE   RE
Negative binomial       1.3   0.9           0.8   0.6
Weibull                 1.3   0.9           0.8   0.6

Let’s say that one-third of those on a team underestimate, one-third overestimate, and the rest show no bias. What scoring function should a company use to motivate the best overall team performance?

The following table shows that neither of the scoring functions motivates team members to aim for the actual value when the distribution is Negative binomial:

                    APE   RE
Negative binomial   1.1   0.7
Weibull             1.0   0.7

One solution is to create a bespoke scoring function for this case. Both APE and RE are special cases of a more general scoring function (see top). Setting $\beta=-0.7$ in this general form creates a scoring function that produces a multiplication factor of 1 for the Negative binomial case.

A Review: Incineration Fest 2022 – Metal is back!

Overall I really enjoyed Incineration Fest and would go again if the line up is right for me. What was really great was seeing metallers back at a gig with no restrictions and doing what we do best!

Winterfylleth

I completely fell in love with Winterfylleth when they played Bloodstock on the mainstage and even more so when they released the set as a live album. They are incredible and totally deserved to be opening proceedings at the Roundhouse for Incineration Fest. Actually, they deserved to be much higher up the bill. They’re a solid outfit, played what I wanted to hear and ended, as I always think of them ending from the live album, with Chris saying this is the last song “as time is short and our songs are long!” I need to see them do a headline set in a venue with a great PA soon.

Tsjuder

Tsjuder was the wildcard for me. I didn’t really know them and had heard only a few things on Spotify before, although what I heard was really good. I had no idea I was going to be blown away. They sounded incredible from the first note, which was even more impressive given that they are only a three piece and the PA in the Roundhouse wasn’t turning out to be great for definition.

Bloodbath

Bloodbath was really the reason I was at Incineration Fest. I’d missed them at Bloodstock ten years before as one of my sons was being born and I hadn’t had a chance to see them until now. Of course now Nick Holmes (Paradise Lost) rather than Mikael Åkerfeldt (Opeth) was on lead vocals.

I was very, very excited and from the moment I heard that trademark crunching guitar sound I was even more excited. They played for a full hour. Unlike the Black Metal bands on the bill there was more riffing and solos and a slightly different drum sound.

They’re an odd band to watch. For reasons I don’t understand, the bass player and two guitarists would often turn their backs to the audience to face the drummer. The band didn’t seem to interact much with each other on stage and even less so with Nick.

Nick’s deadpan humour was present when he did speak to the audience. He introduced the band as being from Sweden, then added from Halifax almost as an afterthought! During the set he admitted he couldn’t see and dispensed with his sun glasses as they’d apparently been a good idea backstage. After breaking the microphone he enquired if it would be added to his bill at the end of the night.

Emperor

Emperor hasn't released any new material (that I know of) since 2001 and, if I’m honest, I barely listen to them beyond the live album these days. I’ve seen them at least three times before, the first time being in 1999 in a small club in Bradford on my birthday - it doesn’t get much better than that. I’m more of a fan of Ihsahn’s solo stuff these days and I still really enjoy Samoth’s Zyklon whenever I play it. Emperor, not so much anymore.

They played for the full ninety and for the most part were solid, as you might expect. Whether or not Faust plays with them is of no consequence to me and I certainly didn’t need the covers they played with him towards the end of the set. There was lots I knew and lots I enjoyed, but I wouldn't make an effort to see Emperor again.







The Middle Way – a.k.

A few years ago we spent some time implementing a number of the sorting, searching and set manipulation algorithms from the standard C++ library in JavaScript. Since the latter doesn't support the former's abstraction of container access via iterators we were compelled to restrict ourselves to using native Array objects following the conventions of its methods, such as slice and sort.
In this post we shall take a look at an algorithm for finding the centrally ranked element, or median, of an array, which is strongly related to the ak.nthElement function, and then at a particular use for it.
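
To illustrate the idea of finding a median by selection rather than a full sort, here is a minimal sketch in Python (not the JavaScript the post works in). It is not the ak library's implementation: `nth_smallest` and `median` are hypothetical names, the random-pivot quickselect is a swapped-in textbook technique, and averaging the two central elements for even lengths is an assumed convention.

```python
import random


def nth_smallest(values, n):
    """Return the element that would sit at index n if values were sorted.

    Quickselect with a random pivot; the input is left unmodified.
    """
    items = list(values)
    if not 0 <= n < len(items):
        raise IndexError("n is out of range")
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if n < len(smaller):
            # The target lies among the elements below the pivot.
            items = smaller
        elif n < len(smaller) + equal_count:
            # The target is the pivot itself (or a duplicate of it).
            return pivot
        else:
            # Skip past everything <= pivot and search the larger elements.
            n -= len(smaller) + equal_count
            items = [x for x in items if x > pivot]


def median(values):
    """Centrally ranked element; mean of the two central ones for even sizes."""
    size = len(values)
    if size == 0:
        raise ValueError("median of an empty sequence is undefined")
    upper = nth_smallest(values, size // 2)
    if size % 2 == 1:
        return upper
    lower = nth_smallest(values, size // 2 - 1)
    return (lower + upper) / 2
```

Selection only partially orders the data, which is why it can be expected to do less work than sorting the whole array first.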

Twitter and evidence-based software engineering

This year’s quest for software engineering data has led me to sign up to Twitter (all the software people I know, or know-of, have been contacted, and discovery through articles found on the Internet is a very slow process).

@evidenceSE is my Twitter handle. If you get into a discussion and want some evidence-based input, feel free to get me involved. Be warned that the most likely response, to many kinds of questions, is that there is no data.

My main reason for joining is to try and obtain software engineering data. Other reasons include trying to introduce an evidence-based approach to software engineering discussions and finding new (to me) problems that people want answers to (that are capable of being answered by analysing data).

The approach I’m taking is to find software engineering tweets discussing a topic for which some data is available, and to jump in with a response relating to this data. Appropriate tweets are found using the search pattern: (agile OR software OR "story points" OR "story point" OR "function points") (estimate OR estimates OR estimating OR estimation OR estimated OR #noestimates OR "evidence based" OR empirical OR evolution OR ecosystems OR cognitive). Suggestions for other keywords or patterns welcome.
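
A search pattern like this is easier to maintain when it is assembled from keyword lists. The following is a minimal sketch that only builds and URL-encodes the string; it makes no call to any Twitter API, and the names `TOPICS`, `TERMS` and `build_query` are hypothetical.

```python
from urllib.parse import quote

# Keyword groups copied from the search pattern above.
TOPICS = ["agile", "software", '"story points"', '"story point"', '"function points"']
TERMS = [
    "estimate", "estimates", "estimating", "estimation", "estimated",
    "#noestimates", '"evidence based"', "empirical", "evolution",
    "ecosystems", "cognitive",
]


def build_query(topics, terms):
    """Join each group with OR, wrap it in parentheses, and separate
    the two groups with a space, as in the pattern above."""
    return "({}) ({})".format(" OR ".join(topics), " OR ".join(terms))


QUERY = build_query(TOPICS, TERMS)

# Percent-encode the query so it can be placed in a URL parameter.
ENCODED_QUERY = quote(QUERY, safe="")

if __name__ == "__main__":
    print(QUERY)
```

Adding a suggested keyword then becomes a one-line change to one of the lists.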

My experience is that the only effective way to interact with developers is via meaningful discussion, i.e., cold-calling with a tweet is likely to be unproductive. Also, people with data often don’t think that anybody else would be interested in it; they have to be convinced that it can provide valuable insight.

You never know who has data to share. At a minimum, I aim to have a brief tweet discussion with everybody on Twitter involved in software engineering. At a minute per tweet (when I get a lot more proficient than I am now, and have workable templates in place), I could spend two hours per day to reach 100 people, which is 35,000 per year; say 20K by the end of this year. Over the last three days I have managed around 10 per day, and obviously need to improve a lot.

How many developers are on Twitter? Waving arms wildly, say 50 million developers and 1 in 1,000 have a Twitter account, giving 50K developers (of which an unknown percentage are active). A lower bound estimate is the number of followers of popular software related Twitter accounts: CompSciFact has 238K, Unix tool tips has 87K; perhaps 1 in 200 developers have a Twitter account, or some developers have multiple accounts, or there are lots of bots out there.

I need some tools to improve the search process and help track progress and responses. Twitter has an API and a developer program. No need to worry about them blocking me or taking over my business; my usage is small fry and I’m not building a business to take over. I was at Twitter’s London developer meetup in the week (the first in-person event since Covid) and the youngsters present looked a lot younger than usual. I suspect this is because the slightly older youngsters remember how Twitter cut developers off at the knee a few years ago by shutting down some useful API services.

The Twitter version-2 API looks interesting, and the Twitter developer evangelists are keen to attract developers (having ‘wiped out’ many existing API users), and I’m happy to jump in. There is a Twitter API sandbox for trying things out, and there are lots of example projects on Github. Pointers to interesting tools welcome.

The only thing you can do wrong, and the opposite of agile

The only thing you can do wrong in agile is to work the same way you did 3 months ago.

Corollary: The opposite of agile is static.

“The only thing you can do wrong” is born out of two things. First, my belief that agile is all about learning and putting learning into action.

Second, the proliferation of “agile tests” – maturity models, things like the Nokia Scrum test and the mental models in our own minds which condemn teams. Failing these tests means you label the team something like “Agile in name only”, “ScrumBut” or “ScrummerFall”. I’ve seen my own share of these teams but honestly, I don’t care. Motivation to change is a bigger issue: as long as the team is trying to improve it’s just a question of time – you might be “AINO” but if you keep trying you will become “Kick-ass agile.” (Plus, as an agile consultant these teams are potential clients, send them to me!)

If you are learning and attempting to act on that learning you will make some false moves, you will need to undo some changes but over time you will move forward.

You need to learn in three domains: the Solution domain – the tools and technology you use to craft solutions and systems; the Application or problem domain – understanding the problem/opportunity you are trying to solve/exploit, understanding what customers need and what the market will pay for (I tend to think of this as the demand side); and the Process domain – the way you work, your processes and practices, the most obvious place where “agile” fits in.

The boundaries between these domains are fluid; you need a learning culture in all three.

More recently I realised that “the only thing you can do wrong” has a corollary:

        The opposite of agile is static, not learning and not changing

20 years ago agile advocates invented Waterfall to be the enemy – sure the cascade model, à la Royce, was industry standard but nobody, NOBODY, called it Waterfall – believe me, I’m old enough to remember first hand. Agile was described by reference to what agile was not, and that not was named “waterfall.”

But that is wrong.

The secret is in the words: Agile implies movement and action.

Not moving is called Static, so the opposite of agile is static.

Once you accept agile means movement and change the next question is: how fast?

Rather than agile tests and “agile maturity models” we should be measuring the speed of change and the speed of improvement: how fast are the team learning and acting on that learning? How successful are they at that?

When I floated this suggestion on LinkedIn a few days ago a few people pushed back. One argument was that I was advocating “change for change’s sake”. I’m not. You are free to not change if you wish; all I’m saying is don’t call that agile, ‘cos it’s not, that is static.

Not changing is static, and static is not agile.

Now maybe static is the right thing for you. If your team can repeat their way of working, if they are predictable, and if the team and stakeholders are content then don’t change! Embrace static.

I’m doubtful such a static position is anything other than a short term Stable Intermediate Form. Static implies a stable environment, one in which all forces are in equilibrium: nobody is asking for faster delivery, nobody is changing their ask, nobody is complaining about technical debt, overwork, predictability and so on. If people are not complaining then celebrate, you are living the dream! Just don’t call it agile.

“Finished” software is static because nobody changes it. Software ages because it stays unchanged while the world around it changes.

A second argument was that constant change was not good for people. Now I’m not arguing for constant big change, or constant top-down change. In my mind agile change is largely incremental – sure sometimes you need a big change but most of the changes are small.

I also challenge the assumption that change is top-down and that change is done to people. In my experience when those doing the work are enrolled in the change process, when they can see the potential benefits then it is a different matter. In my experience “change resistance” is more likely “resistance to being changed.”

Finally, “last 3 months”. That time frame comes from my intuition, it is a period long enough to see if change has happened without demanding that the team are never repeating themselves. I imagine the team moving from one Stable Intermediate Form to another Stable Intermediate Form over those 3 months. Feel free to suggest another time frame but if you think 3 months is not long enough to see change then how long? 4 months? 6 months? 12 months?



The post The only thing you can do wrong, and the opposite of agile appeared first on Allan Kelly.

User Stories by Example: certificate added to the free courses

A couple of months ago I made my online User Stories by Example tutorial series free. One client suggested that the series would benefit from a certificate at the end. Good idea, I’m always open to suggestions.

So, I’ve added a new tutorial to the series: exam and certificate – rather than just give a certificate of attendance to anyone who plays the videos I’ve set a little test. There is a bank of questions which are randomly selected and should cover the five areas of the tutorials. Score over 60% on the test and you get a certificate which lists the key topics covered.

I’ve set a small fee for this; once you have paid you have 21 days to do the exam and you can sit the exam as many times as you like.

As always, if you have any suggestions or other feedback please let me know. And if you have any ideas for new questions send them over, I’d love to increase the question pool.


The post User Stories by Example: certificate added to the free courses appeared first on Allan Kelly.

Improving my vimrc live on stream

I was becoming increasingly uncomfortable with how crufty my neovim config was getting, and especially how I didn’t understand parts of it, so I decided to wipe it clean and rebuild it from scratch.

I did it live on stream, to make it feel like a worthwhile activity.

Headline features of the new vimrc:

  • A new theme, using the base16 theme framework
  • A file browser (NERDTree)
  • A minimalist status line with vim-airline
  • Search with ripgrep
  • Rust language support with Coc

Note: after the stream I managed to resolve the remaining issues with highlight colours not showing, by re-applying them after the theme has been applied:

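" Re-apply the custom highlight every time a colour scheme is loaded
augroup tabs_in_make
    " Clear autocmds already in this group, so re-sourcing does not duplicate them
    autocmd!
    autocmd ColorScheme * highlight MatchParen cterm=none ctermbg=none ctermfg=green
augroup END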

You can find my current neovim config at gitlab.com/andybalaam/configs/-/tree/main/.config/nvim.

Evidence-based Software Engineering: now in paperback form

I made my Evidence-based Software Engineering book available as a pdf file. While making a printed version available looked possible, I was uncertain that the result would be of acceptable quality; the extensive use of color and an A4 page size restricted the number of available printers who could handle it. Email exchanges with several publishers suggested that the number of likely print edition copies sold would be small (based on experience with other books, under 100). The pdf was made available under a creative commons license.

Around half a million copies of the pdf have been downloaded (some partially).

A few weeks ago, I spotted a print version of this book on Amazon (USA). I have no idea who made this available. Is the quality any good? I was told that it was, so I bought a copy.

The printed version looks great, with vibrant colors, and is reasonably priced. It sits well in the hand, while reading. The links obviously don’t work for the paper version, but I’m well practised at using multiple fingers to record different book locations.

I have one report that the Kindle version doesn’t load on a Kindle or the web app.

If you love printed books, I heartily recommend the paperback version of Evidence-based Software Engineering; it even has a 5-star review on Amazon 😉

Programming language similarity based on their traits

A programming language is sometimes described as being similar to another, more widely known, language.

How might language similarity be measured?

Biologists ask a very similar question, and research goes back several hundred years; phenetics (also known as taximetrics) attempts to classify organisms based on overall similarity of observable traits.

One answer to this question is based on distance matrices.

The process starts by flagging the presence/absence of each observed trait. Taking language keywords (or reserved words) as an example, we have (for a subset of C, Fortran, and OCaml):

            if   then  function   for   do   dimension  object
C            1     1       0       1     1       0        0
Fortran      1     0       1       0     1       1        0
OCaml        1     1       1       1     1       0        1

The distance between these languages is calculated by treating this keyword presence/absence information as an n-dimensional space, with each language occupying a point in this space. The following shows the Euclidean distance between pairs of languages (using the full dataset; code+data):

                C        Fortran      OCaml
C               0        7.615773     8.717798
Fortran      7.615773    0            8.831761
OCaml        8.717798    8.831761     0
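The calculation can be sketched on the seven-keyword subset shown earlier. This is a minimal illustration rather than the code behind the full-dataset numbers: the 0/1 values are copied as printed in the subset table, and the function names are my own.

```python
from math import sqrt

# Keyword presence/absence, copied as printed in the subset table above.
KEYWORDS = ["if", "then", "function", "for", "do", "dimension", "object"]
TRAITS = {
    "C":       [1, 1, 0, 1, 1, 0, 0],
    "Fortran": [1, 0, 1, 0, 1, 1, 0],
    "OCaml":   [1, 1, 1, 1, 1, 0, 1],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length trait vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(traits):
    """Map every ordered pair of languages to the distance between them."""
    return {(p, q): euclidean(traits[p], traits[q]) for p in traits for q in traits}


DIST = distance_matrix(TRAITS)
for p in TRAITS:
    print(p.ljust(8), "  ".join(f"{DIST[p, q]:.3f}" for q in TRAITS))
```

With only seven keywords the distances are far smaller than the full-dataset values: C and Fortran differ on four keywords (distance 2), while C and OCaml differ on two (distance about 1.414).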

Algorithms are available to map these distance pairs into tree form; for biological organisms this is known as a phylogenetic tree. The plot below shows such a tree derived from the keywords supported by 21 languages (numbers explained below, code+data):

Tree showing relative similarity of languages based on their keywords.

How confident should we be that this distance-based technique produced a robust result? For instance, would a small change to the set of keywords used by a particular language cause it to appear in a different branch of the tree?

The impact of small changes on the generated tree can be estimated using a bootstrap technique. The particular small-change algorithm used to estimate confidence levels for phylogenetic trees is not applicable for language keywords; genetic sequences contain multiple instances of four DNA bases, and can be sampled with replacement, while language keywords are a set of distinct items (i.e., cannot be sampled with replacement).

The bootstrap technique I used was: for each of the 21 languages in the data, add keywords to one language (the number added was 5% of the number of its existing keywords, randomly chosen from the set of all language keywords), calculate the distance matrix and build the corresponding tree, repeat 100 times. The 2,100 generated trees were then compared against the original tree, counting how many times each branch remained the same.
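The perturbation loop might look something like the following sketch. It covers only the keyword-adding and distance-matrix steps; building each tree and comparing branches is omitted. The toy language data, the function names, rounding 5% to at least one keyword, and drawing only from keywords the language lacks are all assumptions of mine, not details taken from the original code.

```python
import random
from math import sqrt

# Hypothetical toy data; the real analysis uses 21 languages.
TOY = {
    "LangA": {"if", "then", "for", "do"},
    "LangB": {"if", "function", "do", "dimension"},
    "LangC": {"if", "then", "function", "for", "do", "object"},
}


def perturb(lang_keywords, all_keywords, fraction=0.05, rng=random):
    """Return a copy of one language's keyword set with a few keywords added."""
    candidates = sorted(all_keywords - lang_keywords)
    # Assumption: round 5% of the existing count, but always add at least one.
    n_add = min(len(candidates), max(1, round(fraction * len(lang_keywords))))
    return lang_keywords | set(rng.sample(candidates, n_add))


def distance_matrix(langs):
    """Euclidean distance between the 0/1 keyword vectors of each language pair."""
    # For 0/1 vectors the squared distance is the size of the symmetric difference.
    return {(p, q): sqrt(len(langs[p] ^ langs[q])) for p in langs for q in langs}


def bootstrap_matrices(langs, repeats=100, seed=0):
    """Yield (perturbed language, distance matrix), one language changed per run."""
    rng = random.Random(seed)
    all_keywords = set().union(*langs.values())
    for name in sorted(langs):
        for _ in range(repeats):
            perturbed = dict(langs)
            perturbed[name] = perturb(langs[name], all_keywords, rng=rng)
            yield name, distance_matrix(perturbed)


runs = list(bootstrap_matrices(TOY))
print(len(runs), "perturbed distance matrices")
```

Each yielded matrix would then be passed to whatever tree-building routine produced the original tree, and the resulting branches compared against it.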

The numbers in the above plot show the percentage of generated trees where the same branching decision was made using the perturbed keyword data. The branching decisions all look very solid.

Can this keyword approach to language comparison be applied to all languages?

I think that most languages have some form of keywords. A few languages don’t use keywords (or reserved words), and there are some edge cases. Lisp doesn’t have any reserved words (they are functions), nor technically does PL/1 in that the names of ‘word tokens’ can be defined as variables, and CHILL implementors have to choose between using Cobol or PL/1 syntax (giving CHILL two possible distinct sets of keywords).

To what extent are a language’s keywords representative of the language, compared to other languages?

One way to try and answer this question is to apply the distance/tree approach using other language traits; do the resulting trees have the same form as the keyword tree? The plot below shows the tree derived from the characters used to represent binary operators (code+data):

Tree showing relative similarity of languages based on their binary operator character representation.

A few of the branching decisions look as if they are likely to change, if there are changes to the keywords used by some languages, e.g., OCaml and Haskell.

Binary operators don’t just have a character representation, they can also have a precedence and associativity (neither is needed in languages whose expressions are written using prefix or postfix notation).

The plot below shows the tree derived from combining binary operator and the corresponding precedence information (the distance pairs for the two characteristics, for each language, were added together, with precedence given a weight of 20%; see code for details).

Tree showing relative similarity of languages based on their binary operator character representation and corresponding precedence.

No bootstrap percentages appear because I could not come up with a simple technique for handling a combination of traits.

Are binary operators more representative of a language than its keywords? Would a combined keyword/binary operator tree be more representative, or would more traits need to be included?

Does reducing language comparison to a single number produce something useful?

Languages contain a complex collection of interrelated components, and it might be more useful to compare their similarity by discrete components, e.g., expressions, literals, types (and implicit conversions).

What is the purpose of comparing languages?

If it is for promotional purposes, then a measurement-based approach is probably out of place.

If the comparison has a source code orientation, weighting items by source code occurrence might produce a more applicable tree.

Sometimes one language is used as a reference, against which others are compared, e.g., C-like. How ‘C-like’ are other languages? Taking keywords as our reference-point, comparing languages based on just the keywords they have in common with C, the plot below is the resulting tree:

Tree showing similarity of languages based on the keywords they share with C.

I had expected less branching, i.e., more languages having the same distance from C.

New languages can be supported by adding a language file containing the appropriate trait information. There is a GitHub repo, prog-lang-traits; send me a pull request to add your language file.

It’s also possible to add support for more language traits.

On Pitfall – student

Recall that in the Baron's latest wager, Sir R-----'s goal was to traverse a three by three checkerboard in steps determined by casts of a four sided die, each at a cost of two coins. Moving from left to right upon the first rank and advancing to the second upon its third file, thereafter from right to left and advancing upon the first file and finally from left to right again, he should have prevailed for a prize of twenty five coins had he landed upon the top right place. Frustrating his progress, however, were the rules that landing upon a black square dropped him back down to the first rank and that overshooting the last file upon the last rank required that he should move in reverse by as many places with which he had done so.

Devin Townsend at the Royal Albert Hall (again)

Leprous

There’s an obvious pull for me towards Leprous due to the association with Ihsahn and prog, but rock bands generally do little for me these days. I listened to a little of Aphelion before the gig, but it didn’t grip me.

They’re an odd live band and some of the time the cello player looked a bit out of place when he was without his cello. The sharing of the keyboards among various band members, often in the same song, was also weird. The singer was wearing a waistcoat and doing some very odd dancing and his voice can grate. For a prog band the lack of any guitar or keyboard lead breaks was also weird.

However, I quite enjoyed Leprous!


Devin Townsend

We’d only seen Devin Townsend a few months ago (in the summer at Bloodstock), but my wife loves him so we went again. We should have gone the night before as he played loads of songs we knew, in contrast to the night we went where he played nothing we knew! Most of it, I am reliably informed, was from the Ocean Machine and Infinity albums.

Devin still plays brilliantly and it was great to see him again with the session musicians he’d teamed up with for his Bloodstock performance. He creates a fantastic wall of sound and engages with the crowd like few others. I’m sure we’ll go and see him again, after all we’ve not heard Hyperdrive live yet!

Agile OKRs extra – yet another book

I blogged last week that I had begun work on a new book – How I Write Books which is now a work in progress at LeanPub – sign up and be the first to know when the draft is published.

Well a funny thing happened while I was setting up my tool chain to write that book: I found another book! Well, perhaps half a book is a better description.

Succeeding with OKRs in Agile Extra is a companion to last year’s best seller, Succeeding with OKRs in Agile. But it isn’t a complete book in its own right, it isn’t really a sequel, it is a companion. It contains a mix of material: material which didn’t really fit in the first book, material which wasn’t needed, ideas which didn’t develop far enough and some unfinished chapters.

As such it is like my Xanpan Appendix, unused material which is still interesting and might appear elsewhere in time.

I really want to work on How I Write Books so I don’t have any immediate plans to progress Extra. If you enjoyed Succeeding with OKRs in Agile, if you would like to know more, or if you would like to just see how a writer’s mind works, check out Succeeding with OKRs in Agile Extra.

The post Agile OKRs extra – yet another book appeared first on Allan Kelly.

Ecology as a model for the software world

Changing two words in the Wikipedia description of Ecology gives “… the study of the relationships between software systems, including humans, and their physical environment”; where physical environment might be taken to include the hardware on which software runs and the hardware whose behavior it controls.

What do ecologists study? Wikipedia lists the following main areas; everything after the first sentence, in each bullet point, is my wording:

  • Life processes, antifragility, interactions, and adaptations.

    Software system life processes include its initial creation, devops, end-user training, and the sales and marketing process.

    While antifragility is much talked about, it is something of a niche research topic. Those involved in the implementations of safety-critical systems seem to be the only people willing to invest the money needed to attempt to build antifragile software. Is N-version programming the poster child for antifragile system software?

    Interaction with a widely used software system will have an influence on the path taken by cultures within associated microdomains. Users adapt their behavior to the affordance offered by a software system.

    A successful software system (and even unsuccessful ones) will exist in multiple forms, i.e., there will be a product line. Software variability and product lines is an active research area.

  • The movement of materials and energy through living communities.

    Is money the primary unit of energy in software ecosystems? Developer time is needed to create software, which may be paid for or donated for free. Supporting a software system, or rather supporting the needs of the users of the software is often motivated by a salary, although a few do provide limited free support.

    What is the energy that users of software provide? Money sits at the root; user attention sells product.

  • The successional development of ecosystems (“… succession is the process of change in the species structure of an ecological community over time.”)

    Before the Internet, monthly computing magazines used to run features on the changing landscape of the computer world. These days, we have blogs/podcasts telling us about the latest product release/update. The Ecosystems chapter of my software engineering book has sections on evolution and lifespan, but the material is sparse.

    Over the longer term, this issue is the subject studied by historians of computing.

    Moore’s law is probably the most famous computing example of succession.

  • Cooperation, competition, and predation within and between species.

    These issues are primarily discussed by those interested in the business side of software. Developers like to brag about how their language/editor/operating system/etc is better than the rest, but there is no substance to the discussion.

    Governments have an interest in encouraging effective competition, and have enacted various antitrust laws.

  • The abundance, biomass, and distribution of organisms in the context of the environment.

    These are the issues where marketing departments invest in trying to shift the distribution in their company’s favour, and venture capitalists spend their time trying to spot an opportunity (and there is the clickbait of language popularity articles).

    The abundance of tools/products, in an ecosystem, does not appear to deter people creating new variants (suggesting that perhaps ambition or dreams are the unit of energy for software ecosystems).

  • Patterns of biodiversity and its effect on ecosystem processes.

    Various kinds of diversity are important for biological systems, e.g., the mutual dependencies between different species in a food chain, and genetic diversity as a resource that provides a mechanism for species to adapt to changes in their environment.

    It’s currently fashionable to be in favour of diversity. Diversity is so popular in ecology that a 2003 review listed 24 metrics for calculating it. I’m sure there are more now.

    Diversity is not necessarily desired in software systems, e.g., the runtime behavior of source code should not depend on the compiler used (there are invariably edge cases where it does), and users want different editor commands to be consistently similar.

    Open source has helped to reduce diversity for some applications (by reducing the sales volume of a myriad of commercial offerings). However, the availability of source code significantly reduces the cost/time needed to create close variants. The 5,000+ different cryptocurrencies suggest that the associated software is diverse, but the rapid evolution of this ecosystem has driven developers to base their code on the source used to implement earlier currencies.

    Governments encourage competitive commercial ecosystems because competition discourages companies charging high prices for their products, just because they can. Being competitive requires having products that differ from other vendors in a desirable way, which generates diversity.

How I write books – A book about books

In the last 15 years I’ve written and published 3 books with publishers, published 5 books myself, plus edited one conference proceedings and pushed out three “mini books” (one with 3 editions) which I never publicised.

In addition I’ve contributed forewords and chapters to at least six books and had two books translated.

Then there are countless magazine and journal articles but they stretch back further, closer to 25 years – and this blog from 2005.

Not bad for a kid who was thrown out of school after asking a teacher how to spell “at” – age of eight, a diagnosed dyslexic who had to learn to read three times – I can’t read my own handwriting it’s so bad.

As a result I’ve learned a lot about writing and publishing. In the last few years I’ve spoken to many people who want to know how to write and publish their own book. A couple of years ago Steve Smith suggested I write a book about writing books. I’ve been avoiding that until this month.

Now I’ve started: How I write books, https://leanpub.com/howIwrite – sign up to be the first to know when the MVP is published. And if there is anything you would like me to write about in the book please let me know.

The post How I write books – A book about books appeared first on Allan Kelly.

NoEstimates panders to mismanagement and developer insecurity

Why do so few software development teams regularly attempt to estimate the duration of the feature/task/functionality they are going to implement?

Developers hate giving estimates; estimating is very hard and estimates are often inaccurate (at a minimum making the estimator feel uncomfortable and worse when management treats an estimate as a quotation). The future is uncertain and estimating provides guidance.

Managers tell me that the fear of losing good developers dissuades them from requiring teams to make estimates. Developers have told them that they would leave a company that required them to regularly make estimates.

For most of the last 70 years, demand for software developers has outstripped supply. Consequently, management has to pay a lot more attention to the views of software developers than the views of those employed in most other roles (at least if they want to keep the good developers, i.e., those who will have no problem finding another job).

It is not difficult for developers to get a general idea of how their salary, working conditions and practices compare with other developers in their field/geographic region. They know that estimating is not a common practice, and unless the economy is in recession, finding a new job that does not require estimation could be straightforward.

Management’s demands for estimates have led to the creation of various methods for calculating proxy estimate values, none of which use time as the unit of measure, e.g., Function points and Story points. These methods break the requirements down into smaller units, and subcomponents from these units are used to calculate a value, e.g., the Function point calculation includes items such as number of user inputs and outputs, and number of files.

How accurate are these proxy values, compared to time estimates?

As always, software engineering data is sparse. One analysis of 149 projects found that Cost ≈ FunctionPoints^0.75, with the variance being similar to that found when time was estimated. An analysis of Function point calculation data found a high degree of consistency in the calculations made by different people (various Function point organizations have certification schemes that require some degree of proficiency to pass).
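To get a feel for what an exponent of 0.75 implies, here is a small sketch. The constant of proportionality is not given above, so it works only with ratios; the function name is my own.

```python
EXPONENT = 0.75  # the exponent quoted in the text


def relative_cost(size_ratio, exponent=EXPONENT):
    """Cost multiplier implied when the Function point count grows by size_ratio."""
    return size_ratio ** exponent


for ratio in (2, 10):
    print(f"{ratio}x the Function points -> about {relative_cost(ratio):.2f}x the cost")
```

Under this relationship cost grows more slowly than size: doubling the Function point count implies roughly 1.7 times the cost, and a tenfold increase implies roughly 5.6 times.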

Managers don’t seem to be interested in comparing estimated Story points against estimated time, preferring instead to track the rate at which Story points are implemented, e.g., velocity, or burndown. There are tiny amounts of data comparing Story points with time and Function points.

The available evidence suggests a relationship connecting Function points to actual time, and that Function points have similar error bounds to time estimates; the lack of data means that Story points are currently just a source of technobabble and number porn for management power-points (send me Story point data to help change this situation).

Testing Our Students – a.k.

Last time we saw how we can use the chi-squared distribution to test whether a sample of values is consistent with pre-supposed expectations. A couple of months ago we took a look at Student's t-distribution which we can use to test whether a set of observations of a normally distributed random variable are consistent with its having a given mean when its variance is unknown.

Learnings from Decapitated

When am I going to learn? 

The first two times I saw Decapitated, at Bloodstock and then supporting someone in Norwich at the Waterfront, they were incredible.

In early 2020 in London they sounded awful and we left. Last night in Norwich they sounded terrible again. 

I don’t know if it was them or the sound system, but there was no definition. It was all drums, vocals and not much else, so we gave up halfway through. 

I’m hoping the new album will be amazing, they’ll play at the UEA and it will be amazing. However, if I am going to learn, then I won’t be risking it.

It’s the engineers, stupid – one from the heart

When I engage with companies and teams I’m always keen – nay desperate – to get to meet the engineers and teams who are doing the work. If days, maybe even weeks, go by and I’m not doing that I get very frustrated. More importantly I’m not sure what to believe from those I am talking to.

There was once a bank I spent time with. As soon as I got to the office I discovered almost all the engineers were in a far away country and I wasn’t going to get to visit that country. The few engineers in the London office spent a lot of their time hand-holding those in the far away place. When you looked closely, when you spoke to the engineers far away you found things didn’t add up. One delivered a perfect 10 story points every iteration without fail. Another team increased velocity sprint after sprint. One engineer fell off his moped and broke his arm, the work was still delivered on time – it took all my wiles to discover another engineer had worked all weekend to meet the deadline.

Why am I so desperate to meet the engineers? – well there are several reasons, some more rational than others.

First off, the engineers are where the work happens. In lean parlance they are the gemba, source of truth.

Second, these are the people who will need to change or be changed. There is only so much you can change with an organigram – and to be honest, I’m doubtful reorgs really change much. Sometimes I imagine managers moving their workers around like pawns on a chess board while the reality of work is hand-to-hand combat.

Thirdly, and perhaps most importantly for me: I see my role as helping these people. I am, by profession, by temperament and by ancestry an engineer. I am motivated by the desire to help those who do the work have a more fulfilling life. I still remember the frustrations I faced as a coding software engineer.

That’s why it hurts – really hurts – when engineers tell me “agile is rubbish”, that “agile has nothing to offer”, when they tell me that I’m not helping. It’s not that I’m precious about agile, “agile” is just the toolset I’ve found helps. I also know that toolkit allows me to go outside the toolkit.

I was hired by a Californian company to give agile training to their Cambridge team. A few minutes in, one of the engineers told me directly “Agile can’t help us here, we can’t go any lower.” The other engineers in the room were of the same opinion. It turned out the managers had been to Scrum training and come back pumped up about high performing teams and faster-better-cheaper. Sustainable pace, autonomy and quality weren’t on the table.

That hurt and it may have been the toughest training gig I’ve ever had but I think I turned it around. I demonstrated the need for quality and explained the managers were missing essential parts of the puzzle. Unfortunately I didn’t get to meet the managers – they were off playing chess.

But I do engage with managers. Often they are the route to the engineers. Unfortunately some engineers see that as a problem in itself: “our problem is tech debt, sprinting won’t help us” so I’m discounted. In my world – the world of Xanpan – sprinting is a rod you put up your back to make yourself better, if you don’t address quality (e.g. tech debt) issues then you won’t succeed at time-boxed iterations.

(BTW I talk about engineers because most of my work is with engineers, and software engineers at that. I’ve worked a little with other professions and I’m sure most of what I say carries across directly but my experience and empathy is greatest with engineers.)

To deal with managers one needs to understand their concerns, one needs to listen and speak in ways they understand. Engineers may struggle with managers and technical issues but managers also struggle with their managers, organizational debt, customers and the market.

The same is true when I wander over into the world of product ownership – Product Managers and Business Analysts. Engineers have a bad habit of seeing these roles as “Management” but if you spend time with the “demand side” people you find their concerns are almost identical to coding engineers. BAs worry that what they are being asked to do is unreasonable, that it doesn’t make sense, that something else needs to change first and that people don’t appreciate how things really work. The biggest difference between programmers and BAs is simply that, on average, BAs dress more smartly and are more likely to put on a tie.

One can’t understand a system and one can’t get to the truth if one can’t visit the place where work happens. When manufacturing things that place is the production line, in the digital world that place is the mind. Constructing software is an intellectual exercise that happens in the mind and is only manifested via a keyboard in code. To see the truth one has to speak to engineers.

I’ve seen some awful work environments: a room packed with 28 engineers, very few windows, little fresh air, a development manager on a raised platform at one end, the HR manager at the other end, her desk right by the single door in and out with the clock-in-clock-out cards on the wall.

More recently a large project at a matrix managed organization. The complexity made it difficult to know who was actually on the project and what teams existed. Management existed in its own bubble.

I feel pain simply seeing such places. What it can be like to work there I can only imagine. I assume people become numb to the pain, switch off to the failing and accept the normalisation of deviance. Or, to put it another way: a culture of failure.

These two examples had one thing in common: massive Gantt charts which claimed to plan the work. In one case I saw someone scheduled to spend a month writing a manual in two years’ time. While these charts claim rationality they are so disconnected from the gemba as to be fantasies. I feel cognitive dissonance knowing that the managers who put their faith in such mechanisms are both rational and totally mad.

Encountering such places is painful for me. On the one hand I want to help, I want to make the engineers’ lives better – that is what I do! The challenge can be great. On the other hand it can be mentally and emotionally draining. Because I am passionate about what I do I feel that. If I switched off, if I treated it as a money paying gig then I too would become part of the same culture and lose my efficacy.

On the other hand, when things go right I love it – perhaps because I’m an engineer and I see fixing the organization as a way of fixing the code, it’s called Conway’s Law.


The post It’s the engineers, stupid – one from the heart appeared first on Allan Kelly.

Réussir avec les OKR en Agile (French OKRs)

I am delighted to say the French translation of Succeeding with OKRs in Agile – Réussir avec les OKR en Agile – is now available thanks to the hard work of Nicolas Mereaux and Fabrice Aimetti.

The book is available right now on LeanPub as an e-book. After Easter we’ll start work on getting a print version available.

Until then a big thanks to Nicolas and Fabrice!

(Please get in touch if you are interested in translating the book to your favourite language.)

The post Réussir avec les OKR en Agile (French OKRs) appeared first on Allan Kelly.

4,000 vs 400 vs 40 hours of software development practice

What is the skill difference between professional developers and newly minted computer science graduates?

Practice, e.g., 4,000 vs. 400 hours

People get better with practice, and after two years (around 4,000 hours) a professional developer will have had at least an order of magnitude more practice than most students; not just more practice, but advice and feedback from experienced developers. Most of these 4,000 hours are probably not the deliberate practice of 10,000 hours fame.

It’s understandable that graduates with a computing degree consider themselves to be proficient software developers; this opinion is based on personal experience (i.e., working with other students like themselves), and not having spent time working with professional developers. It’s not a joke that a surprising number of academics don’t appreciate the student/professional difference; the problem is that some academics only ever get to see a limited range of software development expertise (it’s a question of incentives).

Surveys of student study time have found that for Computer science, around 50% of students spend 11 hours or more, per week, in taught study and another 11 hours or more doing independent learning; let’s take 11 hours per week as the mean, and 30 academic weeks in a year. How much of the 330 hours per year of independent learning time is spent creating software (that’s 1,000 hours over a three-year degree, assuming that any programming is required)? I have no idea, and picked 40% because it matched up with 4,000.
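The arithmetic behind the 400 hours figure can be laid out explicitly. This is just a restatement of the numbers in the paragraph above, with variable names of my own choosing; the 40% share is the admitted guess from the text, not a measured value.

```python
HOURS_PER_WEEK = 11        # independent learning, taken as the mean in the text
ACADEMIC_WEEKS = 30        # academic weeks per year
DEGREE_YEARS = 3
PROGRAMMING_PERCENT = 40   # the guess made in the text

hours_per_year = HOURS_PER_WEEK * ACADEMIC_WEEKS                # 330
degree_hours = hours_per_year * DEGREE_YEARS                    # 990, "1,000" in the text
programming_hours = degree_hours * PROGRAMMING_PERCENT // 100   # 396, "400" in the text
weekly_programming = programming_hours / (DEGREE_YEARS * ACADEMIC_WEEKS)

print(hours_per_year, degree_hours, programming_hours, round(weekly_programming, 1))
```

Spread over the 90 academic weeks of a three-year degree, the 400 hours works out at a little over 4 hours of programming per week.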

Based on my experience with recent graduates, 400 hours sounds high (I have no idea whether an average student spends 4 hours per week doing programming assignments). While a rare few are excellent, most are hopeless. Perhaps the few-hours-per-week nature of their coding means that they are constantly relearning, or perhaps they are just cutting and pasting code from the Internet.
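The back-of-envelope figures above can be reproduced in a few lines. This is only a sketch: every input is one of the post's own round numbers, and the 40% programming share is the author's stated guess rather than measured data.

```python
# Back-of-envelope student practice hours; every input is a round number from the text.
HOURS_PER_WEEK = 11        # independent learning, taken as the mean
ACADEMIC_WEEKS = 30        # academic weeks per year
DEGREE_YEARS = 3
PROGRAMMING_SHARE = 0.40   # the author's guess, picked to land near 400

hours_per_year = HOURS_PER_WEEK * ACADEMIC_WEEKS      # independent learning per year
degree_hours = hours_per_year * DEGREE_YEARS          # roughly 1,000 over the degree
student_practice = degree_hours * PROGRAMMING_SHARE   # hours spent creating software
weekly_programming = student_practice / (DEGREE_YEARS * ACADEMIC_WEEKS)

print(hours_per_year, degree_hours, round(student_practice), round(weekly_programming, 1))
```

The weekly figure is where the "4 hours per week" sanity check comes from.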

Most graduates start their careers working in industry (around 50% of comp sci/maths graduates work in an ICT profession; UK higher-education data), which means that those working in industry are ideally placed to compare the skills of recent graduates and professional developers. Professional developers have first-hand experience of their novice-level ability. This is not a criticism of computing degrees; there are only so many hours in a day and lots of non-programming material to teach.

Many software developers working in industry don’t have a computing related degree (I don’t). Lots of non-computing STEM degrees give students the option of learning to program (I had to learn FORTRAN, no option). I don’t have any data on the percentage of software developers with a computing related degree, and neither do I have any data on the average number of hours non-computing STEM students spend on programming; I’ve chosen 40 hours to flow with the sequence of 4’s (some non-computing STEM students spend a lot more than 400 hours programming; I certainly did). The fact that industry hires a non-trivial number of non-computing STEM graduates as software developers suggests that, for practical purposes, there is not a lot of difference between 400 and 40 hours of practice; some companies will take somebody who shows potential, but no existing coding knowledge, and teach them to program.

Many of those who apply for a job that involves software development never get past the initial screening; something like 80% of people applying for a job that specifies the ability to code, cannot code. This figure is based on various conversations I have had with people about their company’s developer recruitment experiences; it is not backed up with recorded data.

Some of the factors leading to this surprisingly high value include: people attracted by the salary deciding to apply regardless, graduates with a computing degree that did not require any programming (there is customer demand for computing degrees, and many people find programming is just too hard for them to handle, so universities offer computing degrees where programming is optional), concentration of the pool of applicants, because those that can code exit the applicant pool, leaving behind those that cannot program (who keep on applying).

Apologies to regular readers for yet another post on professional developers vs. students, but I keep getting asked about this issue.

No unified theory of agile (Agile mindset cont.)

Continuing my quest of “The Agile Mindset” I’ve been searching for a metaphor to put all the different ideas on agile into order. To cut to the chase: there isn’t one.

As much as I would love to boil “agile” down to one thing, or even a handful of key concepts, I don’t think there will ever be a single unified theory of agile. As I said before with the elephant example, everyone sees something different. Depending on where you are standing, the problems you face today, your own history and area of knowledge, and your own world view you are going to see and emphasise different aspects of “The Agile Mindset.” And you know what? That is a good thing!

First I tried thinking of Agile as the layers of an onion. The outside skin is the word “agile” – there is a valid reason for saying the agile mindset is there in the dictionary definition of the word agile: able to move, think and understand quickly, be nimble and subtle in movements, alert and observant when looking at what’s happening and sharp when thinking. But that is only the outside layer, it is very general.

The agile manifesto would be the second layer. While the manifesto is specific to software it doesn’t require too much thought to generalise it to other domains. Actually, it is a bit too easy and there are more attempts to generalise it than there are people who have attempted to generalise it. Plus, as I’ve written before, the manifesto is over 20 years old, those who cling to it sound like Supreme Court Justices trying to read meaning into a document written in a different age.

And what are the next layers, and in what order? Does last responsible moment come above or below cost of delay? Is test driven more important than time-boxing? Are work-in-progress limits a version of time-boxing or an alternative to time-boxing?

And what is at the centre? For me it is learning but I can imagine people who will say it is People – perhaps manifested through Weinberg’s “It’s always a people problem” quote – I’ve written about that one too, the People Problem Problem. Personally I think McGregor’s Theory X and Theory Y could be a candidate, which raises the question of agile’s fellow travellers – beyond budgeting, system thinking, Lean, and Mintzberg’s theory of emergent strategy.

I wondered if agile could be thought of as a brick wall, with each idea forming a brick, and then the whole being more than the sum of the parts. But that falls down (sorry for the pun!) on the layering problem. Which ideas are foundations and which decorative?

Similarly, I toyed with The House of Agile with different ideas represented by different rooms but that metaphor quickly runs into problems too.

In a way this makes the search for One Agile Mindset even more desirable – the search for a grand unified theory of everything if you like. There must be something out there that combines all of this!

Yet the search for the unifying theory also highlights how damn difficult this is. Intellectually it is hard to accept that “the agile mindset” is a bunch of different ideas which different people interpret differently.

But you know what? Accepting that agile is diverse is itself agile – agile is not one idea, it is many. Accepting that and valuing those different ideas means embracing diversity, and that itself is agile. Agile is what you want it to be because through that diversity we find alternative ways of viewing and learning. Agile doesn’t stand still, agile is punk, at its best agile is democracy.

Unfortunately that makes it hard to explain.


Subscribe to my blog newsletter and download Continuous Digital for free

The post No unified theory of agile (Agile mindset cont.) appeared first on Allan Kelly.

Anthropological studies of software engineering

Anthropology is the study of humans, and as such it is the top level research domain for many of the human activities involved in software engineering. What has been discovered by the handful of anthropologists who have spent time researching the tiny percentage of humans involved in writing software?

A common ‘discovery’ is that developers don’t appear to be doing what academics in computing departments claim they do; hardly news to those working in industry.

The main subfields relevant to software are probably: cultural anthropology and social anthropology (in the US these are combined under the name sociocultural anthropology), plus linguistic anthropology (how language influences social life and shapes communication). There is also historical anthropology, which is technically what historians of computing do.

For convenience, I’m labelling anybody working in an area covered by anthropology as an anthropologist.

I don’t recommend reading any anthropology papers unless you plan to invest a lot of time in some subfield. While I have read lots of software engineering papers, anthropologists’ papers on this topic are often incomprehensible to me. These papers might best be described as anthropology speak interspersed with software related terms.

Anthropologists write books, and some of them are very readable to a more general audience.

The Art of Being Human: A Textbook for Cultural Anthropology by Wesch is a beginner’s introduction to its subject.

Ethnography, which explores cultural phenomena from the point of view of the subject of the study, is probably the most approachable anthropological research. Ethnographers spend many months living with a remote tribe, community, or nowadays a software development company, and then write up their findings in a thesis/report/book. Examples of approachable books include: “Engineering Culture: Control and Commitment in a High-Tech Corporation” by Kunda, who studied a large high-tech company in the mid-1980s; “No-Collar: The Humane Workplace and its Hidden Costs” by Ross, who studied an internet startup that had just IPO’ed; and “Coding Freedom: The Ethics and Aesthetics of Hacking” by Coleman, who studied hacker culture.

Linguistic anthropology is the field whose researchers are most likely to match developers’ preconceived ideas about what humanities academics talk about. If I had been educated in an environment where Greek and nineteenth century philosophers were the reference points for any discussion, then I too would use this existing skill set in my discussions of source code (philosophers of source code did not appear until the twentieth century). Who wouldn’t want to apply hermeneutics to the interpretation of source code (the field is known as Critical code studies)?

It does not help that the software knowledge of many of the academics appears to have been acquired by reading computer books from the 1940s and 1950s.

The most approachable linguistic anthropology book I have found, for developers, is: The Philosophy of Software: Code and Mediation in the Digital Age by Berry (not that I have skimmed many).

Letter to Anneliese Dodds on the invasion of Ukraine by Russia

Dear Anneliese Dodds,

I learn from the BBC (https://www.bbc.co.uk/news/58888451) that "The UK is to phase out Russian oil by the end of the year" and "Russian imports account for 8% of total UK oil demand".

8% is a small amount and the end of the year is a long time in the future. We need immediate action to change Russia's course. Please use all your influence to this end.

Some suggestions, as a minimum:

  • Stop all petrochemical purchases from Russia, and require this of multinationals
  • Expulsion of all remaining Russian banks from SWIFT
  • Make it unlawful to insure a Russian enterprise
  • Seizure and forfeiture of all Russian assets within the UK and its dominions
  • Motion to remove Russia from the UN Security Council

There are many more things which could and should be done; by January 2023 there will be no Ukraine to defend.

Yours sincerely,
Tim Pizey

Galactic North (a review)

Galactic North

Alastair Reynolds
ISBN-13: 978-0575083127

Galactic North is a group of short stories set in the Revelation Space universe starting at its very beginning and stretching right to its end.


Great Wall of Mars


I reread the Great Wall of Mars after Inhibitor Phase to remind me of some of Warren Clavain’s back story. It didn’t disappoint. I should have read Great Wall of Mars again before Inhibitor Phase, but hindsight is a wonderful thing. At least I’ve remembered why Nevil hated his brother and how he was betrayed by him, why Nevil defected to the Conjoiners and how Felka fits in. One small story explains so much of why things happened in several of the other stories including Absolution Gap.


Glacial
 

Glacial adds little to the overall story, but does help to explain how the relationship between Clavain, Galiana and Felka develops and how it becomes so strong. Glacial is really an opportunity for Alastair Reynolds to explore the concept of a thinking, possibly sentient planet. He does this, as always, by hinting throughout at the bigger picture and keeping you reading.


A Spy in Europa

 
Not sure what I think of this one. Seemed a bit pointless. Not very nice characters who all stabbed each other in the back. Some interesting science tho and provided a backdrop and context for Grafenwalder's Bestiary.


Weather

 
What a fantastic standalone story this is with some great characters who demonstrate that not all Ultras are cut-throat. There’s lots more detail about conjoiners here and the secret of how C-drives are managed is revealed, but that’s not the darkest secret.


Dilation Sleep

 
I was disappointed in this story until I read the notes at the end and realised it was the first story written in the Revelation Space universe and that it introduced some key aspects, such as Chasm City. It doesn’t really add anything to the overall story, but has some interesting insights into reefersleep.
 

Grafenwalder's Bestiary

 
Some of the best stories are those which are difficult to read due to the behaviour of some of the characters. When they do things you can’t understand the motivation for and could not imagine doing yourself. In this story it’s cruelty, deception and revenge and I loved it.


Nightingale

I do wonder how Alastair Reynolds thinks up these horrors, but they are glorious. This story is particularly horrible at the end. The evil computer was far worse than anything in the Resident Evil series, with undertones of HAL 9000. There’s exploration in the story, battle, weapons and the sort of intrigue which makes it difficult to put down.


Galactic North 

This should be expanded to a novel, or at least a novella. There’s scope for so much evolution, especially with the greenfly and how they come to take over. I couldn’t put this down, and wouldn’t have done if my Kindle hadn’t died a few pages before the end!

In some ways it’s a shame that Alastair Reynolds has put a hard limit on the timeline of Revelation Space, but I loved it! I reread Galactic North to understand the comments at the end of Inhibitor Phase and the Nest Builders. I should have read it first. And I should have a Revelation Space timeline on my wall.


Study of developers for the cost of a phase I clinical drug trial

For many years now, I have been telling people that software researchers need to be more ambitious and apply for multi-million pound/dollar grants to run experiments in software engineering. After all, NASA spends a billion or so sending a probe to take some snaps of a planet and astronomers lobby for $100million funding for a new telescope.

What kind of experimental study might be run for a few million pounds (e.g., the cost of a Phase I clinical drug trial)?

Let’s say that each experiment involves a team of professional developers implementing a software system; call this a Project. We want the Project to be long enough to be realistic, say a week.

Different people exhibit different performance characteristics, and the experimental technique used to handle this is to have multiple teams independently implement the same software system. How many teams are needed? Fifteen ought to be enough, but more is better.

Different software systems contain different components that make implementation easier/harder for those involved. To remove single system bias, a variety of software systems need to be used as Projects. Fifteen distinct Projects would be great, but perhaps we can get away with five.

How many developers are on a team? Agile task estimation data shows that most teams are small, i.e., mostly single person, with two and three people teams making up almost all the rest.

If we have five teams of one person, five of two people, and five of three people, then there are 15 teams and 30 people.

How many people will be needed over all Projects?

15 teams (30 people) each implementing one Project
 5 Projects, which will require 5*30=150 people (5*15=75 teams)

How many person days are likely to be needed?

If a 3-person team takes a week (5 days), a 2-person team will take perhaps 7-8 days. A 1-person team might take 9-10 days.

The 15 teams will consume 5*3*5+5*2*7+5*1*9=190 person days
The  5 Projects will consume              5*190=950 person days

How much is this likely to cost?

The current average daily rate for a contractor in the UK is around £500, giving an expected cost of 950*500=£475,000 to hire the experimental subjects. Venue hire is around £40K (we want members of each team to be co-located).
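The head-count and cost arithmetic above can be checked with a short script. It is a sketch that uses the lower-end elapsed days (9, 7 and 5) that appear in the person-day sum, and the variable names are mine.

```python
# Subject-hire cost for one round of Projects; all inputs are taken from the text.
TEAMS_PER_SIZE = 5                        # five teams each of 1, 2 and 3 people
DAYS_BY_TEAM_SIZE = {1: 9, 2: 7, 3: 5}    # elapsed days used in the person-day sum
PROJECTS = 5
DAILY_RATE_GBP = 500

# People needed to staff one Project: 5*1 + 5*2 + 5*3.
people_per_project = sum(TEAMS_PER_SIZE * size for size in DAYS_BY_TEAM_SIZE)

# Person days for one Project: teams * people per team * elapsed days.
person_days_per_project = sum(
    TEAMS_PER_SIZE * size * days for size, days in DAYS_BY_TEAM_SIZE.items()
)

total_person_days = PROJECTS * person_days_per_project
subject_cost_gbp = total_person_days * DAILY_RATE_GBP

print(people_per_project, person_days_per_project, total_person_days, subject_cost_gbp)
```

The £475,000 figure corresponds to the 950 person days across all five Projects, not the 190 person days of a single Project.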

The above analysis involves subjects implementing one Project. If, say, each subject implements two, three or four Projects, one after the other, the cost is around £2million, i.e., the cost of a Phase I clinical drug trial.

What might we learn from having subjects implement multiple Projects?

Team performance depends on the knowledge and skill of its members, and their ability to work together. Data from these experiments would be the first of their kind, and would provide realistic guidance on performance factors such as: impact of team size; impact of practice; impact of prior experience working together; impact of existing Project experience. The multiple implementations of the same Project created provide a foundation for measuring expected reliability and theories of N-version programming.

A team of 1 developer will take longer to implement a Project than a team of 2, who will take longer than a team of 3.

If 20 working days is taken as the ballpark period over which a group of subjects are hired (i.e., a month), there are six team size sequences that one subject could work (A to F below); where individual elapsed time is close to 20 days (team size 1 is 10 days elapsed, team size 2 is 7.5 days, team size 3 is 5 days).

Team size    A       B       C       D       E       F
    1      twice   once    once
    2                      once    thrice  once
    3              twice                   twice   four times
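As a rough check on how close each sequence comes to the 20-day target, the elapsed time per subject can be totalled as below. The mapping of table cells to sequences A to F is my reading of the table, so treat it as an assumption.

```python
# Elapsed days per Project by team size, as given in the text.
ELAPSED_DAYS = {1: 10, 2: 7.5, 3: 5}

# Sequences A to F: {team size: number of Projects worked at that size}.
# This mapping is an interpretation of the table, not data from the study.
SEQUENCES = {
    "A": {1: 2},
    "B": {1: 1, 3: 2},
    "C": {1: 1, 2: 1},
    "D": {2: 3},
    "E": {2: 1, 3: 2},
    "F": {3: 4},
}


def elapsed(sequence):
    """Total elapsed working days for one subject following a sequence."""
    return sum(ELAPSED_DAYS[size] * count for size, count in sequence.items())


totals = {name: elapsed(seq) for name, seq in SEQUENCES.items()}
for name, days in totals.items():
    print(name, days)
```

Under this reading every sequence falls within 2.5 days of the 20-day target, with D running over and C and E running under.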

The cost of hiring subjects+venue+equipment+support for such a study is likely to be at least £1,900,000.

If the cost of beta testing, venue hire and research assistants (needed during experimental runs) is included, the cost is close to £2.75 million.

Might it be cheaper and simpler to hire, say, 20-30 staff from a medium size development company? I chose a medium-sized company because we would be able to exert some influence over developer selection and keeping the same developers involved. The profit from 20-30 people for a month is not enough to create much influence within a large company, and a small company would not want to dedicate a large percentage of its staff for a solid month.

Beta testing is needed to validate both the specifications for each Project and that it is possible to schedule individuals to work in a sequence of teams over a month (individual variations in performance create a scheduling nightmare).

On A Generally Fractal Family – student

Recently, my fellow students and I have been caught up in the craze that is sweeping through the users of Professor B------'s clockwork calculating engine; namely the charting of sets of two dimensional points that have fractal planar boundaries, being those that in some sense have a fractional dimension. Of particular interest have been the results of repeated applications of quadratic functions to complex numbers; specifically in measuring how quickly, if at all, they escape a region surrounding the starting point, by which charts may be constructed that many of the collegiate consider so delightful as to constitute art painted by mathematics itself!

User Stories by Example tutorials now Free

My User Stories by Example tutorial series is now free.

There are five tutorials which include video lectures, worked examples based on real user stories, and exercises covering user stories basics, acceptance criteria, story splitting, story refactoring and more.

The series is based on my Little Book of Requirements and User Stories – the audio files for the book are there too but you will have to pay for them. If you are an Audible subscriber you can get the book there as part of your subscription. Print book, eBook and audio book are all available at Amazon.


The post User Stories by Example tutorials now Free appeared first on Allan Kelly.

Growth in FLOPS used to train ML models

AI (a.k.a. machine learning) is a compute intensive activity, with the performance of trained models being dependent on the quantity of compute used to train the model.

Given the ongoing history of continually increasing compute power, what is the maximum compute power that might be available to train ML models in the coming years?

How might the compute resources used to train an ML model be measured?
One obvious answer is to specify the computers used and the number of days they were occupied training the model. The problem with this approach is that the differences between the computers used can be substantial. How is compute power measured in other domains?

Supercomputers are ranked using FLOPS (floating-point operations per second), typically quoted in GigaFLOPS (10^{9}) or PetaFLOPS (10^{15}). The Top500 list gives values for R_{max} (based on benchmark performance, i.e., LINPACK) and R_{peak} (what the hardware is theoretically capable of, which is sometimes more than twice R_{max}).

A ballpark approach to measuring the FLOPS consumed by an application is to estimate the FLOPS consumed by the computers involved and multiply by the number of seconds each computer was involved in training. The huge assumption made with this calculation is that the application actually consumes all the FLOPS that the hardware is capable of supplying. In some cases this appears to be the metric used to estimate the compute resources used to train an ML model. Some published papers just list a FLOPS value, while others list the number of GPUs used (e.g., 2,128).
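The ballpark calculation described above amounts to a single multiplication, whose result is a count of floating-point operations. The sketch below makes the "huge assumption" explicit as a utilization parameter; the function name, the device count and the per-device peak rate are all hypothetical illustration values, not figures from any paper.

```python
def ballpark_training_flop(n_devices, peak_flops_per_device, seconds, utilization=1.0):
    """Estimate total floating-point operations consumed by a training run.

    utilization=1.0 reproduces the ballpark assumption that the application
    uses everything the hardware can supply; real runs use some fraction.
    """
    return n_devices * peak_flops_per_device * seconds * utilization


SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical run: 1,000 devices, each with a peak of 1e14 FLOPS, for 7 days.
upper_bound = ballpark_training_flop(1_000, 1e14, 7 * SECONDS_PER_DAY)
at_30_percent = ballpark_training_flop(1_000, 1e14, 7 * SECONDS_PER_DAY, utilization=0.3)

print(f"{upper_bound:.3e} {at_30_percent:.3e}")
```

The gap between the two numbers shows how much a published figure can depend on whether peak or achieved throughput was assumed.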

A few papers attempt a more refined approach. For instance, the paper describing the GPT-3 models derives its FLOPS values from quantities such as the number of parameters in each model and number of training tokens used. Presumably, the research group built a calibration model that provided the information needed to estimate FLOPS in this way.
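One widely used rule of thumb for this kind of estimate is roughly 6 floating-point operations per parameter per training token. The sketch below applies it to the GPT-3 175B model, assuming the 300 billion training tokens reported in the GPT-3 paper; it illustrates the style of calculation and is not a claim about how the authors calibrated their own figures.

```python
# Rule-of-thumb training compute: about 6 FLOP per parameter per training token.
PARAMETERS = 175e9               # GPT-3 175B
TRAINING_TOKENS = 300e9          # assumed, from the GPT-3 paper
FLOP_PER_PARAM_PER_TOKEN = 6     # rule-of-thumb constant, an approximation

PETA = 1e15
SECONDS_PER_DAY = 24 * 60 * 60

total_flop = FLOP_PER_PARAM_PER_TOKEN * PARAMETERS * TRAINING_TOKENS

# One PetaFLOP day is 10^15 operations per second sustained for one day.
petaflop_days = total_flop / (PETA * SECONDS_PER_DAY)

print(f"{total_flop:.2e} FLOP, about {petaflop_days:.0f} PetaFLOP days")
```

The result lands close to the 3,640 PetaFLOP days reported for this model.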

How does one get to be able to use PetaFLOPS of compute to train a model (training the GPT-3 175B model consumed 3,640 PetaFLOP days, or around a few days on a top 8 supercomputer)?

Pay what it costs. Money buys cloud compute or bespoke supercomputers (which are more cost-effective for large scale tasks, if you have around £100million to spend plus £10million or so for the annual electricity bill). While the amount paid to train a model might have lots of practical value (e.g., can I afford to train such a model), researchers might not be keen to let everybody know how much they spent. For instance, if a research team have a deal with a major cloud provider to soak up any unused capacity, those involved probably have no interest in calculating compute cost.

How has the compute power used to train ML models increased over time? A recent paper includes data on the training of 493 models, of which 129 include estimated FLOPS, and 106 contain date and model parameter data. The data comes from published papers, and there are many thousands of papers that train ML models. The authors used various notability criteria to select papers, and my take on the selection is that it represents the high-end of compute resources used over time (which is what I’m interested in). While they did a great job of extracting data, there is no real analysis (apart from fitting equations).

The plot below shows the FLOPS training budget used/claimed/estimated for ML models described in papers published on given dates; lines are fitted regression models, and the colors are explained below (code+data):

FLOPS consumed training ML models over time.

My interpretation of the data is based on the economics of accessing compute resources. I see three periods of development:

  1. do-it-yourself (18 data points): During this period most model builders only had access to a university computer, desktop machines, or a compute cluster they had self-built.
  2. cloud (74 data points): Huge on-demand compute resources are now just a credit card away. Researchers no longer have to wait for congested university computers to become available, or build their own systems.

    AWS launched in 2006, and the above plot shows a distinct increase in compute resources around 2008.

  3. bespoke (14 data points): if the ML training budget is large enough, it becomes cost-effective to build a bespoke system, e.g., a supercomputer. As well as being more cost-effective, a bespoke system can also be specifically designed to handle the characteristics of the kinds of applications run.

    How might models trained using a bespoke system be distinguished from those trained using cloud compute? The plot below shows the number of parameters in each trained model, over time, and there is a distinct gap between 10^{10} and 10^{11} parameters, which I assume is the result of bespoke systems having the memory capacity to handle more parameters (code+data):

    Number of parameters in ML models over time.

The rise in FLOPS growth rate during the Cloud period comes from several sources: 1) the exponential decline in the prices charged by providers delivers researchers exponentially increasing compute for the same price, 2) researchers obtaining larger grants to work on what is considered to be an important topic, 3) researchers doing deals with providers to make use of excess capacity.

The rate of growth of Cloud usage is capped by the cost of building a bespoke system. The future growth of Cloud training FLOPS will be constrained by the rate at which the price charged for a FLOP decreases (grants are unlikely to continually increase substantially).

The rate of growth of the Top500 list is probably a good indicator of the rate of growth of bespoke system performance (and this does appear to be slowing down). Perhaps specialist ML training chips will provide performance that exceeds that of the GPU chips currently being used.

The maximum compute that can be used by an application is set by the reliability of the hardware and the percentage of resources used to recover from hard errors that occur during a calculation. Supercomputer users have been facing the possibility of hitting the wall of maximum compute for over a decade. ML training is still a minnow in the supercomputer world, where calculations run for months, rather than a few days.

Migrating source code from RCS to Mercurial

Version control system migrations are a fact of life for developers in any longer lived codebase. In fact, I’ve had a hand in quite a few migrations as newer, more workable version control systems became available. Also, like a lot of developers, I’ve got fragments of source code dating back quite some years floating around on various servers and development machines of mine. Not necessarily code that is still being used, but still code that I don’t want to just delete forever.

The Agile Elephant and the agile mindset

African Elephant

Confession: I’ve been avoiding the words “agile mindset” for some time because I don’t know what it is. And, completely by coincidence, I’ve recently had a couple of encounters that have caused me to think again. So let me explain…

I repeatedly find myself wrestling with the question “What is agile?” The question came up recently in a new form when I was invited to give a talk on “The Agile Mindset.” I appealed for help on LinkedIn. I got some great answers and the diversity of answers confirmed what I thought: it is hard to describe “the agile mindset” in a short or generally agreed form.

The first problem is that to explain “the Agile mindset” one first has to agree what agile is, and is not. I have my own view but I know there is a diversity of opinion so I find it useful to describe “Agile” with the story of the blind men examining an elephant: one feels the leg and says “This is a mighty tree”, another feels the tusk and says “It is a strong sword”, another the trunk and says “It is a strong snake” and so on. Each interprets the part they encounter as the whole yet the whole, to one who has never seen an elephant, can be hard to comprehend.

Illustration from the Natural History Museum, London

The same is true for agile.

The literalist looks in the dictionary and says “Agile is about being fast, reactive and responding to the outside”, the engineer looks at agile and says “It is about doing quality work so we may deliver more”, the Scrum aficionado says “It is about high performing teams and alignment”, the Lean thinker says “It is about reducing work in progress and simplifying workflows” and the management consultant says “It is about delivering more with less.”

All are right, none is wrong. And while that is a problem in describing what agile is, it is also a strength. Agile is multi-faceted and offers “something for everyone.” While different people emphasise different things it also means the whole is more than the sum of the parts. If you can harness high performing teams, with engineering quality, low WIP and reactive processes then you can deliver the fabled faster, better, cheaper.

But that also makes it hard.

It also goes some way to explaining why “Agile Coaches” never agree: each has their own interpretation of how to put those pieces together to make the whole – to change metaphor, everyone approaches the jigsaw differently.

And again that is right because every jigsaw, every application of agile, exists in a unique context and must be faced on its own terms – to quote Tolstoy: “All happy families are the same, all unhappy families are unhappy in their own unique way.” (And long time readers might notice I just contradicted myself.)

And one important reason why the jigsaw is always different is this: in completing the last jigsaw, and since completing it, you and everyone else have learned. The bodies may be the same but the people – and their minds – are different.

Ultimately, I still claim “Agile” is learning, specifically organizational learning: the thesis I laid out in my first book over 10 years ago Changing Software Development.

Hence I say: The only thing you can do wrong in agile is work the same as you did three months ago. To be agile one should always be learning and changing as a result of that learning.

I should explain that some more in another post, and I’ll have more to say about the agile mindset soon.


The post The Agile Elephant and the agile mindset appeared first on Allan Kelly.

Comparison of Matrix events before and after “Extensible Events”

(Background: Matrix is the awesome open standard for messaging that I get to work on now that I work at Element.)

The Extensible Events (MSC1767) Matrix Spec Change proposal describes a new way of structuring events in Matrix that makes it easy to send events that have multiple representations (e.g. something clever like an interactive map, and something simpler like an image of a map).

The main purpose of the change is to make it easy for clients that don’t support some amazing new feature to display something that is still useful.

Since there is an implementation of this change out in the wild (in Element), it seems reasonably likely that this change will be accepted into the Matrix spec.

I really like this change, but I find it hard to understand, so here is a simple example that I have found helpful to think it through.

An old event, and a new event

Here is an old-fashioned event, followed by a new, shiny, extensible version:

{
    "type": "m.room.message",
    "content": {
        "body": "This is the *old* way",
        "format": "org.matrix.custom.html",
        "formatted_body": "This is the <b>old</b> way",
        "msgtype": "m.text"
    },
    ... other properties not relevant to this, e.g. "sender" ...
}

{
    "type": "m.message",
    "content": {
        "m.message": [
            {"mimetype": "text/plain", "body": "This is the *new* way"},
            {"mimetype": "text/html", "body": "This is the <b>new</b> way"}
        ]
    },
    ... other properties not relevant to this, e.g. "sender" ...
}

Notice that in the new extensible events, the property within content is the same as the message type (here: m.message).

The point is that as well as the primary event type (here, m.message) we can include other representations of the same message, such as an image, location co-ordinates, or something completely different. The client will render the primary event type if it understands it (and is able to show it), but if not, it can look for other types that it does understand.

For example, in Polls when you send a new poll question, it could look like this:

{
    "type": "m.poll.start",
    "content": {
        "m.poll.start": {
            ... The actual poll question etc. ...
        },
        "m.message": [
            ... A text version of the question ...
        ]
    },
    ... other properties not relevant to this, e.g. "sender" ...
}

So clients that don’t know m.poll.start can still display the poll question (if they understand extensible events), instead of completely ignoring event types they don’t know about.
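As a rough illustration of that fallback behaviour, here is a minimal Python sketch of how a client might choose what to render. It is not taken from the MSC: the function name pick_renderable, the supported_types set, the "question" field in the sample poll, and the rule of trying other representations in dictionary order are all assumptions made for the example.

```python
def pick_renderable(event, supported_types):
    """Return (type, payload) for a representation this client understands.

    Hypothetical helper: prefers the primary event type, then falls back to
    any other representation found in "content" (in dictionary order).
    """
    content = event.get("content", {})
    primary = event.get("type")

    # Prefer the primary type when the client knows how to display it.
    if primary in supported_types and primary in content:
        return primary, content[primary]

    # Otherwise look for another representation the client does understand.
    for key, payload in content.items():
        if key in supported_types:
            return key, payload

    # Nothing usable: the client has to ignore the event.
    return None


# Sample poll event; the "question" field is a placeholder, not the real schema.
poll = {
    "type": "m.poll.start",
    "content": {
        "m.poll.start": {"question": "Lunch?"},
        "m.message": [{"mimetype": "text/plain", "body": "Lunch?"}],
    },
}

print(pick_renderable(poll, {"m.poll.start", "m.message"})[0])  # m.poll.start
print(pick_renderable(poll, {"m.message"})[0])                  # m.message
```

A client that knows about polls gets the poll payload, while one that only understands m.message still gets the text version of the question.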

An abbreviated form of the new event

Of course, life is not quite as simple as that.

Because this is a lot of typing:

{
    "type": "m.message",
    "content": {
        "m.message": [
            {"mimetype": "text/plain", "body": "This is the *new* way"},
            {"mimetype": "text/html", "body": "This is the <b>new</b> way"}
        ]
    },
    ... other properties not relevant to this, e.g. "sender" ...
}

We have an abbreviated form:

{
    "type": "m.message",
    "content": {
        "m.text": "This is the *new* way",
        "m.html": "This is the <b>new</b> way"
    },
    ... other properties not relevant to this, e.g. "sender" ...
}

These two are exactly equivalent.

m.text is an abbreviation for an m.message containing an entry with "mimetype": "text/plain" and the relevant body. Similarly, m.html is an abbreviation for an m.message containing an entry with "mimetype": "text/html" and the relevant body. If you declare both, they effectively get squashed together into one m.message with both entries.

Those 2 are the only abbreviations listed, so they are special cases.
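To make the equivalence concrete, here is a small Python sketch that expands the abbreviated form into the full one. The function name expand_abbreviations is made up for this example, and the choice to append the expanded entries after any existing m.message entries is an assumption, not something the proposal is quoted as specifying.

```python
def expand_abbreviations(content):
    """Return a copy of an event's content with m.text / m.html folded
    into a single m.message list (hypothetical helper for illustration)."""
    # Keep everything except the two abbreviations.
    expanded = {k: v for k, v in content.items() if k not in ("m.text", "m.html")}

    # Start from any explicit m.message entries (assumption: shorthand is appended).
    entries = list(expanded.get("m.message", []))
    if "m.text" in content:
        entries.append({"mimetype": "text/plain", "body": content["m.text"]})
    if "m.html" in content:
        entries.append({"mimetype": "text/html", "body": content["m.html"]})

    if entries:
        expanded["m.message"] = entries
    return expanded


abbreviated = {
    "m.text": "This is the *new* way",
    "m.html": "This is the <b>new</b> way",
}
full = {
    "m.message": [
        {"mimetype": "text/plain", "body": "This is the *new* way"},
        {"mimetype": "text/html", "body": "This is the <b>new</b> way"},
    ]
}
print(expand_abbreviations(abbreviated) == full)  # True
```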

Backwards compatibility

Of course, life is way more complicated than that, so what we’re likely to see around if/when this gets widely adopted is some kind of mashed-together event like this:

{
    "type": "m.room.message",
    "content": {
        "msgtype": "m.text",
        "body": "Hello World",
        "format": "org.matrix.custom.html",
        "formatted_body": "<b>Hello</b> World",
        "m.text": "Hello World",
        "m.html": "<b>Hello</b> World"
    }
}

Note that the type here is m.room.message, where extensible events says it should be m.message. The idea is that an extensible-events-aware client will see "msgtype": "m.text" and know to look for m.message as the primary type. (This is further complicated here by the fact that there isn’t actually a m.message property – this is because m.text and m.html are abbreviated forms of it.)

Also, clients that want to display old events will need to preserve their code that parses the old event types in perpetuity.

Cost-effectiveness decision for fixing a known coding mistake

If a mistake is spotted in the source code of a shipping software system, is it more cost-effective to fix the mistake, or to wait for a customer to report a fault whose root cause turns out to be that particular coding mistake?

The naive answer is don’t wait for a customer fault report, based on the following simplistic argument: C_{fix} < C_{find}+C_{fix}.

where: C_{fix} is the cost of fixing the mistake in the code (including testing etc), and C_{find} is the cost of finding the mistake in the code based on a customer fault report (i.e., the sum on the right is the total cost of fixing a fault reported by a customer).

If the mistake is spotted in the code for ‘free’, then C_{find}==0, e.g., a developer reading the code for another reason, or flagged by a static analysis tool.

This answer is naive because it fails to take into account the possibility that the code containing the mistake is deleted/modified before any customers experience a fault caused by the mistake; let M_{gone} be the likelihood that the coding mistake ceases to exist in the next unit of time.

The more often the software is used, the more likely a fault experience based on the coding mistake occurs; let F_{experience} be the likelihood that a fault is reported in the next time unit.

A more realistic analysis takes into account both the likelihood of the coding mistake disappearing and a corresponding fault being reported, modifying the relationship to: C_{fix} < (C_{find}+C_{fix})*(F_{experience}/M_{gone})

Software systems are eventually retired from service; the likelihood that the software is maintained during the next unit of time, S_{maintained}, is slightly less than one.

Giving the relationship: C_{fix} < (C_{find}+C_{fix})*(F_{experience}/M_{gone})*S_{maintained}

which simplifies to: 1 < (C_{find}/C_{fix}+1)*(F_{experience}/M_{gone})*S_{maintained}

What is the likely range of values for the ratio: C_{find}/C_{fix}?

I have no find/fix cost data, although detailed total time is available, i.e., find+fix time (with time probably being a good proxy for cost). My personal experience of find often taking a lot longer than fix probably suffers from survival of memorable cases; I can think of cases where the opposite was true.

The two values in the ratio F_{experience}/M_{gone} are likely to change as a system evolves, e.g., high code turnover during early releases that slows as the system matures. The value of F_{experience} should decrease over time, but increase with a large influx of new users.

A study by Penta, Cerulo and Aversano investigated the lifetime of coding mistakes (detected by several tools), tracking them over three years from creation to possible removal (either fixed because of a fault report, or simply a change to the code).

Of the 2,388 coding mistakes detected in code developed over 3-years, 41 were removed as reported faults and 416 disappeared through changes to the code: F_{experience}/M_{gone} = 41/416 = 0.1

The plot below shows the survival curve for memory related coding mistakes detected in Samba, based on reported faults (red) and all other changes to the code (blue/green, code+data):

Survival curves of coding mistakes in Samba.

Coding mistakes are obviously being removed much more rapidly due to changes to the source, compared to customer fault reports.

For it to be cost-effective to fix coding mistakes in Samba, flagged by the tools used in this study (S_{maintained} is essentially one), requires: 10 < C_{find}/C_{fix}+1.

Meeting this requirement does not look that implausible to me, but obviously data is needed.
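The inequality is easy to experiment with. The following Python sketch encodes the simplified relationship and the break-even value of C_{find}/C_{fix}; the function and parameter names are invented for this illustration, and the only data used are the 41 and 416 counts quoted above (with S_{maintained} assumed to be one).

```python
def worth_fixing_now(find_to_fix_ratio, f_experience, m_gone, s_maintained=1.0):
    """True when 1 < (C_find/C_fix + 1) * (F_experience/M_gone) * S_maintained."""
    return 1 < (find_to_fix_ratio + 1) * (f_experience / m_gone) * s_maintained


def break_even_ratio(f_experience, m_gone, s_maintained=1.0):
    """Value of C_find/C_fix above which fixing the mistake now is cost-effective."""
    return m_gone / (f_experience * s_maintained) - 1


# Counts from the study: 41 mistakes removed via fault reports,
# 416 removed by other changes to the code.
print(round(break_even_ratio(41, 416), 2))   # 9.15
print(worth_fixing_now(10, 41, 416))         # True
print(worth_fixing_now(5, 41, 416))          # False
```

Using the unrounded ratio 41/416 gives a break-even of just over 9, in line with the 10 < C_{find}/C_{fix}+1 requirement obtained from the rounded value of 0.1.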

Time For A Chi Test – a.k.

A few months ago we explored the chi-squared distribution which describes the properties of sums of squares of standard normally distributed random variables, being those that have means of zero and standard deviations of one.
Whilst I'm very much of the opinion that statistical distributions are worth describing in their own right, the chi-squared distribution plays a pivotal role in testing whether or not the categories into which a set of observations of some variable quantity fall are consistent with assumptions about the expected numbers in each category, which we shall take a look at in this post.

Software engineering research is a field of dots

Software engineering research is a field of dots; people are fully focused on publishing papers about their chosen tiny little subject.

Where are the books joining the dots into even a vague outline?

Several software researchers have told me that writing books is not a worthwhile investment of their time, i.e., the number of citations they are likely to attract makes writing papers the only cost-effective medium (books containing an edited collection of papers continue to be published).

Butterfly collecting has become the method of study for many researchers. The butterflies in question are often GitHub repos that are collected together, based on some ‘interestingness’ metric, and then compared and contrasted in a conference paper.

The dots being collected are influenced by the problems that granting agencies consider to be important topics to fund (picking a research problem that will attract funding is a major consideration for any researcher). Fake research is one consequence of incentivizing people to use particular techniques in their research.

Whatever you think the aims of research in software engineering might be, funding the random collecting of dots does not seem like an effective strategy.

Perhaps it is just a matter of waiting for the field to grow up. Evidence-based software engineering research is still a teenager, and the novelty of butterfly collecting has yet to wear off.

My study of particular kinds of dots did not reveal many higher level patterns, although a number of folk theories were shown to be unfounded.

A review: Inhibitor Phase by Alastair Reynolds

Inhibitor Phase
by Alastair Reynolds
 
ISBN-13: 978-0316462761

* * * Warning Spoilers * * *


To say I was excited at the prospect of another core Revelation Space novel, more than a decade since Absolution Gap, wouldn’t come close. In preparation I reread Absolution Gap and loved it on the second reading.

I wasn’t inspired by the description of the Miguel character hiding from the Wolves on an unknown planet, but it turns out this was just a minor distraction at the beginning and that Inhibitor Phase plays a major part in advancing the story. The scope and breadth, as you would expect from Alastair Reynolds, are vast and intricate.

I was a little disappointed that the characters were ping-ponging between some of the same old worlds, Ararat and Yellowstone, and by the evolution of some of the survivors from Redemption Ark into Merpeople, but this didn’t detract in any way. It either wasn’t clear or I missed what happened to Ana Khouri - maybe she’s still on Hela. It was sad, but probably necessary, to see the end of the Nostalgia for Infinity. I also missed how, following her death on Mars, Glass and Warren had reencountered each other and swum with the Pattern Jugglers prior to Sun Hollow.

Overall I loved this story. I literally could not put it down! Alastair Reynolds is the master of descriptive exploration and constantly hints at more facets to the story I just have to know and have to keep reading for! The Inhibitor Phase does of course leave questions unanswered and sets up the next story, which I cannot wait for either!
 

How can I pin dependent packages when using use-package?

I’ve been trying to up my use-package game recently and converted my hand-rolled package check and installer to use-package. I usually prefer to use packages from melpa-stable, so I pin the default package source used by use-package to melpa-stable and override it where necessary. That’s working well in general and looks something like this:

(setq use-package-always-pin "melpa-stable")

(use-package js2-mode
  :ensure t
  :defer t
  :custom
  (progn
    (js-indent-level 2)
    (js2-include-node-externs t)))

(use-package kotlin-mode
  :ensure t
  :pin melpa)

So in other words, if I’m on a machine that doesn’t have js2-mode and kotlin-mode installed, use-package will install js2-mode from melpa-stable and kotlin-mode from melpa.

Estimation experiments: specification wording is mostly irrelevant

Existing software effort estimation datasets provide information about estimates made within particular development environments and with particular aims. Experiments provide a mechanism for obtaining information about estimates made under conditions of the experimenter’s choice, at least in theory.

Writing the code is sometimes the least time-consuming part of implementing a requirement. At hackathons, my default estimate for almost any non-trivial requirement is a couple of hours, because my implementation strategy is to find the relevant library or package and write some glue code around it. In a heavily bureaucratic organization, the coding time might be a rounding error in the time taken up by meetings, documentation and testing; so a couple of months would be considered normal.

If we concentrate on the time taken to implement the requirements in code, then estimation time and implementation time will depend on prior experience. I know that I can implement a lexer for a programming language in half-a-day, because I have done it so many times before; other people take a lot longer because they have not had the amount of practice I have had on this one task. I’m sure there are lots of tasks that would take me many days, but there is somebody who can implement them in half-a-day (because they have had lots of practice).

Given the possibility of a large variation in actual implementation times, large variations in estimates should not be surprising. Does the possibility of large variability in subject responses mean that estimation experiments have little value?

I think that estimation experiments can provide interesting information, as long as we drop the pretence that the answers given by subjects have any causal connection to the wording that appears in the task specifications they are asked to estimate.

If we assume that none of the subjects is sufficiently expert in any of the experimental tasks specified to realistically give a plausible answer, then answers must be driven by non-specification issues, e.g., the answer the client wants to hear, a value that is defensible, a round number.

A study by Lucas Gren and Richard Berntsson Svensson asked subjects to estimate the total implementation time of a list of tasks. I usually ignore software engineering experiments that use student subjects (this study eventually included professional developers), but treating the experiment as one involving social processes, rather than technical software know-how, makes subject software experience a lot less relevant.

Assume, dear reader, that you took part in this experiment, saw a list of requirements that sounded plausible, and were then asked to estimate implementation time in weeks. What estimate would you give? I would have thrown my hands up in frustration and might have answered 0.1 weeks (i.e., a few hours). I expected the most common answer to be 4 weeks (the number of weeks in a month), but it turned out to be 5 (a very ‘attractive’ round number), for student subjects (code+data).

The professional subjects appeared to be from large organizations, who I assume are used to implementations including plenty of bureaucratic stuff, as well as coding. The task specification did not include enough detailed information to create an accurate estimate, so subjects either assumed their own work environment or played along with the fresh-faced, keen experimenter (sorry Lucas). The professionals showed greater agreement, in that the range of values given was not as wide as for students, but it had a more uniform distribution (with maximums, rather than peaks, at 4 and 7); see below. I suspect that answers at the high end were from managers and designers with minimal coding experience.

Why did the experimenters choose weeks as the unit of estimation? Perhaps they thought this expressed a reasonable implementation time (it probably is if it’s not possible to use somebody else’s library/package). I think that they could have chosen day units and gotten essentially the same results (at least for student subjects). If they had chosen hours as the estimation unit, the spread of answers would have been wider, and I’m not sure whether to bet on 7 (hours in a working day) or 10 being the most common choice.

Fitting a regression model to the student data shows estimates increasing by 0.4 weeks per year of degree progression. I was initially baffled by this, and then I realized that more experienced students expect to be given tougher problems to solve, i.e., this increase is based on self-image (code+data).

The stated hypothesis investigated by the study involved none of the above. Rather, the intent was to measure the impact of obsolete requirements on estimates. Subjects were randomly divided into three groups, with each seeing and estimating one specification. One specification contained four tasks (A), one contained five tasks (B), and one contained the same tasks as (A) plus an additional task followed by the sentence: “Please note that R5 should NOT be implemented” (C).

A regression model shows that for students and professionals the estimate for (A) is about 1-2 weeks lower than (B), while (A) estimates are 3-5 weeks lower than (C) estimates.

What are subjects to make of an experimental situation where the specification includes a task that they are explicitly told to ignore?

How would you react? My first thought was that the ignore R5 sentence was itself ignored, either accidentally or on purpose. But my main thought is that Relevance theory is a complicated subject, and we are a very long way away from applying it to estimation experiments containing supposedly redundant information.

The plot below shows the number of subjects making a given estimate, in days; exp0to2 were student subjects (dashed line joins estimates that include a half-hour value, solid line whole hour), exp3 MSc students, and exp4 professional developers (code+data):

Number of subjects making a given estimate.

I hope that the authors of this study run more experiments, ideally working on the assumption that there is no connection between specification and estimate (apart from trivial examples).

Pitfall – baron m.

Greetings Sir R-----! Come warm yourself by the hearth and take a dram of scotch!

Would you care for a wager to fire up your blood?

Stout fellow!

I propose a game that puts me in mind of an ill-fated caving expedition that I undertook some several years ago.

Another quick Isso setup tweak

While I was implementing a few more changes on my web server - mostly adding the sorely needed blacklistd configuration for sshd - I noticed that NGINX’s log was showing occasional errors when trying to contact the Isso process. They all had one thing in common: they were trying to contact Isso via IPv6, as the server has both stacks enabled. It turns out that Isso only listens on an IPv4 socket, and I could not find an obvious way to get it to listen on both.

Visual Lint 8.0.8.351 has been released

Visual Lint 8.0.8.351 has now been released.

This is a maintenance update for Visual Lint 8.0, and includes the following changes:

  • Fixed a bug in the handling of preprocessor symbol properties in Visual Studio projects.

  • VisualLintGui will now open files dropped on its main window.

  • Updated the PC-lint Plus compiler indirect files co-rb-vs2019.lnt and co-rb-vs2022.lnt to filter out errors in <xutility> when analysing some Visual Studio 2019 and 2022 projects.

  • Updated the PC-lint Plus compiler indirect file co-rb-vs2022.lnt to support Visual Studio 2022 v17.0.5.

  • Updated the PC-lint Plus compiler indirect file co-rb-vs2019.lnt to support Visual Studio 2019 v16.11.9.

  • Minor updates to the online help.

Download Visual Lint 8.0.8.351

semgrep: the future of static analysis tools

When searching for a pattern that might be present in source code contained in multiple files, what is the best tool to use?

The obvious answer is grep, and grep is great for character-based pattern searches. But patterns that are token based, or include information on language semantics, fall outside grep’s model of pattern recognition (which does not stop people trying to cobble something together, perhaps with the help of complicated sed scripts).

Those searching source code written in C have the luxury of being able to use Coccinelle, an industrial strength C language aware pattern matching tool. It is widely used by the Linux kernel maintainers and people researching complicated source code patterns.

Over the 15+ years that Coccinelle has been available, there has been a lot of talk about supporting other languages, but nothing ever materialized.

About six months ago, I noticed semgrep and thought it interesting enough to add to my list of tool bookmarks. Then, a few days ago, I read a brief blog post that was interesting enough for me to check out other posts at that site, and this one by Yoann Padioleau really caught my attention. Yoann worked on Coccinelle (we had an interesting email exchange some 13 years ago, when I was analyzing if-statement usage), subsequently worked on various static analysis tools, and is now working on semgrep. Most static analysis tools are created by somebody spending a year or so working on the implementation, making all the usual mistakes, before abandoning it to go off and do other things. High quality tools come from people with experience, who have invested lots of time learning their trade.

The documentation contains lots of examples, and working on the assumption that things would be a lot like using Coccinelle, I jumped straight in.

The pattern I chose to search for, using semgrep, involved counting the number of clauses contained in Python if-statement conditionals, e.g., the condition in: if a==1 and b==2: contains two clauses (i.e., a==1, b==2). My interest in this usage comes from ideas about if-statement nesting depth and clause complexity. The intended use case of semgrep is security researchers checking for vulnerabilities in code, but I’m sure those developing it are happy for source code researchers to use it.

As always, I first tried building the source on the GitHub repo (note: the Makefile expects a git clone install, not an unzipped directory), but got fed up with having to incrementally discover and install lots of dependencies (like Coccinelle, the code is written in OCaml {93k+ lines} and Python {13k+ lines}). I joined the unwashed masses and used pip install.

The pattern rules have a yaml structure, specifying the rule name, language(s), message to output when a match is found, and the pattern to search for.

After sorting out various finger problems, writing C rather than Python, and misunderstanding the semgrep output (some of which feels like internal developer output, rather than tool user developer output), I had a set of working patterns.

The following two patterns match if-statements containing a single clause (if.subexpr-1), and two clauses (if.subexpr-2). The option commutative_boolop is set to true to allow the matching process to treat Python’s or/and as commutative, which they are not, but it reduces the number of rules that need to be written to handle all the cases when ordering of these operators is not relevant (rules+test).

rules:
- id: if.subexpr-1
  languages: [python]
  message: if-cond1
  patterns:
   - pattern: |
      if $COND1:  # we found an if statement
         $BODY
   - pattern-not: |
      if $COND2 or $COND3: # must not contain more than one condition
         $BODY
   - pattern-not: |
      if $COND2 and $COND3:
         $BODY
  severity: INFO

- id: if.subexpr-2
  languages: [python]
  options:
   commutative_boolop: true # Reduce combinatorial explosion of rules
  message: if-cond2
  pattern-either:
   - patterns:
      - pattern: |
         if $COND1 or $COND2: # if statement containing two conditions
            $BODY
      - pattern-not: |
         if $COND3 or $COND4 or $COND5: # must not contain more than two conditions
            $BODY
      - pattern-not: |
         if $COND3 or $COND4 and $COND5:
            $BODY
   - patterns:
      - pattern: |
         if $COND1 and $COND2:
            $BODY
      - pattern-not: |
         if $COND3 and $COND4 and $COND5:
            $BODY
      - pattern-not: |
         if $COND3 and $COND4 or $COND5:
            $BODY
  severity: INFO

The rules would be simpler if it were possible for a pattern to not be applied to code that earlier matched another pattern (in my example, one containing more clauses). This functionality is supported by Coccinelle, and I’m sure it will eventually appear in semgrep.
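As a cross-check on rules like these, the same clause counts can be obtained for Python source using the standard library’s ast module. This is a different technique from semgrep’s pattern matching, offered only as a sketch: the function names are invented for the example, a negated group such as not (a and b) is counted as one clause, elif conditions are counted as separate if-statements (which is how Python’s AST represents them), and conditional expressions (x if c else y) are ignored.

```python
import ast
from collections import Counter


def count_clauses(test):
    """Count the leaf clauses in a condition, descending through and/or."""
    if isinstance(test, ast.BoolOp):
        return sum(count_clauses(value) for value in test.values)
    return 1


def if_clause_histogram(source):
    """Map number-of-clauses to the number of if-statements having that many."""
    tree = ast.parse(source)
    return Counter(
        count_clauses(node.test)
        for node in ast.walk(tree)
        if isinstance(node, ast.If)
    )


example = """
if a == 1 and b == 2:
    pass
if c:
    pass
if d or e and f:
    pass
"""
print(sorted(if_clause_histogram(example).items()))  # [(1, 1), (2, 1), (3, 1)]
```

Because the count is computed recursively, there is no need for one rule per clause count or for exclusion patterns; the trade-off is that this only works for Python, whereas the semgrep rules could be adapted to other supported languages.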

This tool has lots of rough edges, and is still rapidly evolving; I’m using version 0.82, released four days ago. What’s exciting is the support for multiple languages (ten are listed, with experimental support for twelve more, and three in beta). Roughly what happens is that source code is mapped to an abstract syntax tree that is common to all supported languages, which is then pattern matched. Supporting a new language involves writing code to perform the mapping to this common AST.

It’s not too difficult to map different languages to a common AST that contains just tokens, e.g., identifiers and their spelling, literals and their value, and keywords. Many languages use the same operator precedence and associativity as C, plus their own extras, and they tend to share the same kinds of statements; however, declarations can be very diverse, which makes life difficult for supporting a generic AST.

An awful lot of useful things can be done with a tool that is aware of expression/statement syntax and matches at the token level. More refined semantic information (e.g., a variable’s type) can be added in later versions. The extent to which an investment is made to support the various subtleties of a particular language will depend on its economic importance to those involved in supporting semgrep (Return to Corp is a VC backed company).

Outside of a few languages that have established tools doing deep semantic analysis (i.e., C and C++), semgrep has the potential to become the go-to static analysis tool for source code. It will benefit from the network effects of contributions from lots of people each working in one or more languages, taking their semgrep skills and rules from one project to another (with source code language ceasing to be a major issue). Developers using niche languages with poor or no static analysis tool support will add semgrep support for their language because it will be the lowest cost path to accessing an industrial strength tool.

How are the VC backers going to make money from funding the semgrep team? The traditional financial exit for static analysis companies is selling to a much larger company. Why would a large company buy them, when they could just fork the code (other company sales have involved closed-source tools)? Perhaps those involved think they can make money by selling services (assuming semgrep becomes the go-to tool). I have a terrible track record for making business predictions, so I will stick to the technical stuff.

The difficulties of cascading OKRs

I almost despair when I hear people advocate cascading OKRs: the idea that someone, some team, some central planning department, can set OKRs which then flow down the organization with each “lower” group implementing some small part of some “higher” ask. What could be more waterfall like?

I admit, when I started working with OKRs I kind-of-expected to be shown the OKRs of the “above” before my team wrote theirs. But the more I thought about it, the more I realised that if you did do it that way then it is decidedly un-agile. How can a team be really autonomous, self-organising and self-managing if they have goals handed down to them?

There was a point when I was wracked with self-doubt: am I interpreting OKRs differently to the rest of the world? How do I reconcile agile and cascading OKRs? What am I missing? – but, when you look around, I am not the only one. In fact, if you read, watch and listen to OKR commentators the majority agree with me: the teams delivering OKRs need the latitude to set their own OKRs.

Reconciling OKRs with agile is far from the biggest problem. In fact there are at least two bigger problems. One concerns team motivation: can a team ever be motivated to do something they have no say in? Perhaps some can; I can’t, and I know others who can’t either. At the very least team members need to be asked.

Motivation becomes especially problematic if you want OKRs to be stretching. If you set someone a stretching goal and ask them to hit it without involving them then don’t be surprised if they shrug their shoulders.

Still, we haven’t got to the biggest problem.

The biggest problem with OKRs is not the metaphysical issues of motivation and whether one is truly agile or not. The biggest difficulty is simply: cascading OKRs are not practical.

First think about the timetable.

If every team is waiting for the team above them to issue OKRs before they set their own then you have a delay built into the system. And the more levels of hierarchy you have the greater the delay is going to be.

For example, suppose you have an executive team, a middle management team and several delivery teams. Then each cycle the exec team needs to set some OKRs; once they have set theirs the middle management can set theirs, and then the delivery teams can set theirs. At each cascade point there needs to be communication, and each point creates the possibility of misunderstanding and mistakes.

Setting OKRs isn’t instantaneous. I think you need about a week to have a think, reflect overnight and iterate once or twice but, if you are well practised and don’t hit any delays, you might do it in two days. Either way it is going to take at least a week, and possibly three, to get all three layers set. And if anyone runs late then it has a knock-on effect.

I’ve heard it said that the Key Results of higher levels become the objectives of the next layer down. The key results of this layer then become the objectives of the one below them. But that assumes that the OKRs themselves are a series of “items to do” and that each objective is made up of several pieces which are themselves things to do.

Sure, it sometimes happens that way. I may even have been guilty of interpreting them that way sometimes. But these days I see Key Results not as small pieces of work which, lego style, build into a bigger objective but as Acceptance Criteria: the parameters which the outcome needs to satisfy.

Now to some degree acceptance criteria can be translated into work items to do, and vice versa, but not always. Consider this:

Objective: Improve overnight batch processing to save 10% of work processing costs
Key result #1: Shorten batch processing time by 1 hour so staff do not need to wait for run to complete in the morning
Key result #2: Reduce false positive alerts by 100 per day so that staff waste less time

Now these key results could be packaged as individual pieces of work to do, but perhaps they are the same piece of work. Perhaps a database upgrade could address both issues in one go. Which path you take is a design decision.

Seeing key results as acceptance criteria changes them from work to do into bounding conditions.

In Succeeding with OKRs in Agile I advise against having domino key results: don’t set key results so that failing to hit one makes others impossible to hit. So, for example, if the DB upgrade had been added to that previous example as key result #1 then the team would have been committed to doing it. And if the upgrade had failed then the other key results would have been lost. Leaving it out gives the team the decision on how to proceed: the people doing the work decide the best way of meeting the objective.

That advice is given within teams but it also applies between teams. If the middle management team requires three lesser teams to deliver work to build its own objective then, if any one team fails, the middle management team will not only miss one key result but will therefore miss its objective.

Done like this the OKRs become fragile and a dependency nightmare. That will have two effects. First, more time will be needed when setting OKRs to identify and mitigate the dependencies, then more time will be needed to manage the dependencies. Progress will only occur at the speed of the slowest.

Second, these problems will encourage people to play it safe and not set stretching and ambitious OKRs. Predictability and safety will be prioritised.

Now if we take the alternative approach and each team sets its OKRs independently then the time lag is removed, teams set OKRs in parallel and if someone is late it doesn’t matter. Dependencies may still exist but they have not been baked into the OKRs so teams can put effort into removing dependencies (reducing coupling and increasing cohesion) rather than putting that energy into managing the dependencies.

So, while we might argue about whether OKRs should, or should not, cascade down; and while we might argue about the psychological effects of being given an OKR by another, simply remember: cascading OKRs mean setting OKRs is going to be more complicated and take longer.

Photo by Alexander Hipp on Unsplash



The post The difficulties of cascading OKRs appeared first on Allan Kelly.

Finding patterns in construction project drawing creation dates

I took part in Projecting Success’s 13th hackathon last Thursday and Friday, at CodeNode (host to many weekend hackathons and meetups); around 200 people turned up for the first day. Team Designing-Success included Imogen, Ryan, Dillan, Mo, Zeshan (all building construction domain experts) and yours truly (a data analysis monkey who knows nothing about construction).

One of the challenges came with lots of real multi-million pound building construction project data (two csv files containing 60K+ rows and one containing 15K+ rows), provided by SISK. The data contained information on project construction drawings and RFIs (requests for information) from 97 projects.

The construction industry is years ahead of the software industry in terms of collecting data, in that lots of companies actually collect data (for some, accumulate might be a better description) rather than not collecting/accumulating data. While they have data, they don’t seem to be making good use of it (so I am told).

Nearly all the discussions I have had with domain experts about the patterns found in their data have been iterative, brief email exchanges, sometimes running over many months. In this hack, everybody involved is sitting around the same table for two days, i.e., the conversation is happening in real-time and there is a cut-off time for delivery of results.

I got the impression that my fellow team-mates were new to this kind of data analysis, which is my usual experience when discussing patterns recently found in data. My standard approach is to start highlighting visual patterns present in the data (e.g., plot foo against bar), and hope that somebody says “That’s interesting” or suggests potentially more interesting items to plot.

After several dead-end iterations (i.e., plots that failed to evoke a “that’s interesting” response), drawings created per day against project duration (as a percentage of known duration) turned out to be of great interest to the domain experts.

Building construction uses a waterfall process; all the drawings (i.e., a kind of detailed requirements) are supposed to be created at the beginning of the project.

Hmm, many individual project drawing plots were showing quite a few drawings being created close to the end of the project. How could this be? It turns out that there are lots of different reasons for creating a drawing (74 reasons in the data), and that it is to be expected that some kinds of drawings are likely to be created late in the day, e.g., specific landscaping details. The 74 reasons were mapped to three drawing categories (As built, Construction, and Design Development), then project drawings were recounted and plotted in three colors (see below).
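The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used at the hack: the function names, the project dates and the four drawings are all made up, and it assumes every drawing is created between the project start and end dates.

```python
from collections import Counter
from datetime import date


def percent_complete(created, start, end):
    """Position of a creation date within the project, as a percentage of its duration."""
    total_days = (end - start).days
    elapsed_days = (created - start).days
    return 100.0 * elapsed_days / total_days


def drawings_by_decile(drawings, start, end):
    """Count drawings per (category, decile of project duration)."""
    counts = Counter()
    for created, category in drawings:
        pct = percent_complete(created, start, end)
        # A drawing created on the final day is put in the last decile.
        decile = min(int(pct // 10), 9)
        counts[(category, decile)] += 1
    return counts


# Hypothetical project lasting 200 days, with four drawings.
start, end = date(2022, 1, 1), date(2022, 7, 20)
drawings = [
    (date(2022, 1, 11), "Design Development"),
    (date(2022, 1, 15), "Design Development"),
    (date(2022, 4, 11), "Construction"),
    (date(2022, 7, 20), "As built"),
]
counts = drawings_by_decile(drawings, start, end)
for (category, decile), n in sorted(counts.items()):
    print(f"{category}: {n} drawing(s) in the {decile * 10}-{decile * 10 + 10}% band")
```

Normalising by project duration is what makes it possible to overlay projects of very different lengths on the same axis.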

The domain experts (i.e., everybody except me) enjoyed themselves interpreting these plots. I nodded sagely, and occasionally blew my cover by asking about an acronym that everybody in construction obviously knew.

The project meta-data includes a measure of project performance (a value between one and five, derived from profitability and other confidential values) and type of business contract (a value between one and four). The data from the 97 projects was combined by performance and contract to give 20 aggregated plots. The evolution of the number of drawings created per day might vary by contract, and the hypothesis was that projects at different performance levels would exhibit undesirable patterns in the evolution of the number of drawings created.

The plots below contain patterns in the quantity of drawings created by percentage of project completion, that are: (left) considered a good project for contract type 1 (level 5 are best performing projects), and (right) considered a bad project for contract type 1 (level 1 is the worst performing project). Contact the domain experts for details (code+data):

Number of drawings created at percentage project completion times.

The path to the above plot is a common one: discover an interesting pattern in data, notice that something does not look right, use domain knowledge to refine the data analysis (e.g., kinds of drawing or contract), rinse and repeat.

My particular interest is using data to understand software engineering processes. How do these patterns in construction drawings compare with patterns in the software project equivalents, e.g., detailed requirements?

I am not aware of any detailed public data on requirements produced using a waterfall process. So the answer is, I don’t know; but the rationales I heard for the various kinds of drawings sound as if they would have equivalents in the software requirements world.

What about the other data provided by the challenge sponsor?

I plotted various quantities for the RFI data, but there wasn’t any “that’s interesting” response from the domain experts. Perhaps the genius behind the plot ideas will be recognized later, or perhaps one of the domain experts will suddenly realize what patterns should be present in RFI data on high performance projects (nobody is allowed to consider the possibility that the data has no practical use). It can take time for the consequences of data analysis to sink in, or for new ideas to surface, which is why I am happy for analysis conversations to stretch out over time. Our presentation deck included some RFI plots because there was RFI data in the challenge.

What is the software equivalent of construction RFIs? Perhaps issues in a tracking system, or Jira tickets? I did not think to talk more about RFIs with the domain experts.

How did team Designing-Success do?

In most hackathons, the teams that stay the course present at the end of the hack. For these ProjectHacks, the submission deadline is the following day; the judging is all done later, electronically, based on the submitted slide deck and video presentation. The end of this hack was something of an anti-climax.

Did team Designing-Success discover anything of practical use?

I think that finding patterns in the drawing data converted the domain experts from a theoretical to a practical understanding that it was possible to extract interesting patterns from construction data. They each said that they planned to attend the next hack (in about four months), and I suggested that they try to bring some of their own data.

Can these drawing creation patterns be used to help monitor project performance, as it progresses? The domain experts thought so. I suspect that the users of these patterns will be those not closely associated with a project (those close to a project are usually well aware of the fact that things are not going well).

A Clash of Kings a Review


A Clash of Kings: Book 2 (A Song of Ice and Fire)

George R.R. Martin

ISBN-13: 978-0007447831

I loved the Game of Thrones TV series, even the way the final series ends, although I’m not sure I would have chosen the eventual king. I was looking forward to reading the books and understanding the stories in more depth and, to an extent, that was the case. More so with the first book than the second.

A Clash of Kings just has too much irrelevant detail and quickly becomes laborious to read. A part which stands out is after one of the battles where there are many pages given over to a list of knights who were awarded honours. The vast majority were in no way relevant to the story and just prolonged getting to the end. Fortunately the last 3% (I was reading on Kindle) was given over to an appendix so I was able to skim that.

There were a number of key events from the TV series which I was looking out for and which were disappointingly missing, not least Bronn lighting the wildfire with an arrow. Of course the book is the original and these events were invented for the TV series, but still.

I’m told the books get better from the third one onwards, so once I’ve got through Inhibitor Phase, Dune and one or two others I’ll be back. I’ve started now, so I need to finish.

A Jolly Student’s Tea Party – a.k.

Last time we took a look at the chi-squared distribution which describes the behaviour of sums of squares of standard normally distributed random variables, having means of zero and standard deviations of one.
Tangentially related is Student's t-distribution which governs the deviation of means of sets of independent observations of a normally distributed random variable from its known true mean, which we shall examine in this post.

Class war in the modern workplace

Writing about “The knowledge worker” in The Age of Discontinuity (1968) Peter Drucker offers this insight: the knowledge worker sees themselves as a skilled professional, or “white collar” to use an old term. To this end programmers, marketeers and conference producers see themselves as akin to doctors and lawyers.

But, Drucker says, these professionals are still employees and are seen by their employers as “workers” more akin to the “blue collar” factory employees of years gone by. I’ve long found it ironic that many contractors like to see themselves as entrepreneurs and small businesses while their clients may well see them more as casual day-labourers who can be hired and disposed of with little thought.

Quote from Peter Drucker’s book The Age of Discontinuity.

Look at it like this: once upon a time the majority of employees would be working on the factory floor, and their hands would get dirty. The work of the manager was to optimise both the work they were doing and the way the work was done; employees were probably not encouraged to think too much.

Today is the age of mass knowledge work – at least in developed western countries. The majority of employees type at keyboards and spend all day talking to others about ideas. Their hands stay clean.

Look at the qualifications people hold: in the past blue-collar workers might have a skill, but many were “unskilled” or “semi-skilled”. A worker’s “trade” might be learned via an apprenticeship which involved observing and doing the work. Today we live in the age of mass knowledge work: the bulk of the workforce are not factory workers or dock labourers but educated analysts, programmers, accountants and such.

Modern blue-collar workers are overwhelmingly degree educated and do expect to have a say in their work – both what is done and how it is done.

When I visit clients I sometimes see – or rather hear – this explicitly when managerial types talk about “the factory floor” or “engine room” when they mean the offices where IT staff work. You see it too when teams are treated as “feature factories” and measured on how many “user stories” or “story points” they complete and how fast the burn-down chart burns down.

There is a mismatch in the way workers view themselves and the way the managers view the workers. This mismatch in views becomes a misunderstanding and can escalate into conflict.

Workers advocate replacing management with “self managing teams” and managers look to replace programmers with automatic programming, “programming through pictures” or lately “low-code” solutions. Rather than resolving the conflict it becomes an existential fight. Sometimes it can even look like a modern class struggle between workers and employers.

There are no silver bullets to this conflict. Both sides need to respect one another and we need to find a new understanding of roles.

One of my favourite takes on this dilemma comes from Tim O’Reilly. In an essay entitled “Managing the Bots That Are Managing the Business” (2016) he suggests that the true workers of today are the machines: it is the factory robots that assemble cars, take our online orders and increasingly do all the “heavy lifting”. The managers of these machines are the people who instruct the machine what to do and how to do it. Most obviously programmers, but also the many who use digital technology to instruct a machine, e.g. a marketeer who schedules tweets, or a customer service agent who scripts the chat-bot.

Either way you look at it, the fact is that modern blue collar workers need more of the skills and knowledge traditionally reserved for managers: they need to understand the profit and loss, plus they need to understand business goals and strategy and appreciate the consequences of their actions. And it means the white-collar managers of the managers need to respect the knowledge workers as peers rather than hired help: they need to explain business goals and strategy, and be open about the business.

It sometimes feels as if the “class war” of the early twentieth century is still with us. Only now it is not blue collar workers fighting white collar but white collar workers fighting between themselves.



The post Class war in the modern workplace appeared first on Allan Kelly.

Moore’s law was a socially constructed project

Moore’s law was a socially constructed project that depended on the coordinated actions of many independent companies and groups of individuals to last for as long as it did.

All products evolve, but what was it about Moore’s law that enabled microelectronics to evolve so much faster and for longer than most other products?

Moore’s observation, made in 1965 based on four data points, was that the number of components contained in a fabricated silicon device doubles every year. The paper didn’t make this claim in words, but a line fitted to four yearly data points (starting in 1962) suggested this behavior continuing into the mid-1970s. The introduction of IBM’s Personal Computer in 1981, containing Intel’s 8088 processor, led to interested parties coming together to create a hugely profitable ecosystem that depended on the continuance of Moore’s law.

The plot below shows Moore’s four points (red) and fitted regression model (green line). In practice, since 1970, fitting a regression model (purple line) to the number of transistors in various microprocessors (blue/green, data from Wikipedia), finds that the number of transistors doubled every two years (code+data):

Transistors contained in a device over time, plus Moore's original four data-points.
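A doubling time can be read off a straight-line fit to the base-2 logarithm of the transistor count against year: the slope is doublings per year, and its reciprocal is the doubling time. The following is a minimal sketch of that calculation, not the code behind the plot; the five processors and their approximate, commonly quoted transistor counts are an assumption chosen for illustration, not the Wikipedia dataset used above.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Approximate, commonly quoted transistor counts (illustrative sample only).
processors = [
    (1971, 2_300),       # Intel 4004
    (1978, 29_000),      # Intel 8086
    (1985, 275_000),     # Intel 80386
    (1993, 3_100_000),   # Intel Pentium
    (2000, 42_000_000),  # Intel Pentium 4
]

years = [year for year, _ in processors]
log2_counts = [math.log2(count) for _, count in processors]

# Slope is in doublings per year, so its reciprocal is years per doubling.
_, doublings_per_year = fit_line(years, log2_counts)
doubling_time = 1 / doublings_per_year
print(f"doubling time: {doubling_time:.1f} years")
```

Fitting in log space treats exponential growth as a straight line, which is why a simple linear regression is enough here.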

In the early days, designing a device was mostly a manual operation; that is, the circuit design and logic design down to the transistor level were hand-drawn. This meant that creating a device containing twice as many transistors required twice as many engineers. At some point the doubling process either becomes uneconomic or it takes forever to get anything done because of the coordination effort.

The problem of needing an exponentially-growing number of engineers was solved by creating electronic design automation (EDA) tools, starting in the 1980s, with successive generations of tools handling ever higher levels of abstraction, and human designers focusing on the upper levels.

The use of EDA provides a benefit to manufacturers (who can design differentiated products) and to customers (e.g., products containing more functionality).

If EDA had not solved the problem of exponential growth in engineers, Moore’s law would have maxed-out in the early 1980s, with around 150K transistors per device. However, this would not have stopped the ongoing shrinking of transistors; two economic factors independently incentivize the creation of ever smaller transistors.

When wafer fabrication technology improvements make it possible to double the number of transistors on a silicon wafer, then around twice as many devices can be produced (assuming unchanged number of transistors per device, and other technical details). The wafer fabrication cost is greater (second row in table below), but a lot less than twice as much, so the manufacturing cost per device is much lower (third row in table).

The doubling of transistors primarily provides a manufacturer benefit.

The following table gives estimates for various chip foundry economic factors, in dollars (taken from the report: AI Chips: What They Are and Why They Matter). Node, expressed in nanometers, used to directly correspond to the length of a particular feature created during the fabrication process; these days it does not correspond to the size of any specific feature and is essentially just a name applied to a particular generation of chips.

Node (nm)                                            90      65      40      28      20   16/12      10       7       5
Foundry sale price per wafer                      1,650   1,937   2,274   2,891   3,677   3,984   5,992   9,346  16,988
Foundry sale price per chip                       2,433   1,428     713     453     399     331     274     233     238
Mass production year                               2004    2006    2009    2011    2014    2015    2017    2018    2020
Quarter                                              Q4      Q4      Q1      Q4      Q3      Q3      Q2      Q3      Q1
Capital investment per wafer processed per year   4,649   5,456   6,404   8,144  10,356  11,220  13,169  14,267  16,746
Capital consumed per wafer processed in 2020        411     483     567     721     917     993   1,494   2,330   4,235
Other costs and markup per wafer                  1,293   1,454   1,707   2,171   2,760   2,990   4,498   7,016  12,753
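The opposite trends in the two sale-price rows are easier to see as node-to-node ratios. The sketch below simply divides each table entry by its predecessor; the values are copied from the table, and the variable names are my own.

```python
nodes = ["90", "65", "40", "28", "20", "16/12", "10", "7", "5"]
price_per_wafer = [1650, 1937, 2274, 2891, 3677, 3984, 5992, 9346, 16988]
price_per_chip = [2433, 1428, 713, 453, 399, 331, 274, 233, 238]

# Ratio of each node's price to the price at the previous (larger) node.
wafer_ratios = [b / a for a, b in zip(price_per_wafer, price_per_wafer[1:])]
chip_ratios = [b / a for a, b in zip(price_per_chip, price_per_chip[1:])]

for i, (wafer, chip) in enumerate(zip(wafer_ratios, chip_ratios), start=1):
    print(f"{nodes[i - 1]:>5} -> {nodes[i]:<5} wafer x{wafer:.2f}  chip x{chip:.2f}")
```

In these estimates the per wafer price rises at every step, while the per chip price falls at every step except the last (7 nm to 5 nm), where it rises slightly.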

The second economic factor incentivizing the creation of smaller transistors is Dennard scaling, a rarely heard technical term named after the first author of a 1974 paper showing that transistor power consumption scaled with area (for very small transistors). Halving the area occupied by a transistor halves the power consumed, at the same frequency.

The maximum clock-frequency of a microprocessor is limited by the amount of heat it can dissipate; the heat produced is proportional to the power consumed, which is approximately proportional to the clock-frequency. Instead of a device having smaller transistors consume less power, they could consume the same power at double the frequency.

Dennard scaling primarily provides a customer benefit.

Figuring out how to further shrink the size of transistors requires an investment in research, followed by designing and then building or purchasing new equipment. Why would a company that had invested in researching and building its current manufacturing capability be willing to invest in making it obsolete?

The fear of losing market share is a commercial imperative experienced by all leading companies. In the microprocessor market, the first company to halve the size of a transistor would be able to produce twice as many microprocessors (at a lower cost) running twice as fast as the existing products. They could (and did) charge more for the latest, faster product, even though it cost them less than the previous version to manufacture.

Building cheaper, faster products is a means to an end; that end is receiving a decent return on the investment made. How large is the market for new microprocessors and how large an investment is required to build the next generation of products?

Rock’s law says that the cost of a chip fabrication plant doubles every four years (the per wafer price in the table above is increasing at a slower rate). Gambling hundreds of millions of dollars, later billions of dollars, on a next generation fabrication plant has always been a high risk/high reward investment.

The sales of microprocessors are dependent on the sale of computers that contain them, and people buy computers to enable them to use software. Microprocessor manufacturers thus have to both convince computer manufacturers to use their chip (without breaking antitrust laws) and convince software companies to create products that run on a particular processor.

The introduction of the IBM PC kick-started the personal computer market, with Wintel (the partnership between Microsoft and Intel) dominating software developer and end-user mindshare of the PC compatible market (in no small part due to the billions these two companies spent on advertising).

An effective technique for increasing the volume of microprocessors sold is to shorten the usable lifetime of the computer potential customers currently own. Customers buy computers to run software, and when new versions of software can only effectively be used in a computer containing more memory or on a new microprocessor which supports functionality not supported by earlier processors, then a new computer is needed. By obsoleting older products soon after newer products become available, companies are able to evolve an existing customer base to one where the new product is looked upon as the norm. Customers are force marched into the future.

The plot below shows sales volume, in gigabytes, of various sized DRAM chips over time. The simple story of exponential growth in sales volume (plus signs) hides the more complicated story of the rise and fall of succeeding generations of memory chips (code+data):

Sales volume, in gigabytes, of various sized DRAM chips over time.

The Red Queens had a simple task: keep buying the latest products. The activities of the companies supplying the specialist equipment needed to build a chip fabrication plant have to be coordinated, a role filled by the International Technology Roadmap for Semiconductors (ITRS). The annual ITRS reports contain detailed specifications of the expected performance of the subsystems involved in the fabrication process.

Moore’s law is now dead, in that transistor doubling now takes longer than two years. Would transistor doubling time have taken longer than two years, or slowed down earlier, if:

  • the ecosystem had not been dominated by two symbiotic companies (or did network effects make it inevitable that there would be two symbiotic companies),
  • the Internet had happened at a different time,
  • software applications had quickly reached a good enough state,
  • cloud computing had gone mainstream much earlier?

Focus is not divisible so limit your OKRs

From time to time I hear about teams who have 8, 9, 10 or more OKRs in a quarter. That is just plain wrong. In Succeeding with OKRs in Agile I suggest 3 objectives per quarter, each with 3 key results. When I hear the cries of pain and people twist my arm I compromise on 4 objectives and about 4 key results.

Now those numbers are MAXIMUMs, I’d really like fewer, and I’ve heard of teams which have just 1 – yes ONE – objective per quarter. I’m itching to try that with a team.

Sometimes people respond and say: “Arhh, but we have a big team, I agree with 3 being the right number for a team of six but we have a team of 16 so surely we could have more objectives?”

But actually, when you have a bigger team you have a bigger problem and hence even more reason to limit the number of OKRs.

Part of the power of OKRs is that they create and maintain Focus. Having agreed and stated outcomes to work towards gives individuals something to focus on, and it gives team members – and particularly product owners – a reason to say No when more work appears. It keeps the team honest when looking at what needs doing and deciding how to spend their time.


Focus is not divisible – divide your focus and you no longer have focus. When you have a bigger team you have more need for focus rather than less. One could even argue that as the team grows the number of OKRs should reduce, not increase.

Bigger teams, because there are more people, struggle more with focus than small teams. On a small team the lack of capacity forces trade-offs and brings people face-to-face with limited capacity. On a big team it’s easy to think one or two people can go and do something different, or even for individuals to hide.

By the way, this applies equally if you extend the OKR cycle: setting OKRs every six months rather than every three should be a reason to reduce the number of OKRs rather than increase them.

Once upon a time I worked with a team that had real focus problems: team members found little overlap in their work. Consequently there were seven or eight OKRs each month. That was itself information: when you looked at the OKRs they were disjoint. The team was not focusing because it had three – or four – very different work streams and the people on the team had different skills.

The solution was to split the team into three mini-teams each with their own OKRs. One could argue that the full team got more OKRs but what happened was that each mini-team could now work towards its goal with focus, less distraction and greater purpose.

This keeps things simple – the Rule of Three! – and keeps things focused.




Photo by David Travis on Unsplash

The post Focus is not divisible so limit your OKRs appeared first on Allan Kelly.

OKRs workshop, tutorials and free stuff

Two opportunities to learn more about OKRs. Both based on Succeeding with OKRs in Agile.

Implementing OKRs in Agile

24 February: 1-day online workshop, hosted by iLean in Belgium and open to all.

Combining OKR and Agile

Online tutorial series – this is a mix of free and paid-for material.

You can buy the tutorials individually or as a bundle. Subscribing to the bundle is much cheaper and gives access to new tutorials as I add them. My plan is to add one new tutorial each month.

Use the code blogreader to get 20% off the paid elements.

The post OKRs workshop, tutorials and free stuff appeared first on Allan Kelly.

Including natural language text topics in a regression model

The implementation records for a project sometimes include a brief description of each task implemented. There will be some degree of similarity between the implementation of some tasks. Is it possible to calculate the degree of similarity between tasks from the text in the task descriptions?

Over the years, various approaches to measuring document similarity have been proposed (more than you probably want to know about natural language processing).

One of the oldest, simplest and most widely used techniques is term frequency–inverse document frequency (tf-idf), which is based on counting word frequencies, i.e., word context is ignored. This technique can work well when there are a sufficient number of words to ensure a good enough overlap between similar documents.

When the description consists of a sentence or two (i.e., a summary), the problem becomes one of sentence similarity, not document similarity (so tf-idf is unlikely to be of any use).
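To illustrate why short summaries are a problem for word counting, here is a minimal pure-Python sketch of tf-idf with cosine similarity. It uses one common weighting (term frequency times log(N/df)); libraries differ in the details. The three task summaries are invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: tf-idf weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents in which each word appears.
    doc_freq = Counter(word for words in tokenized for word in set(words))
    vectors = []
    for words in tokenized:
        term_freq = Counter(words)
        vectors.append({
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors held as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical task summaries: the third paraphrases the first
# without sharing any words with it.
summaries = [
    "fix login page crash",
    "fix crash on login page",
    "repair sign-in screen failure",
]
vectors = tfidf_vectors(summaries)
sim_shared = cosine(vectors[0], vectors[1])
sim_disjoint = cosine(vectors[0], vectors[2])
print(f"shared words: {sim_shared:.2f}, no shared words: {sim_disjoint:.2f}")
```

With only a handful of words per summary, two descriptions of the same task can easily share no words at all, in which case their similarity is exactly zero.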

Word context, in a sentence, underpins the word embedding approach, which represents a word by an n-dimensional vector calculated from the local sentence context in which the word occurs (derived from a large amount of text). Words that are closer, in this vector space, are expected to have similar meanings. One technique for calculating the similarity between sentences is to compare the averages of the word embedding of the words they contain. However, care is needed; words appearing in the same context can create sentences having different meanings, as in the following (calculated sentence similarity in the comments):

import spacy
nlp=spacy.load("en_core_web_md") # _md model needed for word vectors
nlp("the screen is black").similarity(nlp("the screen is white"))
# 0.9768339369182919  # closer to 1 the more similar the sentences
nlp("implementing widgets would be little effort").similarity(nlp("implementing widgets would be a huge effort"))
# 0.9636533803238744
nlp("the screen is black").similarity(nlp("implementing widgets would be a huge effort"))
# 0.6596892830922606

The first pair of sentences are similar in that they are about the characteristics of an object (i.e., its colour), while the second pair are similar in that they are about the quantity of something (i.e., implementation effort), and the third pair are not that similar.
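The high scores for the first two pairs follow from the averaging: when most of the words are shared, one differing word barely moves the mean vector. Here is a minimal sketch of the averaging technique using made-up 3-dimensional vectors (real models such as en_core_web_md use hundreds of dimensions, and the values below are invented purely for illustration).

```python
import math

# Made-up toy "embeddings"; not taken from any real model.
toy_vectors = {
    "the":    [0.1, 0.1, 0.1],
    "screen": [0.9, 0.2, 0.1],
    "is":     [0.1, 0.2, 0.1],
    "black":  [0.2, 0.9, 0.1],
    "white":  [0.2, 0.8, 0.3],
}


def sentence_vector(sentence):
    """Average the vectors of the words in a sentence."""
    vectors = [toy_vectors[word] for word in sentence.split()]
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


word_similarity = cosine(toy_vectors["black"], toy_vectors["white"])
sentence_similarity = cosine(
    sentence_vector("the screen is black"),
    sentence_vector("the screen is white"),
)
print(f"words: {word_similarity:.3f}, sentences: {sentence_similarity:.3f}")
```

In this toy setup the two sentences score as more similar than the two words that distinguish them, because the three shared words dominate the average.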

The words in a document, or summary, are about some collection of topics. A set of related documents are likely to contain a discussion of a set of related topics in varying degrees. Latent Dirichlet allocation (LDA) is a widely used technique for calculating a set of (unseen) topics from a set of documents and their contained words.

A recent paper attempted to estimate task effort based on the similarity of the task descriptions (using tf-idf). My last semi-serious attempt to extract useful information from text, some years ago, was a miserable failure (it’s a very hard problem). Perhaps better techniques and tools are now available for me to leverage (my interest is in understanding what is going on, not making predictions).

My initial idea was to extract topics from task data, and then try to add these to regression models of task effort estimation, to see what impact they had. Searching to find out what researchers have recently been doing in this area, I was pleased to see that others were ahead of me, and had implemented R packages to do the heavy lifting, in particular:

  • The stm package supports the creation of Structural Topic Models; these add support for covariates to influence the process of fitting LDA models, i.e., a correlation between the topics and other variables in the data. Uses of STM appear to be oriented towards teasing out differences in topics associated with different values of some variable (e.g., political party), and the package authors have written papers analysing political data.
  • The psychtm package supports what the authors call supervised latent Dirichlet allocation with covariates (SLDAX). This handles all the details needed to include the extracted LDA topics in a regression model; exactly what I was after. The user interface and documentation for this package are not as polished as those of the stm package, but the code held together as I fumbled my way through.

To experiment using these two packages I used the SiP dataset, which includes summary text for each task, and I have previously analysed the estimation task data.

The stm package:

The textProcessor function handles all the details of converting a vector of strings (e.g., summary text) to internal form (i.e., handling conversion to lower case, removing stop words, stemming, etc).

One of the input variables to the LDA process is the number of topics to use. Picking this value is something of a black art, and various functions are available for calculating and displaying concepts such as topic semantic coherence and exclusivity, the most commonly used words associated with a topic, and the documents in which these topics occur. Deciding the extent to which 10 or 15 topics produced the best results (values that sounded like a good idea to me) required domain knowledge that I did not have. The plot below shows the extent to which the words in topic 5 were associated with the Category column having the value “Development” or “Management” (code+data):

Distribution of words contained in topics associated with Development and Management.

The psychtm package:

The prep_docs function is not as polished as the equivalent stm function, but the package’s first release was just last year.

After the data has been prepared, the call to fit a regression model that includes the LDA extracted topics is straightforward:

sip_topic_mod=gibbs_sldax(log(HoursActual) ~ log(HoursEstimate), data = cl_info,
                         docs = docs_vocab$documents, model = "sldax",
                         K = 10) # number of topics

where: log(HoursActual) ~ log(HoursEstimate) is the simplest model fitted in the original analysis.

The fitted model had the form: HoursActual ≈ HoursEstimate^{0.81} e^{0.13 topic_1} e^{0.18 topic_2}..., with the calculated coefficient for some topics not being significant. The value 0.81 is close to that fitted in the original model. The value of topic_i is the fraction of topic_i calculated to be present in the Summary text of the corresponding task.
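
To make the fitted form concrete, here is a minimal Python sketch that evaluates it for one task. Only the exponent 0.81 and the topic coefficients 0.13 and 0.18 come from the fitted model; the function name, the omission of the intercept and of the remaining topics, and the example topic fractions are assumptions made for illustration.

```python
import math

# Coefficients quoted in the text; the model's intercept and the other
# topic coefficients are not given, so they are left out (assumption).
ESTIMATE_EXPONENT = 0.81
TOPIC_COEFS = {"topic_1": 0.13, "topic_2": 0.18}

def relative_hours_actual(hours_estimate, topic_fractions):
    """Return HoursEstimate^0.81 * exp(sum(coef_i * topic_i)).

    The result is only proportional to the fitted HoursActual, because
    the intercept is unknown.
    """
    topic_term = sum(TOPIC_COEFS[name] * frac
                     for name, frac in topic_fractions.items())
    return hours_estimate ** ESTIMATE_EXPONENT * math.exp(topic_term)

# Hypothetical task: estimated at 8 hours, Summary text is 50% topic_2.
no_topic = relative_hours_actual(8, {})
with_topic = relative_hours_actual(8, {"topic_2": 0.5})
print(with_topic / no_topic)  # multiplier contributed by topic_2
```

The ratio printed at the end isolates the effect of the topic term, which is the part of the model that the Summary text contributes.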

I’m pleased to see that a regression model can be improved by adding topics derived from the Summary text.

The SiP data includes other information such as work Category (e.g., development, management), ProjectCode and DeveloperId. It is to be expected that these factors will have some impact on the words appearing in a task Summary, and hence the topics (the stm analysis showed this effect for Category).

When the model formula is changed to: log(HoursActual) ~ log(HoursEstimate)+ProjectCode, the quality of fit for most topics became very poor. Is this because ProjectCode and topics conveyed very similar information, or did I need to be more sophisticated when extracting topic models? This needs further investigation.

Can topic models be used to build prediction models?

Summary text can only be used to make predictions if it is available before the event being predicted, e.g., available before a task is completed and the actual effort is known. My interest in model building is to understand the processes involved, so I am not worried about when the text was created.

My own habit is to update, or even create Summary text once a task is complete. I asked Stephen Cullen, my co-author on the original analysis and author of many of the Summary texts, about the process of creating the SiP Summary sentences. His reply was that the Summary field was an active document that was updated over time. I suspect the same is true for many task descriptions.

Not all estimation data includes as much information as the SiP dataset. If Summary text is one of the few pieces of information available, it may be possible to use it as a proxy for missing columns.

Perhaps it is possible to extract information from the SiP Summary text that is not also contained in the other recorded information. Having been successful this far, I will continue to investigate.

On A Day At The Races – student

Most recently the Baron challenged Sir R----- to a race of knights around the perimeter of a chessboard, with the Baron starting upon the lower right hand square and Sir R----- upon the lower left. The chase proceeded anticlockwise with the Baron moving four squares at each turn and Sir R----- by the roll of a die. Costing Sir R----- one cent to play, his goal was to catch or overtake the Baron before he reached the first rank for which he would receive a prize of forty one cents for each square that the Baron still had to traverse before reaching it.

Tracking software evolution via its Changelog

Software that is used evolves. How fast does software evolve, e.g., how much new functionality is added and how much existing functionality is updated?

A new software release is often accompanied by a changelog which lists new, changed and deleted functionality. When software is developed using a continuous release process, the changelog can be very fine-grained.

The changelog for the Beeminder app contains 3,829 entries, almost one per day since February 2011 (around 180 entries are not present in the log I downloaded, whose last entry is numbered 4012).

Is it possible to use the information contained in the Beeminder changelog to estimate the rate of growth of functionality of Beeminder over time?

My thinking is driven by patterns in a plot of the Renzo Pomodoro dataset. Renzo assigned a tag-name (sometimes two) to each task, which classified the work involved, e.g., @planning. The following plot shows the date of use of each tag-name, over time (ordered vertically by first use). The first and third black lines are fitted regression models of the form 1-e^{-K*days}, where: K is a constant and days is the number of days since the start of the interval fitted; the second (middle) black line is a fitted straight line.

at-words usage, by date.
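
The saturating form used for the first and third fitted lines can be sketched in Python as follows. This is not the code behind the plot: the scale factor a, the grid search over K, and the synthetic data are all assumptions, chosen so the example runs with the standard library only.

```python
import math

def fit_saturating_curve(days, counts, k_grid):
    """Least-squares fit of counts ~ a*(1 - exp(-K*days)).

    For each candidate K the best scale a has a closed form, so only K
    needs searching; the grid search stands in for a proper nonlinear
    regression routine.
    """
    best = None
    for k in k_grid:
        basis = [1 - math.exp(-k * d) for d in days]
        denom = sum(b * b for b in basis)
        if denom == 0:
            continue
        a = sum(b * c for b, c in zip(basis, counts)) / denom
        sse = sum((c - a * b) ** 2 for b, c in zip(basis, counts))
        if best is None or sse < best[0]:
            best = (sse, a, k)
    return best[1], best[2]

# Synthetic data generated from a known curve (a=40 tags, K=0.01 per day).
days = list(range(0, 500, 25))
counts = [40 * (1 - math.exp(-0.01 * d)) for d in days]
k_grid = [i / 1000 for i in range(1, 101)]   # candidate K: 0.001 .. 0.100
a_fit, k_fit = fit_saturating_curve(days, counts, k_grid)
print(round(a_fit, 2), k_fit)
```

With real tag data the residuals would not vanish, and a finer grid or a library optimiser would be the more sensible choice.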

How might a changelog line describing a day’s change be distilled to a much shorter description (effectively a tag-name), with very similar changes mapping to the same description?

Named-entity recognition seemed like a good place to start my search, and my natural language text processing tool of choice is still spaCy (which continues to get better and better).

spaCy is Python based and the processing pipeline could have all been written in Python. However, I’m much more fluent in awk for data processing, and R for plotting, so Python was just used for the language processing.

The following shows some Beeminder changelog lines after stripping out urls and formatting characters:

Cheapo bug fix for erroneous quoting of number of safety buffer days for weight loss graphs.
Bugfix: Response emails were accidentally off the past couple days; fixed now. Thanks to user bmndr.com/laur  for alerting us!  
More useful subject lines in the response emails, like "wrong lane!" or whatnot.
Clearer/conciser stats at bottom of graph pages. (Will take effect when you enter your next datapoint.) Progress, rate, lane, delta.  
Better handling of significant digits when displaying numbers. Cf stackoverflow.com/q/5208663

The code to extract and print the named-entities in each changelog line could not be simpler.

import spacy
import sys

nlp = spacy.load("en_core_web_sm") # load trained English pipelines

count=0 
        
for line in sys.stdin:
   count += 1 
   print(f'> {count}: {line}')
#
   doc=nlp(line) # do the heavy lifting
#          
   for ent in doc.ents:  # iterate over detected named-entities
      print(ent.lemma_, ent.label_)

To maximize the similarity between named-entities appearing on different lines the lemmas are printed, rather than original text (i.e., words appear in their base form).

The label_ specifies the kind of named-entity, e.g., person, organization, location, etc.

This code produced 2,225 unique named-entities (5,302 in total) from the Beeminder changelog (around 0.6 per day), and failed to return a named-entity for 33% of lines. I was somewhat optimistically hoping for a few hundred unique named-entities.

There are several problems with this simple implementation:

  • each line is considered in isolation,
  • the change log sometimes contains different names for the same entity, e.g., a person’s full name, Christian name, or twitter name,
  • what appear to be uninteresting named-entities, e.g., numbers and dates,
  • the language model does not know much about software, having been trained on a corpus of general English.

Handling multiple names for the same entity would be a lot of work (i.e., I did nothing); ‘uninteresting’ named-entities can be handled by post-processing the output.
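
One way such post-processing might look is sketched below in Python. It reads the text printed by the extraction script and counts entities, skipping number-like and date-like labels. The set of labels treated as uninteresting, the lower-casing of lemmas, and the sample lines are assumptions; the actual post-processing was done with awk and R.

```python
from collections import Counter

# spaCy labels for numeric/date-like entities in its English pipelines;
# treating these as 'uninteresting' is a judgement call (assumption).
UNINTERESTING = {"CARDINAL", "ORDINAL", "DATE", "TIME",
                 "PERCENT", "MONEY", "QUANTITY"}

def count_entities(output_lines):
    """Count (lemma, label) pairs in the output of the extraction script."""
    counts = Counter()
    for raw in output_lines:
        line = raw.strip()
        if not line or line.startswith(">"):
            continue            # blank line, or the echoed changelog line
        # The label is the last space-separated field; lemmas may hold spaces.
        lemma, _, label = line.rpartition(" ")
        if lemma and label not in UNINTERESTING:
            counts[(lemma.lower(), label)] += 1
    return counts

# Hypothetical output in the format printed by the script.
sample = [
    "> 1: Thanks to user Laur for alerting us!",
    "",
    "Laur PERSON",
    "> 2: Fixed the bug Laur reported two days ago.",
    "",
    "laur PERSON",
    "two day ago DATE",
]
entities = count_entities(sample)
print(len(entities), sum(entities.values()))
```

The two printed numbers are the unique and total entity counts, the same quantities reported for the full changelog.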

A language processing pipeline that is not software-concept aware is of limited value. spaCy supports adding new training models, all I need is a named-entity model trained on manually annotated software engineering text.

The only decent NER training data I could find (trained on StackOverflow) was for BERT (another language processing tool), and the data format is very different. Existing add-on spaCy models included fashion, food and drugs, but no software engineering.

Time to roll up my sleeves and create a software engineering model. Luckily, I found a webpage that provided a good user interface to tagging sentences and generated the json file used for training. I was patient enough to tag 200 lines with what I considered to be software specific named-entities. … and now I have broken the NER model I built…

The following plot shows the growth in the total number of named-entities appearing in the changelog, and the number of unique named-entities (with the 1,996 numbers and dates removed; code+data):

Growth of total and unique named-entities in the Beeminder changelog.

The regression fits (red lines) are quadratics, slightly curving up (total) and down (unique); the linear growth components are: 0.6 per release for total, and 0.46 for unique.

Including software named-entities is likely to increase the total by at least 15%, but would have little impact on the number of unique entries.

This extraction pipeline processes one release line at a time. Building a set of Beeminder tag-names requires analysing the changelog as a whole, which would take a lot longer than the day spent on this analysis.

The Beeminder developers have consistently added new named-entities to the changelog over more than eleven years, but does this mean that more features have been consistently added to the software (or are they just inventing different names for similar functionality)?

It is not possible to answer this question without access to the code, or experience of using the product over these eleven years.

However, staying in business for eleven years is a good indicator that the developers are doing something right.

Practical tips or mindset change?

How many books on your bookshelves have a number in the title? Specifically a list of X things. Such books sell, blog posts of a similar ilk get read.

“50 specific ways to improve your programs”

“97 things every dog walker should know”

“10 practical things every Scrum Master should know”

“51 tips to improve your requirements”

Small, specific nuggets of information, best presented as a list and advertised as such. No grand unifying thesis, just “75 things”. The closest I have ever come to this was “Little Book of Requirements and User Stories” which was my best seller and would have sold more if I had called it “16 tips to improve your User Stories.”

However, most of my books aren’t like that. Most of my books contain a big idea – at least one big idea. The whole book sets out to explain that. Business Patterns does say “38 Business strategy patterns” but really the book’s big idea was “Apply pattern thinking to business strategy”. In retrospect it would have sold better if I had called the book “38 Business strategy patterns” and put the pattern thinking stuff as an appendix.

Regular readers might notice that my blogs follow a similar pattern: mostly long thoughtful pieces which try to build an argument, with a few practical posts thrown in once in a while. Despite knowing I should write more short practical pieces (to boost readership) I keep failing.

Why?

Two reasons.

Sometimes those “short practical tips” seem so trivial, or so obvious, that I just assume everyone does it that way and everyone sees what I see. They are so small and so “obvious” I don’t see them.

But more because I see value in those long pieces. I see them as “philosophy” pieces, they are about how to see the world, how to comprehend what is going on, sense-making. Quite often I will wrestle with balancing forces, how one force pushes you one way while another pushes you another. The right course of action is about balancing those forces and what is “right” may be different at different times. (That’s a pattern thing.)

It might be better if I called those “Mindset” pieces. They are about preparing the mind to see the world in a particular way. Conditioning you for agile, perhaps.

To me those Mindset pieces are more important because they shape the way you respond. In the complex world in which we live few decisions and few courses of action can actually be boiled down to a simple “If this Then do That”. Instead, the thousands of small decisions you make each day are informed by your mindset (philosophy) of how the world works and what will happen if you make decision X instead of decision Y.

Especially for those working in management, it is your mental view of the world that shapes your decisions and relationships. I’m sure somewhere out there is a “50 practical tips for better management decisions” book but in truth there are so many variables, unknowns and ambiguities that you can’t boil the world down like that.

That’s why, while everyone is short of time and wants “10 practical tips” to fix a problem right now, it is more important to spend time really challenging your own thinking. Change can only really become permanent when people change their actions and decisions without thinking each time, when people can make decision #563 today congruently to everything else not because they read it in a book but because that is the way their mind works.

Our constant search for “quick answers” can mislead us, we might get a quick answer but we aren’t necessarily building our long term capability.

In Succeeding with OKRs in Agile, I tried hard to write a hands-on-practical tips book. I failed but in failing I did better than I would have done without trying. I very deliberately kept the opening chapters short and quickly moved into “practical tips” (mainly about writing OKRs). Almost all the mindset philosophy was pushed later in the book. So far sales suggest I got it right.

So, even as I strive this year to write more “10 practical tips” blog posts I expect I’ll have more philosophy as I put the world to rights!


The post Practical tips or mindset change? appeared first on Allan Kelly.

Team Retrospective cards are back, and better than before

Agile Stationary have given retrospective cards a new home and are handling all the sales and logistics. That means everything should be slicker and export to anywhere in the world should be hassle free.

Agile Stationary gave the cards another print run and in the process enlarged the cards slightly. So while they can still fit in your pocket they are a bit easier to handle.

To mark the occasion Agile Stationary are offering a 20% discount to blog readers, use the code TEAMRETRO20.


The post Team Retrospective cards are back, and better than before appeared first on Allan Kelly.

Join a crowdsourced search for software engineering data

Software engineering data, that can be made publicly available, is very rare; most people don’t attempt to collect data, and when data is collected, people rarely make any attempt to hang onto the data they do collect.

Having just one person actively searching for software engineering data (i.e., me) restricts potential sources of data to be English speaking and to a subset of development ecosystems.

This post is my attempt to start a crowdsourced campaign to search for software engineering data.

Finding data is about finding the people who have the data and have the authority to make it available (no hacking into websites).

Who might have software engineering data?

In the past, I have emailed chief technology officers at companies with less than 100 employees (larger companies have lawyers who introduce serious amounts of friction into releasing company data), and this last week I have been targeting Agile coaches. For my evidence-based software engineering book I mostly emailed the authors of data driven papers.

A lot of software is developed in India, China, South America, Russia, and Europe; unless these developers are active in the English-speaking world, I don’t see them.

If you work in one of these regions, you can help locate data by finding people who might have software engineering data.

If you want to be actively involved, you can email possible sources directly, alternatively I can email them.

If you want to be actively involved in the data analysis, you can work on the analysis yourself, or we can do it together, or I am happy to do it.

In the English-speaking development ecosystems, my connection to the various embedded ecosystems is limited. The embedded ecosystems are huge, there must be software data waiting to be found. If you are active within an embedded ecosystem, you can help locate data by finding people who might have software engineering data.

The email template I use for emailing people is below. The introduction is intended to create a connection with their interests, followed by a brief summary of my interest, examples of previous analysis, and the link to my book to show the depth of my interest.

By all means cut and paste this template, or create one that you feel is likely to work better in your environment. If you have a blog or Twitter feed, then tell them about it and why you think that evidence-based software engineering is important.

Be responsible and only email people who appear to have an interest in applying data analysis to software engineering. Don’t spam entire development groups, but pick the person most likely to be in a position to give a positive response.

This is a search for gold nuggets, and the response rate will be very low; a 10% rate of reply, saying sorry, no data, would be better than what I get. I don’t have enough data to be able to calculate a percentage, but a ballpark figure is that 1% of emails might result in data.
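
A rough sense of what a 1% success rate means over many emails can be had from the following Python sketch. It assumes each email succeeds independently with the same probability, which is a simplification; the function name and the batch sizes are illustrative only.

```python
def chance_of_any_data(emails_sent, success_rate=0.01):
    """Probability that at least one email results in data.

    Assumes each email succeeds independently with the same probability;
    0.01 is the ballpark figure from the text.
    """
    return 1 - (1 - success_rate) ** emails_sent

# Illustrative batch sizes: one burst of emails, then longer campaigns.
for n in (6, 100, 300):
    print(n, round(chance_of_any_data(n), 2))
```

Under these assumptions a single burst of half-a-dozen emails rarely produces anything, which is why the search only pays off as a long-running background task.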

I treat the search as a background task, taking months to locate and email, say, 100 people considered worth sending a targeted email. My experience is that I come up with a search idea or encounter a blog post that suggests a line of enquiry, that may result in sending half-a-dozen emails. The following week, if I’m lucky, the same thing might happen again (perhaps with fewer emails). It’s a slow process.

If people want to keep a record of ideas tried, the evidence-based software engineering Slack channel could do with some activity.

Hello,

A personalized introduction, such as: I have been reading
your blog posts on XXX, your tweets about YYY,
your youtube video on ZZZ.

My interest is in trying to figure out the human issues
driving the software process.

Here are two detailed analyses of Agile estimation data:
https://arxiv.org/abs/1901.01621
and
https://arxiv.org/abs/2106.03679

My book Evidence-based Software Engineering discusses what is
currently known about software engineering, based on an
analysis of all the publicly available data.
pdf+code+all data freely available here:
http://knosof.co.uk/ESEUR/

and I'm always on the lookout for more software data.
This email is a fishing request for software engineering data.

I offer a free analysis of software data, provided an
anonymised version of the data can be made public.

Air-Source Heat Pump – our experience so far, 2 months in

Summary: less energy, more money

2 months ago, we replaced our gas boiler with an air-source heat pump, which uses electricity to heat our home and boiler. This is a report of our experience so far.

We expected it to reduce our environmental impact, and cost us more money, and we were right.

It works: our house is comfortable. We use a lot less energy, and it costs us significantly more money (because electricity costs way more than gas).

The house

Our house is a beautiful, leaky old house, with a modern extension. Half of it is well-insulated. The other half was built around 1890, and while we do have double-glazing and decent loft insulation, the walls have no cavities and feel cold to the touch, and there are drafts everywhere.

The new half has underfloor heating. The old half and the upstairs are heated by radiators. We have a hot water cylinder.

The air-source heat pump

Our air-source heat pump uses electricity to extract heat from the outside air and heats water for radiators and hot water, directly replacing our gas boiler.

Our heat pump was installed by Your Energy Your Way and I must declare an interest: my wife is a director of the company.

The heat pump is an LG 16kW “THERMA V” model. It looks like a very large air conditioning unit, which sits outside our house in the yard to the side. It is about as tall as my shoulder height, with two big fans on it.

A large air-source heat pump

It stands on a soak-away area with some stones on it that the installers made by removing some patio tiles. This is needed because it drips a small amount of liquid as part of its normal operation. The outdoor unit makes noise, but our house is next to the main road, so we don’t hear it. It is not audible indoors.

Standing next to the outdoor unit you can feel a cold breeze, like opening the fridge door. This is unpleasant on cold days.

That outdoor unit connects through the wall to an indoor part that is a bit smaller than our old boiler.

The controller box has a terrible user interface and is very hard to decipher, but we did eventually manage to programme it to turn the target temperature up in the daytime and down at night. Your Energy Your Way advised us that it is more efficient to keep the house at a cool-ish 17 degrees at night, rather than letting it get cold and having to work hard to heat it up again in the morning, so that is how we have set it up.

The controller box’s built-in thermostat does not work properly (it reports the wrong temperature), so we had to add an external thermostat, which works well.

We didn’t need to change anything about our hot water cylinder, or our underfloor heating.

When planning the installation, Your Energy Your Way estimated the heat loss of our rooms, and recommended upgrading our radiators. In an old house like ours this is sometimes needed, because it is way more efficient to heat a house with cooler water running through the radiators, but if the water is cooler, you need more radiator surface area to heat the house effectively. In a newer house with existing radiators, they are probably fine as-is.

We kept most of the existing radiators, and added some more in the coldest rooms.

How comfortable is the house?

The house is more comfortable than it was before, for two reasons: firstly the radiators we had were not really adequate, and secondly the cooler water in the radiators makes a less irritating heat, meaning the house is nicely comfortable most of the time, instead of bouncing between feeling cold and feeling oppressively over-heated.

On cold days, the old part of the house is a bit cold, but I think on average it’s a little better than it was before.

We do find mornings can be chilly, particularly because the system stops heating the radiators if the hot water cylinder needs heating up after people have had showers. We could improve this situation by getting a larger cylinder, which we are considering.

However, it’s worth pointing out that we needed engineers to visit four or five times to make adjustments before we felt the system was working well enough. There are a lot of things that can be tweaked, and it took some time for it to work well.

My advice: don’t pick the cheapest quote – pick the people you think you can trust to do the work well: especially the heat loss calculations before installation and the adjustments afterwards.

How much energy are we using? (The good news)

So far, it looks like we are using about two-thirds less energy in our household than we were before:

The above chart is stacked, so the top line represents the total energy usage. We switched to the air-source heat pump exactly when our gas usage was about to skyrocket (because it’s cold in winter), and it remained relatively low.

This is absolutely fantastic: our house is more comfortable than before, and we have reduced the amount of energy we are using by 66%. This is the total energy usage of our house, not just for heating, so the reduction of energy used for heating is even more dramatic than it looks.

Even better, the energy we use is at least partly produced from renewable sources, so our carbon footprint is much lower. Previously we were directly releasing carbon by burning imported gas – now we use mostly UK-produced electricity, and as the grid decarbonises, our carbon footprint reduces even if we make no further changes.

How much money are we spending? (The bad news)

Excluding standing charges*, we are spending about one third more on energy than we were before. This is because electricity is so much more expensive than gas: our electricity costs 19p per kWh and our gas costs 4p per kWh.

* Note: our energy provider wanted to charge us £350 to remove our gas meter, so we refused, and are still paying the gas standing charge. I’m not sure how we’re going to resolve this, especially since our energy provider is now in administration.
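
The arithmetic behind using less energy but paying more can be sketched in Python. The two unit prices and the two-thirds reduction come from this post; the split of the old energy use between gas and electricity is not given, so the gas shares below are hypothetical.

```python
GAS_PENCE_PER_KWH = 4            # prices quoted in the text
ELECTRICITY_PENCE_PER_KWH = 19

def cost_ratio(gas_share_before, energy_kept=1/3):
    """Ratio of energy cost after the switch to cost before it.

    gas_share_before: fraction of the old energy use that was gas
    (hypothetical; the post does not give this split).
    energy_kept: fraction of the old energy use still needed, all of it
    electricity (one third, i.e. a two-thirds reduction).
    """
    before = (gas_share_before * GAS_PENCE_PER_KWH
              + (1 - gas_share_before) * ELECTRICITY_PENCE_PER_KWH)
    after = energy_kept * ELECTRICITY_PENCE_PER_KWH
    return after / before

print(round(cost_ratio(1.0), 2))    # everything was gas before
print(round(cost_ratio(0.95), 2))   # 95% gas, 5% electricity before
```

Because electricity costs 4.75 times as much per kWh as gas here, energy that used to come from gas has to fall by more than about 79% before the bill for it goes down.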

The above chart is stacked, so the top line represents the total cost (excluding standing charges). When we switched to the air-source heat pump, our energy costs increased faster than they did the same time last year, and were consistently higher. We think the peak in November might be misleading as it may have been when the system was not set up correctly, but we are not sure.

Because air-source heat pumps are more efficient when the weather is warmer, we do expect to fare better in the summer than we are right now.

I would not suggest getting a heat pump if you want to save money. Maybe this will change as gas prices are expected to rise significantly this year.

An installation like ours, including new radiators, costs £10-15K. A decent chunk of that will be paid back to us by the government, spread out over the next 7 years, under the soon-to-be-gone Renewable Heat Incentive (RHI). RHI will be replaced by the Boiler Upgrade Scheme (BUS), which will be limited to a £5K grant for air-source heat pumps, although it is paid up-front. We would have received much less money under BUS than RHI. It is almost certainly too late for you to get a heat pump under RHI, by the way – all the installers are booked up until end of March 2022, when it ends.

Thoughts

If you think it’s surprising (and deeply concerning) that taking the step of significantly reducing our carbon footprint cost us a one-third increase in our energy bills, I would agree with you.

I am told that the tax taken on electricity is much higher than on gas, even though these taxes are apparently intended to help decarbonise our energy.

Meanwhile, the government is replacing (with great fanfare) RHI with the much less generous (although more timely) BUS, making it even more economically punishing to reduce your carbon footprint.

I think this should be addressed urgently: money should be provided to help people install heat pumps, and the tax regime should be changed to make it cheap to use low-carbon fuels.

The technology is available, but the financial situation makes this a vanity project for people like me who can afford it, instead of what it could be: a feasible plan to get our national carbon usage down, fast.

On a positive note, our house is nice and warm, and I feel a bit less guilty about how much carbon we’re using to keep it cosy.

Using Elliptic curve cryptography for TLS with Postfix, Dovecot and nginx

I may have mentioned this before - I do run my own virtual servers for important services (basically email and my web presence). I do this mostly for historic reasons and also because I’m not a huge fan of using centralised services for all of the above. The downside is that you pretty much have to learn at least about basic security. Over the 20+ years I’ve been doing this, the Internet hasn’t exactly become a less hostile place.

Chi Chi Again – a.k.

Several years ago we saw that, under some relatively easily met assumptions, the averages of independent observations of a random variable tend toward the normal distribution. Derived from that is the chi-squared distribution which describes the behaviour of sums of squares of independent standard normal random variables, having means of zero and standard deviations of one.
In this post we shall see how it is related to the gamma distribution and implement its various functions in terms of those of the latter.
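
The relationship can be illustrated with a short Python sketch: a chi-squared distribution with k degrees of freedom is a gamma distribution with shape k/2 and scale 2. The function names and the shape/scale parameterisation are choices made here, and may differ from those used in the post itself.

```python
import math

def gamma_pdf(x, shape, scale):
    """Density of the gamma distribution with the given shape and scale."""
    if x <= 0:
        return 0.0   # the boundary value at x == 0 is ignored in this sketch
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def chi_squared_pdf(x, k):
    """Chi-squared density with k degrees of freedom, via the gamma density."""
    return gamma_pdf(x, shape=k / 2, scale=2.0)

# For k = 2 the chi-squared density reduces to 0.5 * exp(-x / 2).
print(chi_squared_pdf(2.0, 2), 0.5 * math.exp(-1.0))
```

Working with the logarithm of the density and math.lgamma avoids overflow in the gamma function for large degrees of freedom.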

Can you keep Agile and OKRs separate?

“I’ve been told to keep agile and OKRs separate”

The first time I heard this I was surprised, “missed opportunity” I thought but then, the more I thought about it, the more I realised that it was impossible.

Start with the OKRs: OKRs are about deciding what to put your time and energy into. OKRs are about the big priorities for your organization and team. The more I’ve spent time with OKRs, the more I’ve come to see them as the management method rather than a management method among many. Let me caveat that lest it sound arrogant: management within an organization.

The management approach

There are many management approaches out there: strict time-and-motion where workers’ time is scheduled to the minute by experts; complete devolution giving employees free rein, where managers (if they exist at all) only exist to coach. And there is everything in between, including project management which attempts to define the start and stop dates in advance. At this level OKRs are one management approach among many and organizations are free to choose which they follow.

Even combining traditional HR performance review processes with OKRs can lead to ruin. Once compensation is connected to OKRs people become incentivised to stay safe by setting OKRs which bring rewards, i.e. not ambitious ones that might be missed.

Running any other management method in tandem with OKRs risks jeopardising both. So if you choose OKR then follow it all the way, call it “Extreme OKRs” if you like.

Just try imagining agile as something separate to your OKRs: you set OKRs and then you run iterations. What are you delivering in the iteration? Surely iterations are delivering progress against OKRs?

I suppose you could have a backlog of work to do (Track A) and some OKRs to work on as well (Track B). Track A and B might lead to different places or represent different work to do. Leave aside potential conflict for a minute, think about how you divide your time.

More WIP, fewer results

Agile teaches that work in progress should be minimised, but now in this example there are two sanctioned work streams. Maybe we could ring-fence work: Agile in the morning, OKRs in the afternoon. I find it hard to see that working well.

Maybe A could be the main stream and B a “best efforts” / “spare time” stream. But, if both A and B are important then why should prioritisation be left to the worker? It smells a bit of leadership abdicating responsibility for prioritisation.

It is a fantasy to think that workers can focus on delivering the backlog and in their “spare time” deliver the OKRs. If your workers have copious amounts of spare time then something else is seriously wrong. It is easy to overload workers, and thereby create problems further down the line. People will burn out, goals will be missed or goals are met but with such poor quality that problems emerge later.

I can see how you can run OKRs without agile.

And I’ve long seen Agile working without OKRs.

But if you have both Agile and OKRs in the same company I just don’t see how Agile and OKRs can be separated. Conversely I can see how they can work well together – yes, I wrote a book on that.

If you are going to have OKRs and Agile in the same company then you need to consider them as one thing, not as two separate endeavours.

Photo by Jackson Simmer on Unsplash


The post Can you keep Agile and OKRs separate? appeared first on Allan Kelly.

Find My Tea: A technical journey through new product development (online 1st February 2022)

What: Find My Tea: A technical journey through new product development

When:
Tuesday, February 1, 2022 7:00pm to 8:30pm (GMT)

Where:  SyncIpswich (online)

RSVP: https://www.meetup.com/SyncIpswich-Ipswichs-Tech-Startup-Community/events/281991960/

After what feels like an age I’m getting back into speaking and of course I’m speaking about Find My Tea! This time it’s technical!

As well as online with SyncIpswich I’m also doing the ACCU Conference, nor(DEV):con and one other:

    ACCU Conference - 4pm 8th April (Bristol) - 90min version
    nor(DEV):con - 24th & 25th June (Norwich)
    TBC - July

Find My Tea: A technical journey through new product development

There is more to having a great idea for an app than just building the app. You’re not only required to be a full stack developer (whatever that means), which doesn’t usually include the skills for building an app, you also need to understand and be competent in ‘Ops’ (there’s really no such thing as DevOps) and the automated pipelines used for testing and deploying the app, its backend services and supporting applications. And there is so much to choose from!

In this session I will take you on the journey of discovery from having an idea, to choosing, rechoosing and choosing again the different technologies and platforms I used to build and release a new product from scratch.

This session will be focussed on the technology choices made and the reasoning behind them, not on the product itself, although of course this will feature. This will include the mobile technology, the technology used for the web applications, backend services, hosting and development pipelines.

You can download the Find My Tea app here: https://findmytea.co.uk

 

Paul Grenyer

Husband, father, software engineer, metaller, Paul has been writing software for over 35 years and professionally for more than 20. In that time he has worked for and in all sorts of companies from two man startups to world famous investment banks and insurance companies. He has built and run three limited companies, none of which made him a millionaire and two of which threatened his sanity on more than one occasion.

Paul was a founding member of both SyncNorwich and Norfolk Developers, two of the most successful tech and startup based community groups in the East. He created and chaired the hugely successful Norfolk Developers Conference (nor(DEV):con) for seven years bringing in speakers and delegates in the sphere of software engineering from around the globe.

Paul is currently a Senior Software Engineer at Bourne Leisure, the owners of Haven caravan parks, and the founder of the tea finding app, Find My Tea. He loathes the word Entrepreneur, not least because he struggles to spell it and it reminds him of Del Boy from the 80s sitcom Only Fools and Horses. He sees Entrepreneurship as a side effect of the creative process of problem solving, rather than a career path in its own right.

Despite having dealt with the world of business from directors of the board down, Paul has kept both feet firmly on the ground even when his head has been in the clouds with healthy doses of Heavy Metal, Science Fiction and Formula One and long hair until it started falling out in 2013.

Oh, and he loves good tea too!

Visual Lint 8.0.7.349 has been released

Visual Lint 8.0.7.349 has now been released.

This is a maintenance update for Visual Lint 8.0, and includes the following changes:

  • The Visual C++ project (.vcxproj) file parser now defines the value of _MSVC_LANG appropriately if /std:c++14, /std:c++17, /std:c++20 or /std:c++latest are used within a Visual Studio 2015, 2017, 2019 or 2022 project configuration.

  • Fixed a bug in the Clang-Tidy analysis results parser which could cause some "note" issues to be hidden.

  • Fixed a bug in the handling of UTF-8 files in VisualLintGui.

  • Updated the PC-lint Plus compiler indirect file co-rb-vs2022.lnt to support Visual Studio 2022 v17.0.4.

  • Updated the PC-lint Plus compiler indirect file co-rb-vs2019.lnt to support Visual Studio 2019 v16.11.8.

  • Updated the PC-lint Plus compiler indirect file co-rb-vs2017.lnt to support Visual Studio 2017 v15.9.42.

  • Updated the PC-lint Plus library indirect file lib-rb-atl.lnt.

  • Updated the help topic for the Clang-Tidy Analysis Configuration Dialog "General" page.

Download Visual Lint 8.0.7.349

Creating and evolving a programming language: funding

The funding for artists and designers/implementors of programming languages shares some similarities.

Rich patrons used to sponsor a few talented painters/sculptors/etc, although many artists had no sponsors and worked for little or no money. Designers of programming languages sometimes have a rich patron, in the form of a company looking to gain some commercial advantage, but most language designers have a day job and work on their language as a side project that might have a connection to their job (e.g., researchers).

Why would a rich patron sponsor the creation of an art work/language?

Possible reasons include: Enhancing the patron’s reputation within the culture in which they move (attracting followers, social or commercial), and influencing people’s thinking (to have views that are more in line with those of the patron).

During 2009-2012 it suddenly became fashionable for major tech companies to have their own home-grown corporate language: Go, Rust, Dart and TypeScript are some of the languages that achieved a notable level of brand recognition. Microsoft, with its long-standing focus on developers, was ahead of the game, with the introduction of F# in 2005 (and other languages in earlier and later years). The introduction of Swift and Hack in 2014 was driven by solid commercial motives (i.e., control of developers and reduced maintenance costs respectively); Google’s adoption of Kotlin, introduced by a minor patron in 2011, was driven by their losing of the Oracle Java lawsuit.

Less rich patrons also sponsor languages, with the idiosyncratic Ivor Tiefenbrun even sponsoring the creation of a bespoke cpu to speed up the execution of programs written in the company language.

The benefit of having a rich sponsor is the opportunity it provides to continue working on what has been created, evolving it into something new.

Self sponsored individuals and groups also create new languages, with recent more well known examples including Clojure and Julia.

What opportunities are available for initially self sponsored individuals to support themselves, while they continue to work on what has been created?

The growth of the middle class, and its interest in art, provided a means for artists to fund their work by attracting smaller sums from a wider audience.

In the last 10-15 years, some language creators have fostered a community driven approach to evolving and promoting their work. As well as being directly involved in working on the language and its infrastructure, members of a community may also contribute or help raise funds. There has been a tiny trickle of developers leaving their day job to work full time on ‘their’ language.

The term Hedonism driven development is a good description of this kind of community development.

People have been creating new languages since computers were invented, and I don’t expect this desire to create new languages to stop anytime soon. How long might a language community be expected to last?

Having lots of commercially important code implemented in a language creates an incentive for that language’s continued existence, e.g., companies paying for support. When little or no commercially important code is available to create an external incentive, a language community will continue to be active for as long as its members invest in it. The plot below shows the lifetime of 32 secular and 19 religious 19th century American utopian communities, based on their size at foundation; lines are fitted loess regression (code+data):

Size at foundation and lifetime of 32 secular and 19 religious 19th century American utopian communities; lines are fitted loess regression.

How many self-sustaining language communities are there, and how many might the world’s population support?

My tracking of new language communities is a side effect of the blogs I follow and the few community sites I visit regularly; so a tiny subset of the possibilities. I know of a handful of ‘new’ language communities; with ‘new’ as in not having a Wikipedia page (yet).

One list contains, up until 2005, 7,446 languages. I would not be surprised if this was off by almost an order of magnitude. Wikipedia has a very idiosyncratic and brief timeline of programming languages, and a very incomplete list of programming languages.

I await a future social science PhD thesis for a more thorough analysis of current numbers.

Signing off 2021 and Not rolling over

“The decision about what to abandon is by far the most important and most neglected. … No organization which purposefully and systematically abandons the unproductive and obsolete ever wants for opportunities.”

Peter Drucker, Age of Discontinuity

I’m writing this on the last day of 2021 and for once I am confident in predicting that next year will be better than the one that is ending. Like most people I have a few routines I follow around this year and one of these relates specifically to this blog.

I often get asked: how do you get ideas for your blog?

The truth is I have too many ideas for this blog: right now I have 18 ideas for blog posts that are part written. That might be as little as a title, or a completely written post I haven’t edited yet – plus there is one entry I wrote and decided it was for me alone. When I have an idea for a post I just add an entry into my blog list; normally that would be a title and a few bullet points. Sometimes it is just a title, occasionally I just type the whole thing as a stream.

These entries will not be rolled over for 2022. In fact, I just deleted 7 after writing that last paragraph. The remaining 11 will be folded up and put to one side in the MacJournal software I use for blogging. I have already created a 2022 folder for next year.

It is a bit like buying a daily newspaper, once the day is gone there is rarely any point to going back to read the bits you missed. Tomorrow is a new day with a new newspaper, new priorities and new things to read.

In order to have new ideas one must make space for new ideas, and that means throwing away ideas which have not been able to win the competition for attention to date. This idea is embedded in my thinking when I recommend teams using OKRs throw away their backlog. It is why I like Jeff Bezos’ “Everyday is Day-1” thinking.

Don’t get weighed down by yesterday’s good ideas, if they are really good ideas they will appear again. If not then removing them will make space for better ideas. Plus, by practicing new ideas you will get better at new ideas.

More importantly: throwing away those ideas and work-in-progress forces one to think about what is really important, what really will make a difference and move you forward. Dispensing with the day-to-day trivial allows one to think big.

In truth, while I just irrevocably deleted seven potential blog entries the other 11 will not be lost for ever. They will just be hidden. If I am stuck in a few months time I know where they are – but for that matter, I could equally fish in the unused entries from 2020, 2019, 2018… And in complete honesty, there are two early drafts already in the 2022 folder that I’m keen to write.

So while I say, and I advise you to: “Throw away the backlog” I have no objection to someone keeping (some of) those good ideas in a bottom drawer as long as a) nobody pretends they might be done one day, b) they do not distract from your focus on the important things.

Finally, if I am to think big for the next year what would I like this blog to carry? – in other words, what might you expect?

Top of the list is to focus on OKRs: I’ve been blown away by the interest since I published “Succeeding with OKRs in Agile” and I know a lot of people are wrestling with OKRs with agile.

I’d also like to focus myself on “small practical bits”. I know I have a tendency to be “philosophical” – in part that is because I believe that our “philosophy” informs our daily actions and decisions, and the pattern of those decisions, far more, and for far longer, than any given list of “10 things you should …” But I also know, readers – and book buyers! – like “small practical bits” so commercially I should do more of those.



The post Signing off 2021 and Not rolling over appeared first on Allan Kelly.

Fishing for software data

During 2021 I sent around 100 emails whose first line started something like: “I have been reading your interesting blog post…”, followed by some background information, and then a request for software engineering data. Sometimes the request for data was specific (e.g., the data associated with the blog post), and sometimes it was a general request for any data they might have.

So far, these 100 email requests have produced one two datasets. Around 80% failed to elicit a reply, compared to a 32% no reply for authors of published papers. Perhaps they don’t have any data, and don’t think a reply is worth the trouble. Perhaps they have some data, but it would be a hassle to get into a shippable state (I like this idea because it means that at least some people have data). Or perhaps they don’t understand why anybody would be interested in data and must be an odd-ball, and not somebody they want to engage with (I may well be odd, but I don’t bite :-).

Some of those who reply that they don’t have any data tell me that they don’t understand why I might be interested in data. Over my entire professional career, in many other contexts, I have often encountered surprise that data driven problem-solving increases the likelihood of reaching a workable solution. The seat of the pants approach to problem-solving is endemic within software engineering.

Others ask what kind of data I am interested in. My reply is that I am interested in human software engineering data, pointing out that lots of Open source is readily available, but that data relating to the human factors underpinning software development is much harder to find. I point them at my evidence-based book for examples of human centric software data.

In business, my experience is that people sometimes get in touch years after hearing me speak, or reading something I wrote, to talk about possible work. I am optimistic that the same will happen through my requests for data, i.e., somebody I emailed will encounter some data and think of me 🙂

What is different about 2021 is that I have been more willing to fail, and not just asking for data when I encounter somebody who obviously has data. That is to say, my expectation threshold for asking is lower than previous years, i.e., I am more willing to spend a few minutes crafting a targeted email on what appear to be tenuous cases (based on past experience).

In 2022 I plan to be even more active, in particular, by giving talks and attending lots of meetups (London based). If your company is looking for somebody to give an in-person lunchtime talk, feel free to contact me about possible topics (I’m always after feedback on my analysis of existing data, and will take a 10-second appeal for more data).

Software data is not commonly available because most people don’t collect data, and when data is collected, no thought is given to hanging onto it. At the moment, I don’t think it is possible to incentivize people to collect data (i.e., no saleable benefit to offset the cost of collecting it), but once collected the cost of hanging onto data is peanuts. So as well as asking for data, I also plan to sell the idea of hanging onto any data that is collected.

Fishing tips for software data welcome.

Christmas books for 2021

This year, my list of Christmas books is very late because there is only one entry (first published in 1950), and I was not sure whether a 2021 Christmas book post was worthwhile.

The book is “Planning in Practice: Essays in Aircraft planning in war-time” by Ely Devons. A very readable, practical discussion, with data, on the issues involved in large scale planning; the discussion is timeless. Check out second-hand book sites for low-cost editions.

Why isn’t my list longer?

Part of the reason is me. I have not been motivated to find new topics to explore, via books rather than blog posts. Things are starting to change, and perhaps the list will be longer in 2022.

Another reason is the changing nature of book publishing. There is rarely much money to be made from the sale of non-fiction books, and the desire to write down their thoughts and ideas seems to be the force that drives people to write a book. Sites like substack appear to be doing a good job of diverting those with a desire to write something (perhaps some authors will feel the need to create a book length tome).

Why does an author need a publisher? The nitty-gritty technical details of putting together a book to self-publish are slowly being simplified by automation, e.g., document formatting and proofreading. It’s a win-win situation to make newly written books freely available, at least if they are any good. The author reaches the largest readership (which helps maximize the impact of their ideas), and readers get a free electronic book. Authors of not very good books want to limit the number of people who find this out for themselves, and so charge money for the electronic copy.

Another reason for the small number of good new, non-introductory, books, is having something new to say. Scientific revolutions, or even minor resets, are rare (i.e., measured in multi-decades). Once several good books are available, and nothing much new has happened, why write a new book on the subject?

The market for introductory books is much larger than that for books covering advanced material. While publishers obviously want to target the largest market, these are not the kind of books I tend to read.

Finally On A Clockwork Contagion – student

Over the course of the year my fellow students and I have spent our free time building mathematical models of the spread of disease, initially assuming that upon contracting the infection a person would immediately and forever be infectious, then adding periods of incubation and recovery before finally introducing the concept of location whereby the proximate are significantly more likely to interact than the distant and examining the consequences for a population distributed between several disparate villages.
Whilst it is most certainly the case that this was more reasonable than assuming entirely random encounters it failed to take into account the fact that folk should have a much greater proclivity to meet with their friends, family and colleagues than with their neighbours and it is upon this deficiency that we have concentrated our most recent efforts.

Visual Lint and log4j (TL;DR: we don’t use it)

A good question from a customer given a bunch of headlines about security holes in the log4j logging library:

Triggered by the recent log4j vulnerability our organisation is asking all our software vendors if their software is affected by it - and if so by when a patch will be provided. May I ask for your confirmation that Visual Lint is not affected by this exploit?

I suppose that Visual Lint is Java free and thus has no problem with it. Thanks a lot for your answer!

Hopefully our answer will prove reassuring:

Visual Lint is written almost entirely in native C++ (more specifically, it's written in C++ 14). There is only one Java project in the entire codebase - the project which implements the Eclipse plugin (to our knowledge, Eclipse plugins can only be implemented in Java).

However, that project is just a thin Java wrapper around a native DLL - and it doesn't use log4j at all.

So, you're correct. Visual Lint (and indeed all our products and infrastructure) are 100% log4j free.

So your organisation can rest easy in this case.

Top agile books which aren’t about agile or software

Christmas is almost here and the end of the year is nigh. That means it is time for newspapers and journals to start publishing “Top books of the year” lists and “Christmas recommendations.” So, prompted by a recent thread on LinkedIn I thought I’d offer up my own book list: top books for agile folk from outside of agile (and software).

That is: books which are not explicitly about agile (or software development) but which contain a valuable message, and possibly techniques for those wanting to expand their knowledge of, well, agile.

Most of the books I’m about to list address philosophical, or mindset, underpinnings of agile: how to think in an agile way, rather than “how to do agile.” That might disappoint but think about it, how could a book from outside agile tell you how to do agile?

Well, actually, there are three which do.

First The Goal: written in the style of a novel this book explores the theory of constraints and elementary queuing theory, without mentioning either by name. Since it is written as a novel – with characters and back story! – this is an easy read. But please, don’t judge it as a novel, judge it for the message inside.

Next up is The 7 Habits of Highly Effective People: I blogged a few months ago how these habits could also underpin a team working style. Whether you read this for yourself or with an eye on your team this book does contain actionable advice – and some ideas on how to think. It can be a bit of a cringe in places and I’m not sure I agree with all the ideas but it is still a worthwhile read.

Finally in this section is a book at the opposite end of readability: Factory Physics.

Make no mistake this is a text book so it sets out to teach and can be hard going in places – there are plenty of equations. But, if it is a solid grounding in queuing theory, variability, lead times and the like you want then this is the book to go to. In fact, it might be the only book.

That is it for hands-on books which will tell you how to do things on Monday morning. The books which follow are ones I consider “philosophy” – how to think. That’s my way of putting it; a more popular way of putting it is: Mindset. These are books which have shaped my thinking, my mindset, and as such underpin my approach to all things agile – and more!

First here is The Fifth Discipline. This may be the founding text in the field of organizational learning; ultimately all agile is learning and applying that learning. The “learning view” underpinned my own first book and still fundamentally shapes my approach to working with individuals, teams and companies.

My next choice continues the organizational learning theme and is the source of perhaps the most famous quote from that field:

“We understand that the only competitive advantage the company of the future will have is its managers’ ability to learn faster than their competitors.”

The Living Company presents an alternative view of companies and organizations: rather than being rational profit maximising entities this book encourages you to see companies as living organisms. As such the organization’s true goal is to live and continue living. Trade, and even profit, is simply a means to an end. And like all successful living things companies must learn and adapt; those that don’t will die.

Living Company is not alone in presenting an alternative narrative of how companies work. My penultimate book presents an alternative view of that most sacred of management practices: strategy.

The Rise and Fall of Strategic Planning is a major work that not only charts the historical rise of strategic planning and the subsequent fall, it also presents an alternative view of what strategy is and how companies come by strategies: strategy is a consistent pattern of behaviour, strategy is part plan but is also emergent and changes in respect to what happens. Strategy claims to be forward looking but is equally retrospective; strategy offers a story to link past events together.

Along the way Rise and Fall accidentally explains where the waterfall comes from (Robert McNamara), how planning is controlling and why even with almost unlimited resources (GE and Gosplan) the best attempts at planning have failed. If you harbour any ambition to implement Scrum at the corporate level make sure you read this book.

All the books above are over 10 years old and had I written this list 10 years ago it would probably be the same. But two years ago I read Grow the Pie, this advances the discussion of why companies exist and how to be a successful company – the secret is to have purpose and benefit society. Written before the pandemic it is now more relevant than ever. Again it isn’t an easy read but it pays back in thoroughness of argument and reasoning. And if for no other reason, read Grow the Pie to really understand what constitutes value.


The post Top agile books which aren’t about agile or software appeared first on Allan Kelly.

Providing MapLibre-compatible style JSON from openstreetmap-tile-server

[Previous: Self-hosting maps on my laptop]

In the previous post I showed how to run the OSM tile server stack locally.

Now I’ve managed to connect a MapLibre GL JS front end to my local tile server and it’s showing maps!

(It’s running inside Element Web, the awesome Matrix messenger I am working on. NOTE: this is a very, very early prototype!)

In the previous post I ran a docker run command to launch the tile server.

This time, I had to create a file style.json:

{
  "version": 8,
  "sources": {
    "localsource": {
      "type": "raster",
      "tiles": [
        "http://127.0.0.1:8080/tile/{z}/{x}/{y}.png"
      ]
    }
  },
  "layers": [
    {
      "id": "locallayer",
      "source": "localsource",
      "type": "raster"
    }
  ]
}

and then I launched the tile server with that file available in the document root:

docker run \
    -p 8080:80 \
    -v $PWD/style.json:/var/www/html/style.json \
    -v openstreetmap-data:/var/lib/postgresql/12/main \
    -v openstreetmap-rendered-tiles:/var/lib/mod_tile \
    -e THREADS=24 \
    -e ALLOW_CORS=enabled \
    -d overv/openstreetmap-tile-server:1.3.10 \
    run

Now I can point my MapLibre GL JS at that style file with code something like this:

this.map = new maplibregl.Map({
    container: my_container,
    style: "http://127.0.0.1:8080/style.json",
    center: [0, 0],
    zoom: 13,
});

Very excited to be drawing maps without any requests leaving my machine!

Self-hosting maps on my laptop

[See also: Providing MapLibre-compatible style JSON from openstreetmap-tile-server]

As part of my research for working on location sharing for Element Web, the Matrix-based instant messenger, I have been learning about tile servers.

I managed to get the OSM tile server stack working on my laptop:

Here are a couple of useful pages I read during my research:

Today I managed to run a real tile server on my laptop, using data downloaded from OpenStreetMap in a way that I think complies with their terms of use.

To run these commands you will need Docker, and hopefully nothing much else.

Download the data

I downloaded the UK data like this:

wget 'https://download.geofabrik.de/europe/great-britain-latest.osm.pbf'

You can find downloads for other regions at download.geofabrik.de/

Import it

Then I ran an import, which converts the PBF data into tiles that can be shown in a UI:

docker volume create openstreetmap-data
docker volume create openstreetmap-rendered-tiles
docker run \
    -v $PWD/great-britain-latest.osm.pbf:/data.osm.pbf \
    -v openstreetmap-data:/var/lib/postgresql/12/main \
    -v openstreetmap-rendered-tiles:/var/lib/mod_tile \
    -e THREADS=24 \
    overv/openstreetmap-tile-server:1.3.10 \
    import

(Change “great-britain” to match what you downloaded.)

On my quite powerful laptop this took 39 minutes to run.

Run the tile server

Finally, I launched the server:

(Make sure you’ve done the “Import it” step first.)

docker run \
    -p 8080:80 \
    -v openstreetmap-data:/var/lib/postgresql/12/main \
    -v openstreetmap-rendered-tiles:/var/lib/mod_tile \
    -e THREADS=24 \
    -e ALLOW_CORS=enabled \
    -d overv/openstreetmap-tile-server:1.3.10 \
    run

This should launch the docker container in the background, which you can check with docker ps.

Test it

You can now grab a single file by going to http://127.0.0.1:8080/tile/0/0/0.png, or interact with the map properly at http://127.0.0.1:8080.

It was quite unresponsive at first, but once it had cached the tiles I was looking at, it was very smooth.
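
To make the /tile/{z}/{x}/{y}.png URL scheme a little more concrete, here is a minimal Python sketch that turns a latitude/longitude into tile numbers using the standard OpenStreetMap "slippy map" formula. The function names tile_for and tile_url are made up for this illustration, and the base address simply assumes the server is listening on 127.0.0.1:8080 as in the docker run command above.

```python
import math


def tile_for(lat_deg, lon_deg, zoom):
    """Return the (x, y) tile numbers containing a point at the given zoom."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(lat_deg, lon_deg, zoom, base="http://127.0.0.1:8080"):
    """Build a tile URL for the local server (base address is an assumption)."""
    x, y = tile_for(lat_deg, lon_deg, zoom)
    return f"{base}/tile/{zoom}/{x}/{y}.png"


# Somewhere in central London, at zoom 10.
print(tile_url(51.5, -0.12, 10))
```

At zoom 0 the whole world is a single tile, which is why /tile/0/0/0.png is a handy first test.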

Parkinson’s law, striving to meet a deadline, or happenstance?

How many minutes past the hour was it, when you stopped working on some software related task?

There are sixty minutes in an hour, so if stop times are random, the probability of finishing at any given minute is 1-in-60. In practice (based on the 200k+ time records in the CESAW dataset) the probability of stopping on the hour is 1-in-40, and for stopping on the half-hour is 1-in-48.
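
As a rough illustration of this comparison, the following Python sketch tallies the minute past the hour at which work stopped and expresses each minute's observed share relative to the 1-in-60 expected for uniformly random stop times. The function name and the sample timestamps are invented for the example; they are not taken from the CESAW dataset, and this is not the analysis code linked from the post.

```python
from collections import Counter
from datetime import datetime


def stop_minute_ratios(stop_times):
    """Map each stop minute to (observed share) / (uniform share of 1/60)."""
    counts = Counter(t.minute for t in stop_times)
    total = len(stop_times)
    uniform = 1 / 60  # expected share if stop minutes were uniformly random
    return {minute: (n / total) / uniform for minute, n in counts.items()}


# Hypothetical sample, not CESAW data.
sample = [
    datetime(2021, 11, 1, 10, 0),
    datetime(2021, 11, 1, 11, 30),
    datetime(2021, 11, 1, 12, 0),
    datetime(2021, 11, 1, 14, 17),
]
ratios = stop_minute_ratios(sample)
print(ratios[0])  # on-the-hour stops, relative to the uniform expectation
```

On the figures quoted above, 1-in-40 against 1-in-60 is a ratio of 1.5 for stopping on the hour, and 1-in-48 gives 1.25 for the half-hour.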

Why are developers more likely to stop working on a task, on the hour or half-hour?

Is this a case of Parkinson’s law, or are developers striving to complete a task within a specified time, or are they stopping because a scheduled activity takes priority?

The plot below shows the number of times (y-axis) work on a task stopped on a given minute past the hour (x-axis), for 16 different software projects (project number in blue, with top 10 numbers in red, code+data):

Number of times work on a task stopped at a given minute of the hour, for 16 projects.

Some projects have peaks at 50, 55, or thereabouts. Perhaps people are stopping because they have a meeting to attend, and a peak is visible because the project had lots of meetings, or no peak was visible because the project had few meetings. Some projects have a peak at 28 or 29, which might be some kind of time synchronization issue.

Is it possible to analyze the distribution of end minutes to reasonably infer developer project behavior, e.g., Parkinson’s law, striving to finish by a given time, or just not watching the clock?

An expected distribution pattern for both Parkinson’s law, and striving to complete, is a sharp decline of work stops after a reference time, e.g., end of an hour (this pattern is present in around ten of the projects plotted). A sharp increase in work stops prior to a reference time could also apply for both behaviors; stopping to switch to other work adds ‘noise’ to the distribution.

The CESAW data is organized by project, not developer, i.e., it does not list everything a developer did during the day. It is possible that end-of-hour work stops are driven by the need to synchronize with non-project activities, i.e., no Parkinson’s law or striving to complete.

In practice, some developers may sometimes follow Parkinson’s law, other times strive to complete, and other times not watch the clock. If models capable of separating out the behaviors were available, they might only be viable at the individual level.

Stop time equals start time plus work duration. If people choose a round number for the amount of work time, there is likely to be some correlation between start/end minutes past the hour. The plot below shows heat maps for start fraction of hour (y-axis) against end fraction of hour (x-axis) for four projects (code+data):

Heat map of start/end minute for tasks, for four projects.

Work durations that are exact multiples of an hour appear along the main diagonal, with zero/zero being the most common start/end pair (at 4% over all projects, with 0.02% expected for random start/end times). Other diagonal lines come from work durations that include a fraction of an hour, e.g., 30-minutes and 20-minutes.
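
The counts behind a heat map like this can be sketched in a few lines of Python. The example below assumes work periods are given as (start, end) pairs in minutes since midnight, which is an invented representation rather than the CESAW format, and compares the share of each start/end minute pair with a uniform baseline of 1/3600 per pair.

```python
from collections import Counter


def pair_shares(periods):
    """Map (start_minute, end_minute) to its share of all work periods."""
    counts = Counter((start % 60, end % 60) for start, end in periods)
    total = len(periods)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical work periods as (start, end) in minutes since midnight.
periods = [(540, 600), (600, 660), (615, 645), (780, 840), (790, 815)]
shares = pair_shares(periods)

uniform_share = 1 / (60 * 60)  # every start/end minute pair equally likely
print(shares[(0, 0)], uniform_share)
```

Pairs on the main diagonal (equal start and end minute) correspond to durations that are whole hours; a fixed offset between the two minutes corresponds to the other diagonal lines.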

For most work periods, the start minute occurs before the end minute, i.e., the work period does not cross an hour boundary.

What can be learned from this analysis?

The main takeaway is that there is a small bias for work start/end times to occur on the hour or half-hour, and other activities (e.g., meetings) cause ongoing work to be interrupted. Not exactly news.

More interesting ideas and suggestions welcome.

First understand the structure of a standard, then read it

Extracting useful information from the text in an ISO programming language standard first requires an understanding of the stylized English in which it is written.

I regularly encounter people who cite wording from the C Standard to back up their interpretation of a particular language construct. My first thought when this happens is: Do I want to spend the time explaining how the standard ‘works’ to get to the point of dealing with the topic being discussed?

I am not aware of any “How to read the C Standard” guide, or such a guide for any language.

Explaining enough detail to have a sensible conversation about the text takes, maybe, 10-30 minutes. The interpretation of text in any standard can depend on the section in which it occurs, and particular phrases can be specified to have different interpretations in different contexts. For instance, in the C Standard, a source code construct that violates a “shall” requirement specified in a “Constraints” section is about as serious as things get (i.e., it’s a mandatory compile time error), while violating a “shall” requirement specified outside a “Constraints” section is undefined behavior (i.e., the compiler can do what it likes, including nothing).

New readers also get hung up on footnotes, which are a great source of confusion. Footnotes have no normative meaning; technically, they are classified as informative (their real use is providing the committee a means to include wording in the document to satisfy some interested party, without the risk of breaking the standard {because this text has no normative status}).

Sometimes a person familiar with the C++ Standard applies the interpretation rules they have learned to the C Standard. This can work in limited cases, but the fundamental differences between how the two documents are structured require a reorientation of thinking. For instance, the C Standard specifies the behavior of source code (from which the behavior of implementations has to be inferred), while the C++ Standard specifies the behavior of implementations (from which the behavior of source code constructs has to be inferred), and the C++ Standard does not contain “Constraints” sections.

The general committee response, at least in WG14, to complaints that the language standard requires effort to understand, is that the standard is not intended as a tutorial. At least there is a prose document to read; there are forms of language specification that don’t provide this luxury. At a minimum, a language standard first needs to be read two or three times before trying to answer detailed questions.

In general, once somebody has learned to interpret one ISO Standard, the know-how does not transfer to other ISO language standards, but they have an appreciation of the need for such an understanding.

In theory, know-how is supposed to be transferable; part 2 of the ISO directives, Principles and rules for the structure and drafting of ISO and IEC documents, “… stipulates certain rules to be applied in order to ensure that they are clear, precise and unambiguous.” There are also the technical reports: Guidelines for the Preparation of Conformity Clauses in Programming Language Standards (published in 1990), which I suspect few people have read, even within the standards’ programming language community, and Guidelines for the preparation of programming language standards (unchanged since the fourth edition in 2003).

In practice: The Fortran and Cobol standards were written before people had any idea which rules might be appropriate; I think the Pascal standard appeared just before the rules were formalised. Also, all three standards were created by National bodies (US, US, and UK respectively) as National standards, and then ‘promoted’ as-is to be ISO standards. Ada was a DoD standard that got ‘promoted’, and very much did its own thing with regard to stylized English.

The post-1990 language standards visually look as if they follow the ISO rules in force at the time they were first written (Directives, part 2 is on its ninth edition), i.e., the titles of clauses match the clause numbering scheme specified by ISO rules, e.g., clause 3 specifies “Terms and definitions”. However, readers are going to need some cultural background on the use of the language by its community, to figure out the intent of the text. For instance, the 1990 revision of the Pascal Standard contains extensive use of “shall”, but it is not clear how this is to be interpreted; I used Pascal extensively for 10-years, but never studied its ISO standard, and reading it now with my C Standard expertise is a strange experience, e.g., familiar language “constraints” do not appear to be specified in the text, and the compiler does not appear to be required to flag anything.

Two of the pre-1990 language standards, Fortran and Cobol, were initially written in the 1960s, and read like they are from another age (probably because of the way they are laid out, and the fonts used). The differences are so obvious that any readers with prior experience are likely to understand that they are going to have to figure out the structure from scratch. The formatting of post-1990 Fortran Standards lacks the 1960s vibe.

Adapt Or Try – a.k.

Over the last few months we have been looking at how we might approximate the solutions to ordinary differential equations, or ODEs, which define the derivative of one variable with respect to another with a function of them both. Firstly with the first order Euler method, then with the second order midpoint method and finally with the generalised Runge-Kutta method.
Unfortunately all of these approaches require the step length to be fixed and specified in advance, ignoring any information that we might use to adjust it during the iteration in order to better trade off the efficiency and accuracy of the approximation. In this post we shall try to automatically modify the step lengths to yield an optimal, or at least reasonable, balance.
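
One common way to adjust the step length is step doubling: take one full step and two half steps, treat the difference as an error estimate, and shrink or grow the step accordingly. The sketch below applies that idea to the Euler method. It is only an illustration and not necessarily the scheme this post goes on to develop; the function names, the tolerance and the grow/shrink factors are all assumptions.

```python
import math


def euler_step(f, x, y, h):
    """One first order Euler step for dy/dx = f(x, y)."""
    return y + h * f(x, y)


def adaptive_euler(f, x0, y0, x1, h0=0.1, tol=1e-4, h_min=1e-8):
    """Integrate from x0 to x1, adapting the step length by step doubling.

    Returns the approximate y(x1) and the number of accepted steps.
    """
    x, y, h = x0, y0, h0
    steps = 0
    while x < x1:
        last = x + h >= x1          # would this step reach the end point?
        if last:
            h = x1 - x
        full = euler_step(f, x, y, h)
        half = euler_step(f, x, y, h / 2)
        two_half = euler_step(f, x + h / 2, half, h / 2)
        err = abs(two_half - full)  # estimate of the local error
        if err <= tol or h <= h_min:
            # Accept the more accurate two half step result.
            x = x1 if last else x + h
            y = two_half
            steps += 1
            if err < tol / 4:
                h *= 2              # error comfortably small: try a longer step
        else:
            h /= 2                  # error too large: retry with a shorter step
    return y, steps


# Example: dy/dx = y with y(0) = 1, whose exact solution at x = 1 is e.
approx, accepted = adaptive_euler(lambda x, y: y, 0.0, 1.0, 1.0)
error = abs(approx - math.e)
```

The step shortens as the solution grows, so the effort is spent where the local error estimate is largest rather than being spread evenly over the interval.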

BA role in agile discovery

Adrian Reed of BlackMetric ran a webinar panel discussion last night with Angela Wick, Angie Doyle, Howard Podeswa and myself about the Business Analyst role in Agile discovery. The discussion was great fun and Adrian has now made the recording available on YouTube.

This is not the first time I’ve appeared in one of Adrian’s webinars; at a minimum I recommend keeping your eye on his upcoming subjects as he regularly has great guests.


Subscribe to my blog newsletter and download Continuous Digital for free

The post BA role in agile discovery appeared first on Allan Kelly.

OKRs and Agile

Book cover: Succeeding with OKRs in Agile

“How to combine OKRs and Agile” is a short piece by me published on the GTM Hub blog. GTM Hub is the provider of OKR software.

As it happens, another OKR software provider, Just 3 Things, has been running a series on OKRs which I and over 20 others contributed to. This series takes a question and answer form. The latest instalment is Questions you should ask before starting your OKR journey, previous posts include:

OKR predictions for the next 5 years

Common OKR mistakes

Advice for OKR champions

Benefits of OKRs for companies and employees

Cultural and structural similarities at companies that create great OKRs

And of course, if you like these subjects you will enjoy “Succeeding with OKRs in Agile”.

The post OKRs and Agile appeared first on Allan Kelly.

Including data in Python packages

Every time I need to include data in a Python package, I find myself going in circles checking existing projects, blog posts, and every other resource I can find to figure out the right way to do it. For something so seemingly straightforward, including data in a package always turns into a bit of a mess for me.

I had to make a package today that contained data, so - since it involved the standard running in circles for an hour - I thought I'd take the time to write down how I finally got it to work.

What is "package data"?

Broadly, package data is any files that you want to include with your Python package that aren't Python source files. An example is a TOML default configuration file that you want to be able to produce for users. It's not Python source code, so it wouldn't normally be included in a Python package. But with just a small amount of work, you can include it in a package and make it available programmatically to users of your package (or your package itself).

The short version

  1. Set include_package_data to True in your setup.py.
  2. Set package_dir in your setup.py.
  3. Include a MANIFEST.in that references your data files.

If that doesn't mean anything to you, read on.

The longer version

Suppose you have a project structure like this:

setup.py
source/
    project/
        __init__.py
        data/
            default_config.toml

It's a fairly standard structure, with the source directory containing the actual package files. The name of the package in this case is project.

What stands out is the data/default_config.toml file under project. This is our package data. That is, it's a non-Python file that we want to include in our package. Normally setuptools won't include it in the distributions you build (e.g. wheels, etc.), so we need to tell setuptools about it.

Create a MANIFEST.in

The first step is to create a new file, MANIFEST.in, as a sibling to setup.py. This file lets us specify the files that should be included in our distributions (beyond the files that are included by default). You can read more about it in the Python Packaging User Guide.

At its simplest (which works for me most of the time), it just needs to specify that your package should include anything and everything under some directory. In our case, we can include everything under source/project/data like this:

recursive-include source/project/data *

That's it. You can, of course, have much more complex include/exclude specs in MANIFEST.in, but this will get you started.

Update setup.py

You also need to modify setup.py to make sure it will let you include package data. Fortunately, in the normal case, this is very simple:

setup(
    ...
    include_package_data=True,
    package_dir={"": "source"},
    ...
)

Now when you install your package from source or generate wheels for distribution, everything in the data directory will be included in your package.

Accessing the package data

Including the package data is only half of the battle, though. You still need some way to access the files from your program. This is where the importlib.resources module comes in. importlib.resources lets you (among other things) get paths to the directories and files in your package data. I won't go into great detail here, but here's how you could read the contents of our default_config.toml:

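import importlib.resources

# Get a filesystem path to the package's data directory, then read the file.
with importlib.resources.path('project', 'data') as data_path:
    default_config_path = data_path / "default_config.toml"
    contents = default_config_path.read_text()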
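
On Python 3.9 and later the same module also provides importlib.resources.files, which returns a traversable object you can navigate with the / operator. As a hedged sketch, assuming the same project layout as above (the helper name read_package_text is hypothetical, not part of any library):

```python
import importlib.resources


def read_package_text(package, *path_parts):
    """Return the text of a data file shipped inside an importable package.

    `package` is the package name (e.g. "project") and `path_parts` are the
    path components below it (e.g. "data", "default_config.toml").
    """
    resource = importlib.resources.files(package)
    for part in path_parts:
        resource = resource / part
    return resource.read_text(encoding="utf-8")


# Usage, once the package from this post is installed:
#   contents = read_package_text("project", "data", "default_config.toml")
```

This avoids needing a real filesystem path, so it should also work when the package is imported from somewhere other than a plain directory.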

The standard library docs linked above are excellent, so I'll leave it at that.

What did I get wrong or leave out?

There are much more sophisticated ways to use importlib.resources and package data, but I find that what I've described above seems to work well for most of what I need. If I got things wrong or left out important details, let me know!

The software heritage of K&R C

The mission statement of the Software Heritage is “… to collect, preserve, and share all software that is publicly available in source code form.”

What are the uses of the preserved source code that is collected? Lots of people visit preserved buildings, but very few people are interested in looking at source code.

One use-case is tracking the evolution of changes in developer usage of various programming language constructs. It is possible to use GitHub to track the adoption of language features introduced after 2008, when the company was founded, e.g., new language constructs in Java. Over longer time-scales, the Software Heritage, which has source code going back to the 1960s, is the only option.

One question that keeps cropping up when discussing the C Standard, is whether K&R C continues to be used. Technically, K&R C is the language defined by the book that introduced C to the world. Over time, differences between K&R C and the C Standard have fallen away, as compilers cease supporting particular K&R ways of doing things (as an option or otherwise).

These days, saying that code uses K&R C is taken to mean that it contains functions defined using the K&R style (see sentence 1818), e.g.,

writing:

int f(a, b)
int a;
float b;
{
/* declarations and statements */
}

rather than:

int f(int a, float b)
{
/* declarations and statements */
}

As well as the syntactic differences, there are semantic differences between the two styles of function definition, but these are not relevant here.

How much longer should the C Standard continue to support the K&R style of function definition?

The WG14 committee prides itself on not breaking existing code, or at least not lots of it. How much code is out there, being actively maintained, and containing K&R function definitions?

Members of the committee agree that they rarely encounter this K&R usage, and it would be useful to have some idea of the decline in use over time (with the intent of removing support in some future revision of the standard).

One way to estimate the evolution in the use/non-use of K&R style function definitions is to analyse the C source created in each year since the late 1970s.
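
As a rough illustration of what such an analysis might look for, here is a Python sketch that flags K&R style definitions using a regular expression. It is a heuristic of my own and not a C parser: it assumes a definition looks like a name, a parenthesised list of bare identifiers, one or more declarations, and then an opening brace, so it will miss some cases (e.g., parameters declared as function pointers) and may misfire on unusual code. All names in it are hypothetical.

```python
import re

# Keywords that can precede a parenthesised expression and so look like a call.
_C_KEYWORDS = {"if", "while", "for", "switch", "return", "sizeof"}

_KR_DEFINITION = re.compile(
    r"\b([A-Za-z_]\w*)\s*"                              # function name
    r"\(\s*[A-Za-z_]\w*(?:\s*,\s*[A-Za-z_]\w*)*\s*\)"   # bare identifier list
    r"(?:\s*[^;{}()\s][^;{}()]*;)+"                     # parameter declarations
    r"\s*\{"                                            # start of function body
)


def kr_function_names(source):
    """Return names of functions that appear to use K&R style definitions."""
    return [
        match.group(1)
        for match in _KR_DEFINITION.finditer(source)
        if match.group(1) not in _C_KEYWORDS
    ]


kr_style = "int f(a, b)\nint a;\nfloat b;\n{\n/* declarations and statements */\n}\n"
prototype_style = "int f(int a, float b)\n{\n/* declarations and statements */\n}\n"

kr_hits = kr_function_names(kr_style)
prototype_hits = kr_function_names(prototype_style)
```

Counting hits per file, grouped by the year each file was created, would give one crude estimate of the trend; a real study would want a proper parser and a check of the false positive rate.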

The question is then: How representative is the Software Heritage C source, compared to all the C source currently being actively maintained?

The Software Heritage preserves publicly available source; this, plus the non-public, proprietary source, forms the totality of the C currently being maintained. Do the public and non-public C source have similar characteristics, or are there application domains which are poorly represented in the publicly available source?

Embedded systems is a very large and broad application domain that is poorly represented in the publicly available C source. Embedded source tends to be heavily tied to the hardware on which it runs, and vendors tend to be paranoid about releasing internal details about their products.

The various embedded systems domains (e.g., 8, 16, 32, 64-bit processor) tend to be a world unto themselves, and I would not be surprised to find out that there are enclaves of K&R usage (perhaps because there is no pressure to change, or because the available tools are ancient).

At the moment, the Software Heritage don’t offer code search functionality. But then, the next opportunity for major changes to the C Standard is probably 5-years away (the deadline for new proposals on the current revision has passed); plenty of time to get to a position where usage data can be obtained 🙂

Unborking my ISSO comments system and making it more resilient

First, I apologise for not noticing that the comments had been broken for a while. This was entirely my fault and not the fault of ISSO, which I’m still super happy with as a self-hosted comments system. So in this post I’m going to describe what went wrong, and also how I made the system a little more resilient at the same time. First, what did go wrong? My web server is using FreeBSD as its OS, with a bunch of software installed via FreeBSD’s ports system.

Streaming to Twitch and PeerTube simultaneously using nginx on Oracle cloud

Simulcasting RTMP using NGINX

I want people to be able to watch my Matrix and Rust live coding streams using free software, so I’d like to simulcast to PeerTube as well as Twitch.

This is possible using NGINX and its RTMP module. It does involve building NGINX from source, but I actually found that reasonably easy to do.

Why Oracle cloud?

I would never recommend using Oracle for anything, but they do provide up to two virtual machines in their cloud for free, and the one I am using has been consistently available with very good connectivity, in a London data centre since I set it up several months ago.

So, we are making our lives more difficult by trying to do this on Oracle Linux, which is a derivative of RHEL.

Building NGINX and its RTMP module on Oracle Linux

I ran these commands on my Oracle cloud instance (running Oracle Linux):

sudo yum install git pcre-devel openssl-devel
mkdir nginx
cd nginx
wget http://nginx.org/download/nginx-1.21.4.tar.gz
git clone https://github.com/arut/nginx-rtmp-module.git
cd nginx-1.21.4
./configure --add-module=../nginx-rtmp-module/
make
sudo make install

After all this NGINX was installed to /usr/local/nginx/.

Creating the NGINX config file for RTMP simulcasting

Next I edited the NGINX config file by typing:

sudo nano /usr/local/nginx/conf/nginx.conf

And pasted in this config at the bottom of the file:

rtmp {
    server {
        listen 1935;
        chunk_size 4096;
        application live {
            live on;
            record off;
            push rtmp://live.twitch.tv/app/live_INSERT_TWITCH_STREAM_KEY;
            push rtmp://diode.zone:1935/live/INSERT_PEERTUBE_STREAM_KEY;
        }
    }
}

Notice that you will need to get your Twitch stream key from Twitch -> Creator Dashboard -> Settings -> Stream, then Copy next to the Primary Stream Key.

To get a PeerTube stream ID, you will need to go to your PeerTube page and click Publish, then Go Live, choose your channel and choose Go Live. Note that if you want the streams to record and be available later, you have to create a new stream key each time you start a stream, and change it in nginx.conf.

If you use a different PeerTube server (I use diode.zone) then you’ll need to change the server name in the config file above too.

Make sure your config file is saved with the right URLs in it.

Opening ports

To send RTMP traffic to my server, I needed to open the right port to the Oracle cloud instance. That involved creating an ingress rule, and adding a firewall rule.

Creating an ingress rule

In the web interface, I went to the menu in the top left, clicked Compute, then Instances.

I clicked on my instance’s name, then I clicked on the name of the subnet in the details (on the right).

I clicked on Default security list for…, then Add Ingress Rules.

I made an ingress rule with Source Type=CIDR, Source CIDR=0.0.0.0/0, IP Protocol=TCP, Source Port Range=(blank, meaning all), Destination Port Range=1935

Adding a firewall rule

Then I ssh’d into the machine and ran these commands to create a firewall rule allowing the traffic:

sudo firewall-cmd --zone=public --permanent --add-port=1935/tcp
sudo firewall-cmd --reload

Stop and Start NGINX

After creating the config file and opening the right port, I needed to start NGINX.

Every time I change the config file, I need to restart it.

If it’s already running, I stop it with:

sudo /usr/local/nginx/sbin/nginx -s stop

and then I start it up again with

sudo /usr/local/nginx/sbin/nginx

I can check whether it’s happy by looking at the log files, for example to see any errors:

less /usr/local/nginx/logs/error.log
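
Another quick check, run from a different machine, is whether the RTMP port accepts TCP connections at all. The sketch below uses only Python's standard socket module; the function name is hypothetical, and a successful connection only shows that the ingress rule, the firewall rule and NGINX are all letting traffic through, not that streaming itself works.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Usage (replace the placeholder with the public IP address of your instance):
#   port_is_open("INSTANCE_PUBLIC_IP", 1935)
```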

Starting the stream

Now I go into OBS and go to File -> Settings -> Stream and choose the type as Custom, and the Server as rtmp://1.1.1.1/live. (But instead of 1.1.1.1 I put the public IP address of my instance, which I found by clicking the name of the instance in the Oracle cloud management console.)

Open source: the goody bag for software infrastructure

For 70 years there has been a continuing discovery of larger new ecosystems for new software to grow into, as well as many small ones. Before Open source became widely available, the software infrastructure (e.g., compilers, editors and libraries of algorithms) for these ecosystems had to be written by the pioneer developers who happened to find themselves in an unoccupied land.

Ecosystems may be hardware platforms (e.g., mainframes, minicomputers, microcomputers and mobile phones), software platforms (e.g., Microsoft Windows, and Android), or application domains (e.g., accounting and astronomy).

There are always a few developers building some infrastructure project out of interest, e.g., writing a compiler for their own or another language, or implementing an editor that suits them. When these projects are released, they have to compete against the established inhabitants of an ecosystem, along with other newly released software clamouring for attention.

New ecosystems have limited established software infrastructure, and may not yet have attracted many developers to work within them. In such ‘virgin’ ecosystems, something new and different faces less competition, giving it a higher probability of thriving and becoming established.

Building from scratch is time-consuming and expensive. Adapting existing software systems speeds things up and reduces costs; adaptation also has the benefit of significantly reducing the startup costs when recruiting developers, i.e., making it possible for experienced people to use the skills acquired while working in other ecosystems. By its general availability, Open source creates competition capable of reducing the likelihood that some newly created infrastructure software will become established in a ‘virgin’ ecosystem.

Open source not only reduces startup costs for those needing infrastructure for a new ecosystem, it also reduces ongoing maintenance costs (by spreading them over multiple ecosystems), and developer costs (by reducing the need to learn something different, which happened to be created by developers who built from scratch).

Some people will complain that Open source is reducing diversity (where diversity is viewed as unconditionally providing benefits). I would claim that reducing diversity in this case is a benefit. Inventing new ways of doing things based on the whims of those doing the invention is a vanity project. I have nothing against people investing their own resources on their own vanity projects, but let’s not pretend that the diversity generated by such projects is likely to provide benefits to others.

By providing the components needed to plug together a functioning infrastructure, Open source reduces the cost of ecosystem ‘invasion’ by software. The resources which might have been invested building infrastructure components can be directed to building higher level functionality.

A Day At The Races – baron m.

Halloo Sir R-----! Pray come join me and partake of a glass of this rather excellent potation!

Might I again tempt you with a wager?

Splendid!

I have in mind a game that always reminds me of my victory upon the turf at Newmarket. Ordinarily I would not participate in a public sporting event such as this since I am at heart a modest man and derive no pleasure in demonstrating my substantial superiority over my fellows.

New game: Tron – frantic multiplayer retro action

My newest game is out now on Smolpxl Games – Tron:

Pixellated lines fight each other to stay alive

Play at smolpxl.gitlab.io/tron.

It’s a frantic multiplayer retro pixellated thingy playable in your browser. Try to stay alive longer than everyone else!

This version allows many players (up to 16 if you can manage it), and is quite pure in its implementation.

There are bots to play against, and you can gather your friends around a keyboard to play together.

Part of the motivation for writing this game was to test my new smolpxl-remote remote-play system, but this is not enabled yet, so watch this space…

I love playing games with other people – preferably at least 3 other people. In theory you could have 8 players around a keyboard playing this – send me a picture if you try!

One feature I worked on in the Smolpxl library for this game: saving configuration to local storage (and asking permission to do so). I ended up with a very ugly hack to do this, so a bit more work is needed before I merge it into the library.

Running Jest tests in VS Code with custom environment variables

Currently the most popular Jest test runner extension for VS Code is vscode-jest by Orta. For most common setups, this extension works without any configuration of VS Code. In my case, though, I needed to enable Jest's support for ECMAScript modules. The Jest documentation lists a few ways to do this, and I decided to use the method that involves setting an environment variable.

Because I needed to set this environment variable, vscode-jest's default behavior didn't work, and I ended up needing to create a run configuration. This was not particularly complicated, but it was complicated enough that I thought I should capture the knowledge here.

Configuring the Jest command

First you need to configure the Jest command in your settings. To do this you can use the extension's "Setup Extension" command. From the command palette, run "Jest: Setup Extension" (or possibly "Jest: Setup Extension (beta)" if it's still in beta). Choose "Setup Jest Command" in the dropdown this produces.

It will ask if you can run Jest tests from the terminal; choose "yes". When it then asks for the Jest command line, enter "node_modules/.bin/jest". (Of course, if you use something else, enter that!)

This will add an entry like this to your settings.json:

"jest.jestCommandLine": "node_modules/.bin/jest"

Creating the launch configuration

You'll then return to the setup wizard's dropdown list. This time select "Setup Jest Debug Config", and then select "Generate". This will add a run configuration to your launch.json. Now you can select "Exit" from the wizard.

Now that you have the launch configuration, you need to edit it to add the environment variable. Add this to the launch configuration inside launch.json:

"env": {
    "NODE_OPTIONS": "--experimental-vm-modules"
}

You should end up with a configuration that looks something like this:

{
    "configurations": [
        {
            "type": "node",
            "name": "vscode-jest-tests",
            "request": "launch",
            "console": "integratedTerminal",
            "internalConsoleOptions": "neverOpen",
            "disableOptimisticBPs": true,
            "program": "${workspaceFolder}/node_modules/.bin/jest",
            "cwd": "${workspaceFolder}",
            "args": [
                "--runInBand",
                "--watchAll=false"
            ],
            "env": {
                "NODE_OPTIONS": "--experimental-vm-modules"
            }
        }
    ]
}

With this in place, you should be able to run and debug Jest tests from the test tool or directly from the test file.

Preventing Virgin Media hijacking my DNS

Yesterday I learned that Virgin Media is inserting itself into some of my DNS requests. Much as I am not a fan of how powerful Cloudflare are, if they are telling the truth about their DNS, then it’s safe, so I followed their instructions on how to use their DNS and then removed the default DNS and hopefully my Internet will work now.

From the serverfault answer by lauc.exon.nod:

nmcli con mod "Wired connection 1" ipv4.dns "1.1.1.1 1.0.0.1"
nmcli con mod "Wired connection 1" ipv4.ignore-auto-dns yes
nmcli con down "Wired connection 1"
nmcli con up "Wired connection 1"

A company is not a tree: an alternative map

It seems that everyone dislikes hierarchy in organizations. Even the people at the top of the hierarchy seem to want to get away from the idea. But… the moment we start talking about organizations everyone starts talking about who’s at the top, the CEO, and who reports to who. Try to draw it out and you end up with some sort of inverted tree.

Part of the problem is that we all want, indeed need, structure. Saying “there is a bunch of people” isn’t enough. We need some way of understanding who is who and where they all fit in. Perhaps we cling to hierarchy because we lack a better model to conceptualise our organizations and how they fit together.

Programmers and business designers aren’t the only ones who want to think of things in neat tree-like hierarchies. I was recently introduced to Christopher Alexander’s essay “A city is not a tree” in which he rails against the same idea. Living in London, while I could imagine constructing a hierarchy on some criteria, I immediately know it would be wrong. It would not capture the true nature of London. Neither Oxford Street nor Threadneedle Street is at the top; they would be contenders but in different ways. Each place plays multiple roles. There isn’t one centre, there are many centres.

Maps help us make sense of places like London but even here we use different maps with different conventions depending on what we want to do: the Tube map is very different to a visitor’s map which is different to a map of boroughs, we use different maps for different things. And maps shape our thinking and action – consider the Google map of central London with selective information trying to be useful but also trying to sell things.

Manager at the centre of the solar system

We need maps of our organizations to understand them but in drawing the map we shape our thinking. If we want to move away from hierarchical thinking we need another way of mapping our organizations.

So let me suggest a different way of thinking about an organization, a way I find useful, a way I briefly mention in my “Reawakening Agile with OKRs” presentation: concentric circles – think of it as our solar system with planets (teams) orbiting the sun (leadership).

Rather than think of your supreme leader at the top of an organization with everyone else below them – an idea that just shouts “inferior” – think of the supreme leader as the centre of the organization. After all, everyone in the company has a relationship with that person, in the same way that every asteroid in the solar system has a relationship with the sun.

The sun, the leader, exerts a force on everything, everyone, else. Some people are close to the centre and close to the leader – they feel a lot of the leader’s force. Others are far away, some are so remote the leader struggles to exert any influence.

And while I say “leader” it might be better to think about the leadership team. Close in there isn’t just one leader, even here leadership is split between a CEO, CFO, and even the board. Nobody has total authority, everyone needs to work with others.

You might also add on the communication paths, some teams communicate with other teams a lot, and some teams hardly at all.

Like the solar system there are alternative centres. Earth has but one Moon, which is influenced by the sun, but Earth is a far bigger influence. Jupiter has dozens of moons and exerts a lot more influence on its moons than the sun does. That’s not unlike the way some teams and leaders operate.

These satellites influence each other too – maybe not something astronomers see much but some teams follow similar orbits to others and can influence them. Imagine Mars came close enough to Earth at times to influence the seas the way the Moon does – even if it only occurred occasionally it would be meaningful. In a company some teams influence others, one team uses the work of another, or they serve the same customers, or they can disrupt the other.

If we are to navigate our organizations without repeatedly referring to tops and bottoms, ups and downs, superiors and inferiors, then we need to start changing the models we use to guide us.

This view might also answer another question I raised a few years ago. In Programmers Rorschach Test I noted that organizational charts look exactly like the structure charts I was taught at University. These were an alternative to flow-charts for structured programming in Pascal-like languages.

Think about that: organizational design looks exactly like structured programming: Conway’s Law again.

So what does Object Oriented programming look like? Perhaps the solar system provides an answer: lots of independent objects following their own paths but exerting forces on others.

Add asteroids, comets and dwarf planets to planets and moons and you have plenty of ideas to model with.



The post A company is not a tree: an alternative map appeared first on Allan Kelly.

Visual Lint 8.0.6.347 – a Clang download here, a CppCheck download there….

Visual Lint 8.0.6.347 has now been released.

The most notable changes in this build relate to configuration - in particular, the user interface now embeds links to the installers for open source analysis tools, which should make configuring Visual Lint much easier:

Configuration Wizard page: the user interface now includes download links for selected open source analysis tools

This change was prompted by the fact that the Clang-Tidy installers are quite hard to find on the LLVM Download page, but is also applicable to (for example) CppCheck.

Visual Lint 8.0.6.347 is a recommended maintenance update for Visual Lint 8.0, and includes the following changes:

  • Added direct download links for open source analysis tools such as Clang-Tidy and CppCheck to the Configuration Wizard, Options Dialog and Active Analysis Tool Dialog.

  • When the installation folder of an analysis tool is set in the Configuration Wizard for a particular IDE/project type, the path will now be used as the default for all project types. It can of course still be overridden on a project type by project type (or solution/workspace/project by solution/workspace/project) basis as required.

  • Updated the PC-lint Plus compiler indirect file co-rb-vs2022.lnt to support the public release of Visual Studio 2022 v17.0.0.

  • Updated the Clang-Tidy message database to reflect changes in Clang-Tidy 13.0.0.

  • Corrected the "Status" text in the "Active Analysis Tool" dialog.

Download Visual Lint 8.0.6.347

Come and see Find My Tea pitch to SyncNorwich on 23 November 2021

 


What:
Startup Pitches #12 - Find My Tea, Greenr, Scoop & Yakbit

When: Tuesday, November 23, 2021 @ 6:00pm

Where: Access Creative College, 114 Magdalen Street, Norwich

How much: Free

RSVP: https://www.meetup.com/syncnorwich/events/281757681/

Agenda

6.00pm - Arrivals & Networking with Pizza & Beer
6.15pm - Intro
6.30pm - PITCH 1: Find My Tea
6.40pm - PITCH 2: Greenr
6.50pm - PITCH 3: Scoop
7.00pm - PITCH 4: Yakbit
7.10pm - Q&A
7.30pm - Vote for Best Pitch
7.45pm - Networking & Quick Drink
8.00pm - Close

PITCH 1: Find My Tea

Tea made simple

Whether you're looking for loose leaves or tea bags to take home, or a cafe to unwind in with your favourite blend – it's easy. Simply type in your location or postcode, and you'll soon be able to find the shop or cafe serving the tea that touches your tastebuds. Simple.

https://findmytea.co.uk/
https://twitter.com/FindMyTea

Speaker: Paul Grenyer. Husband, father, software engineer and metaller, Paul has been writing software for over 35 years and professionally for more than 20. In that time he has worked for and in all sorts of companies, from two-man startups to world-famous investment banks and insurance companies. He has built and run three limited companies, none of which made him a millionaire and two of which threatened his sanity on more than one occasion.

https://www.linkedin.com/in/pgrenyer/
https://twitter.com/pjgrenyer

 

An Exception wrapper suitable for a RESTful API

User Story

As a third line support engineer

I want to be able to go to the server class that throws an exception reported by a client

So that I do not need to look for the stack trace in the server logs

Example

Client code

    if (responseCode != 200) {
        throw new TaskException(
            "Error occurred while processing the scan response: " +
            "response code: " + responseCode +
            " response body: " + responseContent.getResponseBody());
    }

Server code


    // Only validate requests that carry a bearer token.
    if (null != header && header.startsWith(BEARER)) {
        String token = header.substring(BEARER.length()).trim();
        try {
            final Jws<Claims> jws = Jwts.parser()
                .setSigningKeyResolver(jwtPublicKeyResolver)
                .setAllowedClockSkewSeconds(3)
                .parseClaimsJws(token);
        } catch (JwtException ex) {
            // Only the message reaches the client; the throwing class and line are lost.
            String errorMessage = "Invalid JWT token. ";
            setError(httpServletResponse, errorMessage + ex.getMessage());
            return;
        }
    }

This results in the following being reported by the second line support agent monitoring the client logs:

[Error occurred while processing the scan response : response code: 401 response body: Invalid JWT token. Error accessing publickey Api]:

What we, as Third Line Support, want is to know which server class throws the exception, ideally without grepping the code base or opening the server logs.

A better Exception message would be:

[Problem with scan response: status code: 401, body: com.corp.server.validation.JwtValidator.validate() line 72: JWT token Exception: Error accessing Public Key API]

This is the motivation for the StackAwareException, a wrapper exception which adds the class, method and line number of the first element of the wrapped exception's stack trace.

See https://github.com/timp/StackAwareException
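The idea can be sketched in a few lines of Java. This is a minimal illustration written for this article, not the code in the repository above; the class name StackAwareExceptionSketch and the helper describe are hypothetical, and the message layout simply imitates the "better" message shown earlier.

```java
/**
 * Hypothetical sketch of a stack-aware wrapper exception.
 * Assumption: this is NOT the implementation from the StackAwareException repository.
 */
final class StackAwareExceptionSketch extends RuntimeException {

    StackAwareExceptionSketch(Throwable cause) {
        super(describe(cause), cause);
    }

    /** Builds "class.method() line N: ExceptionName: message" from the first stack frame. */
    static String describe(Throwable cause) {
        String summary = cause.getClass().getSimpleName() + ": " + cause.getMessage();
        StackTraceElement[] frames = cause.getStackTrace();
        if (frames.length == 0) {
            // A stack trace may legitimately be empty, so fall back to the plain summary.
            return summary;
        }
        StackTraceElement origin = frames[0];
        return origin.getClassName() + "." + origin.getMethodName() + "()"
                + " line " + origin.getLineNumber() + ": " + summary;
    }

    public static void main(String[] args) {
        try {
            throw new IllegalStateException("Error accessing Public Key API");
        } catch (IllegalStateException ex) {
            // The wrapped message now names the class, method and line that threw.
            System.out.println(new StackAwareExceptionSketch(ex).getMessage());
        }
    }
}
```

On the server, the catch block would pass the message of such a wrapper to the client instead of the bare ex.getMessage(), so the origin travels with the response body.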

Where are we with models of human learning?

Learning is an integral part of writing software. What have psychologists figured out about the characteristics of human learning?

A study of memory, published in 1885, kicked off modern psychology research. At the start of the 1900s, learning research was still closely tied to the study of the characteristics of what we now call working memory, e.g., measuring the time taken for subjects to correctly recall sequences of digits, nonsense syllables, words and prose. By the 1930s, learning was a distinct subject in its own right.

What is now known as the power law of learning was first proposed in 1926. Wikipedia is right to use the phrase power law of practice, since it is some measure of practice that appears in the power law of learning equation: $T = a + b P^{-c}$, where: $T$ is the time taken to do the task, $P$ is some measure of practice (such as the number of times the subject has performed the task), and $a$, $b$, and $c$ are constants fitted to the data.

For the next 70 years some form of power law did a good job of fitting the learning data produced by researchers. Then in 1997 a paper pointed out that researchers were fitting aggregate data (i.e., one equation fitted to all subject data), and that an exponential equation was a better fit to individual subject response times: $T = a + b e^{-cP}$. The power law appeared to be the result of aggregating the exponential response performance of multiple subjects; oops.

What is the situation today, 25 years later? Do the subsystems of our brains produce a power law or exponential improvement in performance, with practice?

The problem with answering this question is that both equations can fit the available data quite well, with one being a technically better fit than the other for different datasets. The big difference between the two equations is in their tails; however, it is costly and time-consuming to obtain enough data to distinguish between them in this region.
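To get a feel for how the two equations differ in the tail, here is a small Java sketch that evaluates both. The constants are made up purely for illustration (they are not fitted to any dataset), and the class and method names are my own.

```java
/** Evaluates the power law and exponential learning equations side by side. */
final class LearningCurves {

    /** Power law of practice: T = a + b * P^(-c). */
    static double powerLaw(double a, double b, double c, double practice) {
        return a + b * Math.pow(practice, -c);
    }

    /** Exponential alternative: T = a + b * e^(-c * P). */
    static double exponential(double a, double b, double c, double practice) {
        return a + b * Math.exp(-c * practice);
    }

    public static void main(String[] args) {
        // Illustrative constants only: asymptote a, initial excess b, and a rate c per model.
        double a = 1.0;
        double b = 10.0;
        for (int practice : new int[] {1, 2, 5, 10, 50, 100}) {
            System.out.printf("P=%3d  power law=%.4f  exponential=%.4f%n",
                    practice,
                    powerLaw(a, b, 0.5, practice),
                    exponential(a, b, 0.1, practice));
        }
    }
}
```

With these constants the exponential is already within a hair of its asymptote after 100 repetitions, while the power law is still well above it; that slow tail is the region where the data would have to be collected to tell the two apart.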

When discussing learning in my evidence-based software engineering book, I saw no compelling reason to run counter to the widely cited power law, but I did tell readers about the exponential fit issue.

Studies of learning have tended to use simple tasks; subjects are usually only available for a short time, and many task repetitions are needed to model the impact of learning. Simple tasks tend to be dominated by one primary activity, which means that subjects can focus their learning on this one activity.

Complicated tasks involve many activities, each potentially providing distinct learning opportunities. Which activities will a subject focus on improving? Will the performance on one activity improve faster than others? Will the approach chosen for one activity limit the performance on a second activity?

For a complicated task, the change in performance with amount of practice could be a lot more complicated than a single power law/exponential equation, e.g., there may be multiple equations with each associated with one or more activities.

In the previous paragraph, I was careful to say “could be a lot more complicated”. This is because the few datasets of organizational learning show a power law performance improvement, e.g., from 1936 we have the most cited study Factors Affecting the Cost of Airplanes, and the less well known but more interesting Liberty shipbuilding from the 1940s.

If the performance of something involving multiple people performing many distinct activities follows a power law improvement with practice, then the performance of an individual carrying out a complicated task might follow a simple equation; perhaps the combined form of many distinct simple learning activities is a simple equation.

Researchers are now proposing more complicated models of learning, along with fitting them to existing learning datasets.

Which equation should software developers use to model the learning process?

I continue to use a power law. The mathematics tend to be straightforward, and it often gives an answer that is good enough (because the data fitted contains lots of variance). If it turned out that an exponential would be easier to work with, I would be happy to switch. Unless there is a lot of data in the tail, the difference between power law/exponential is usually not worth worrying about.

There are situations where I have failed to successfully add a learning (power law) component to a model. Was this because there was no learning present, or was the learning not well-fitted by a power law? I don’t know, and I cannot think of an alternative equation that might work, for these cases.

How large an impact does social conformity have on estimates?

People experience social pressure to conform to group norms. How big an impact might social pressure have on a developer’s estimate of the effort needed to implement some functionality?

If a manager suggests that the effort likely to be required is large/small, I would expect a developer to respond accordingly (even if the manager is thought to be incompetent; people like to keep their boss happy). Of course, customer opinions are also likely to have an impact, but what about fellow team members, or even the receptionist? Until somebody runs the experiments, we are going to have to make do with non-software related tasks.

A study by Molleman, Kurvers, and van den Bos asked subjects (102 workers on Mechanical Turk) to estimate the number of animals in an image (which contained between 50 and 100 ants, flamingos, bees, cranes or crickets). Subjects were given 30 seconds to respond, and after typing their answer they were told that “another participant had estimated X”, and given 45 seconds to give a second estimate. The ‘social pressure’ estimate, X, was chosen to be around 15-25% larger/smaller than the estimate given (values from a previous experiment were randomly selected).

The plot below shows the number of second estimates where there was a given percentage change between the first and second estimates; the red line is a loess fit. The formula used is $\frac{\text{secondEstimate}-\text{firstEstimate}}{\text{SocialEstimate}-\text{firstEstimate}}$ (code+data):

Number of second estimates having a given change in the first estimate towards social estimate.

Around 25% of second estimates were unchanged, and 2% were changed to equal the social estimate. In two cases the second estimate was less than the first, and in eleven cases it was larger than the social estimate. Both the mean and median for shift towards the social estimate were just over 30% of the difference between the first estimate and the social estimate.
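For readers who want to apply the same measure to their own estimates, the ratio can be computed as in the following Java sketch. The class and method names are my own invention, and the numbers in main are invented for illustration, not taken from the study's data.

```java
/** Fraction of the gap between first and social estimate closed by the second estimate. */
final class SocialShift {

    /**
     * Returns (second - first) / (social - first):
     * 0 means the estimate was unchanged, 1 means the social estimate was adopted,
     * negative values move away from it, values above 1 overshoot it.
     */
    static double shiftTowardsSocial(double first, double second, double social) {
        if (social == first) {
            // The ratio is undefined when there is no gap to close.
            throw new IllegalArgumentException("social estimate equals first estimate");
        }
        return (second - first) / (social - first);
    }

    public static void main(String[] args) {
        // Invented example: first estimate 60, told "another participant estimated 72",
        // second estimate 64, so a third of the gap has been closed.
        System.out.println(shiftTowardsSocial(60, 64, 72));
    }
}
```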

As with previous estimating studies, a few round numbers were often chosen. I was interested in finding out what impact the use of a round number value for the first estimate, or the social estimate, might have on the change in estimated value. The best regression model I could find showed that if the first estimate was exactly divisible by 5 (or 10), then the second estimate was likely to be around 5% larger. In fact divisible-by-5 was the only variable that had any predictive power.

My initial hypothesis was that the act of choosing a round number is an expression of uncertainty, and that this uncertainty increases the impact of the social estimate (when making the second estimate). An analysis of later experiments suggested that this pattern was illusory (see below).

Modelling estimate values, rather than their differences, the equation: $\text{secondEstimate} \approx \text{firstEstimate}^{0.6} \times \text{SocialEstimate}^{0.3}$ explains nearly all the variance present in the data.
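One way to read the fitted exponents is to take logs, which turns the model into a linear one. The worked step below is only an illustration of what an exponent of 0.3 implies; the factor $k$ and the 20% example are my own choices, not values from the analysis.

```latex
% Taking logs of the fitted model gives a linear relationship:
\log(\text{secondEstimate}) \approx 0.6\,\log(\text{firstEstimate}) + 0.3\,\log(\text{SocialEstimate})

% Holding firstEstimate fixed, scaling SocialEstimate by a factor k
% scales the predicted secondEstimate by k^{0.3}:
\frac{\text{firstEstimate}^{0.6}\,(k \cdot \text{SocialEstimate})^{0.3}}
     {\text{firstEstimate}^{0.6}\,\text{SocialEstimate}^{0.3}} = k^{0.3},
\qquad 1.2^{0.3} \approx 1.056
```

So, under this model, a social estimate that is 20% larger shifts the predicted second estimate by a little under 6%.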

Two weeks after the first experiment, all 102 subjects were asked to repeat the experiment (they each saw the same images, in the same order, and social estimates as in the first study); 69 subjects participated. Nine months after the first experiment, subjects were asked to repeat the experiment again; 47 subjects participated, again with each subject seeing the same images in the same order, and social estimates. Thirty-five subjects participated in all three experiments.

To what extent were subjects consistently influenced by the social estimate, across three identical sessions? The Pearson correlation coefficient between both the first/second experiment, and the first/third experiment, was around 0.6.

The impact of round numbers was completely different, i.e., no impact on the second, and a -7% impact on the third (i.e., a reduced change). So much for my initial hypothesis.

The exponents in the above equation did not change much for the data from the second and third reruns of the experiment.

The variability in the social estimates used in these experiments, involving image contents, differs from software estimates in that they were only 12-25% different from the first estimate. Software estimates often differ by significantly larger amounts (in fact, a 12% difference would probably be taken as agreement).

With some teams, people meet to thrash out a team estimate. Data is sometimes available on the final estimate, but data on the starting values is very hard to come by. Pointers to experiments where social estimates are significantly different (i.e., greater than 50%) from the ones given by subjects are welcome.

A Kutta Above The Rest – a.k.

We have recently been looking at ordinary differential equations, or ODEs, which relate the derivatives of one variable with respect to another to the variables themselves through a function, so that we cannot solve them with plain integration. Whilst there are a number of tricks for solving such equations if they have very specific forms, we typically have to resort to approximation algorithms such as the Euler method, with first order errors, and the midpoint method, with second order errors.
In this post we shall see that these are both examples of a general class of algorithms that can be accurate to still higher orders.
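As a reminder of the two methods mentioned above, here is a minimal Java sketch of fixed-step Euler and midpoint integration. It is an illustration written for this summary rather than code from the original post; the class name, method signatures and the test equation dy/dx = y are my own choices.

```java
import java.util.function.DoubleBinaryOperator;

/** Fixed-step Euler and midpoint integrators for dy/dx = f(x, y). */
final class OdeSteppers {

    /** Euler method: y_{n+1} = y_n + h * f(x_n, y_n). */
    static double euler(DoubleBinaryOperator f, double x0, double y0, double x1, int steps) {
        double h = (x1 - x0) / steps;
        double x = x0;
        double y = y0;
        for (int i = 0; i < steps; i++) {
            y += h * f.applyAsDouble(x, y);
            x += h;
        }
        return y;
    }

    /** Midpoint method: take half an Euler step, then use the slope found there for the full step. */
    static double midpoint(DoubleBinaryOperator f, double x0, double y0, double x1, int steps) {
        double h = (x1 - x0) / steps;
        double x = x0;
        double y = y0;
        for (int i = 0; i < steps; i++) {
            double yHalf = y + 0.5 * h * f.applyAsDouble(x, y);
            y += h * f.applyAsDouble(x + 0.5 * h, yHalf);
            x += h;
        }
        return y;
    }

    public static void main(String[] args) {
        // dy/dx = y with y(0) = 1 has the exact solution y(1) = e.
        DoubleBinaryOperator f = (x, y) -> y;
        for (int steps : new int[] {100, 200}) {
            System.out.printf("steps=%d  Euler error=%.3e  midpoint error=%.3e%n",
                    steps,
                    Math.abs(Math.E - euler(f, 0, 1, 1, steps)),
                    Math.abs(Math.E - midpoint(f, 0, 1, 1, steps)));
        }
    }
}
```

Doubling the number of steps should roughly halve the Euler error and roughly quarter the midpoint error, which is what first and second order mean in practice.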

What are the first steps in setting OKRs?

“What do you see as the first steps in setting OKRs?”

After delivering my “Reawakening Agile with OKRs” presentation to an internal company group the other day, I got this question as a follow-up. I thought it would be worth sharing my reply with more readers, which is also an opportunity to expand my thinking.

First we need to make some assumptions and decide policies.

I’m going to assume that the team know what OKRs are, why they are being introduced and what is expected of them in setting OKRs. So, if this assumption does not hold true then before you set the OKRs establish some shared understanding on these points. Perhaps get an introduction to OKRs for the team. (I’ve started work on another video tutorial series, an introduction to OKRs and agile.) Next get some guidance from those suggesting the team use OKRs on what they expect.

I’m sorry to say I hear of plenty of cases where these things don’t happen. Teams are told: “thou shalt use OKRs.”

It would also help if those suggesting OKRs have spelt out what they see as success (100% of OKRs complete? 70%? Or, as I prefer: benefit delivered.) But you know what? If you don’t know this you can clarify it later, nice to know in advance but in a pinch, not essential.

Next I suggest Think Team – I’m skeptical about individual OKRs, so don’t set OKRs on anything smaller than a team level. While it might help if the “next level up” set OKRs first, if the team start first then the team clearly own the OKRs. So, while there are advantages to knowing the priorities higher up, there are also advantages to taking the initiative.

If you want to set some kind of individual objectives then my advice is: wait while you learn. Get some experience at the team level with OKRs before thinking about individual goals. Or perhaps, for the first two quarters make everyone’s individual goal “learn how to work with OKRs by working with OKRs.”

It will also help if you have some idea of how your OKRs are going to line up against any backlog you have. Are the OKRs reverse engineered from the backlog? Or do the OKRs have priority over the backlog? Or, as I suggest, use OKRs as a story generating machine instead of having a backlog?

Similarly, if your team needs to “keep the lights on” and do “business as usual” stuff in addition to OKRs, you need to know in advance. That will soak up capacity. So how do you reflect that in your OKRs? In Succeeding with OKRs in Agile I suggest an OKR-Zero for this type of stuff.

Now to set OKRs I suggest at least two meetings – and preferably not too many more. The first meeting is a drafting meeting. You might think of it as a big brainstorm. Get ideas out on the table, talk about priorities. Aim to get a rough draft of some candidate OKRs.

Before that first drafting meeting someone – Team Leader, Manager, Scrum Master, Agile Coach, whoever – needs to confirm what the timeline is. Are the OKRs to run over 13 weeks? Or is it Christmas, so this is a 15-week quarter? Or maybe you only have 10 weeks this time. The deadlines are important. Don’t plan OKRs without knowing when the first and last days of the cycle are.

It helps if team members have given a little pre-thought to what they would like to see in the OKRs. Now I don’t want a lot of pre-work. And I especially don’t want lots of planning because that a) detracts from the current cycle and b) potentially limits ambition when setting the OKRs. Still, a little forethought helps – think of it like writing your Christmas list.

This suggestion is particularly important to the Product Owner. Since the team is aiming to deliver benefits to others (customers, users, stakeholders, whoever) it is natural that the Product Owner takes a lead in drafting meetings. Whatever title you give this person, this is the person who is charged with listening to customer requests, understanding non-customer users, liaising with stakeholders, understanding the business/product strategy and knowing what would be beneficial to whom. So it makes sense for this person to have plenty of ideas on what to do.

In the run-up to OKR setting, Product Owners need to bring all their homework together and decide what outcomes they would like. The Product Owner needs to present this thinking to the team in OKR setting and work with the other team members to craft OKRs which reflect those asks. Most importantly of all, they have to understand the implications when some items don’t make the cut.

Thus the Product Owner will walk into the OKR planning meeting with the longest Christmas wish-list of any team member. But they will not get everything on that list, far from it.

Once you have your draft OKRs take a break. At least an overnight break, or maybe a few days.

Do any more homework that is needed (e.g. check requests with customers, show the draft to partner teams and managers for feedback, check availability or timelines of people or equipment that you might expect to need, etc.)

The second meeting is there to firm up the draft. Ideally after some reflection and some homework everything in the draft looks good, all you need to do is tighten it up and declare it final.

But there is every chance your draft contained six desirable objectives and it needs some reflection and some homework before you can reduce it to three. It may also be that that homework turns up a problem, a priority that had not been appreciated or a block that wasn’t foreseen. In which case you need to revisit the draft.

Setting OKRs inevitably means making choices about what will be done and what will not be done. I’ve heard of teams who have “do not do” lists in parallel with OKRs. This is because OKRs implement strategy and if the strategy is lacking or unclear then OKRs will make that clear, and hopefully seed a conversation.

Enough for now. I hope you found that interesting. If anyone out there has any more questions about OKRs please let me know and I’ll see if I can answer them here.



Image: child at steps, by Jukan Tateisi on Unsplash.

The post What are the first steps in setting OKRs? appeared first on Allan Kelly.
