Best practices considered harmful


I’ve long worried about “Best Practices”. Sure, I usually play along at the time, but lurking in the back of my mind, waiting for a suitable opportunity, are two questions:

  • Who decided this was best practice?
  • Who says this practice can’t be bettered?

I was once told by someone from the oil industry that it was common for contracts to specify “best practice” should be used. But seldom was the actual practice specified. Instead each party to the contract would interpret best practice as they wished, until something went wrong. At that point, after an accident, after money was lost they would go to court and a judge would decide what was best practice.

Sure, practice X might be the best known way of doing things at the moment, but how much better could it be? By declaring something “best practice” you can be self-limiting and potentially prevent innovation.

Now a piece in MIT Sloan Management Review (Why Best Practices Often Fall Short, Jérôme Barthélemy, February 2018) adds to the debate and highlights a few more problems.

Just for openers, sometimes people misidentify the practice that creates the benefits. Apparently some people looked at Pixar animation and decided that having rest rooms (toilets to us English speakers) in the centre of an office floor enhances creativity. They might do, but there is so much else happening at Pixar that moving all the toilets in your organization will probably make no difference at all.

But it is worse than that.

Adopting best practice from elsewhere does not mean it will be best practice in your environment, but adopting that “best practice” will be disruptive. Think of all the money you will need to spend relocating the toilets, all the people who will be upset by a desk move they don’t want, all the lost productivity while the work is going on.

The author suggests that in some cases the disruption costs are so high that the “best practice” will never repay the cost of the change. Organizations are better off shunning the best practice and carrying on as they are. (ERP anyone?)

It gets worse.

There is risk in those best practices. Risk that they will cost more, risk that they won’t be implemented correctly and risk that they will backfire. What was best practice at one organization might not be best practice in yours. (Which might imply you need even more change, even more disruption at even more cost.)

In fact, some best practices – like stock options for executives – can go horrendously wrong and induce behaviours you most definitely don’t want.

So what is a poor company to do?

Well, the author suggests something that does work: copying good practices. Not best but “just OK”. That works. Copy the mundane stuff, the proven stuff. The costs and risks of a big change are avoided. (This sounds a bit like In Search of Mediocracy.)

In my world that means you want to be getting better at doing Agile instead of trying to leapfrog Agile and move to DevOps in one bound.

The author also suggests that where your competitive advantage is concerned you should keep your cards close to your chest. Do things yourself. Work out what your best practice is, work out how you can improve yourself.

I’ve long argued that I want teams to learn and learn for themselves rather than have change done to them. But I also want teams to steal. When they see other teams – at home or elsewhere – doing good things they should steal practices. The important thing from my point of view is for the teams to decide for themselves.



Learn WordPress & build a website in ONE day


When: 28 June 2018, 9am to 4.45pm

Where: The Kings Centre, Kings Street, Norwich, NR1 1PH

How much: £150

RSVP: https://www.meetup.com/Norfolk-Developers-NorDev/events/250241910/

WordPress is the world’s best and most popular website builder and this hands-on course takes you through from the basics, including installation and set up, to cover all the most useful features and tools WordPress offers. Whether you already have a site and want to manage it properly or are starting completely from scratch, this is the course for you.

You will learn to

  • set up and run a great website of your own
  • add content, images and videos
  • add structure and navigation menus
  • apply an attractive design using easy templates
  • make the site search engine friendly
  • add contact forms, maps and take payments
  • add social networking and track visitors
  • learn to add all the features and functionality you need to run and develop a fantastic website
  • and much, much more…


How the course works


  • Please bring your own laptop: PC, Mac or Chromebook, any is fine. Or you can hire a laptop for the day here.
  • WiFi and power sockets are provided
  • No experience is needed – WordPress is incredibly easy to pick up and you will be free to go at your own pace throughout the day.
  • All training materials will be provided after the course, so there need be no fear of “falling behind”.
  • This is an intensive course and assumes a reasonable working knowledge of using computers and the internet, even if you have little or no prior knowledge of WordPress. If you are comfortable with using email, copy/paste, saving files/folders and navigating the internet, you should be fine! (see more advice in our FAQ here)
  • After the course you are welcome to stay around for further discussion with your trainer Toby and with each other, about WordPress, about your website and about your business.
  • After the course, you will be sent all the course materials and clear instructions for setting up your site on its own domain name (old or new). You will have a year of free hosting, after which time it is from just £8/mo for unlimited space and bandwidth.


More details here (https://wpcourses.co.uk/wordpress-training-courses/?gclid=EAIaIQobChMI0bSL8tTR1wIVSjobCh2A9gVdEAAYASAAEgJLnfD_BwE).

Number of parameters vs. accessing globals

I spend a lot of time looking at software engineering data, asking, what is the story here?

In a previous post I suggested that the distribution of the number of functions defined to have a given number of parameters, might be a signature of developer beliefs about the relative cost of parameter passing vs accessing globals.
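
To make the trade-off concrete, the two styles being contrasted look something like this (a hypothetical illustration invented for this write-up, not code from any of the measured projects):

int scale_factor = 3;                        // global state

int scale_using_global(int x)                // 1 parameter, accesses a global
{
    return x * scale_factor;
}

int scale_using_parameter(int x, int factor) // 2 parameters, no global access
{
    return x * factor;
}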

Looking at the data that Iran Rodrigues Gonzaga Junior made available (good man), as part of his thesis Empirical Studies on Fine-Grained Feature Dependencies, I saw it contained information about the number of parameters in a function definition and whether functions accessed a global (Gonzaga’s research question is in another direction; I am always repurposing data).

Are functions that access globals defined with fewer parameters, compared to those that do not contain any such access? The plot below shows a count of the number of functions defined to have a given number of parameters, for four systems written in C; the solid lines are functions that did not access globals, the dashed lines are functions that accessed globals (code+data).

Number of functions defined to have a given number of parameters; four systems, written in C

Over all 50 projects measured, functions that don’t access globals are defined, on average, to have an extra 0.7 parameters (the fitted Poisson regression models are better than a poke in the eye {i.e., the distribution is not really Poisson}; it’s more informative to look at the plotted data).

There is a lot of variation between projects (I picked these four because they were the larger projects and showed variation in behaviors). While the shape of the distributions varies a lot, there is always a noticeable difference in the mean.

Is this difference between projects a difference in developer beliefs, a difference in application requirements, or a difference in developer coding habits (with parameter usage a side effect; are there really that many getters and setters)?

I was hoping for a simple answer, and could not find one. Since I am writing a book and not researching individual issues in detail, it’s time to move on.

Ideas welcome.

Further On Natural Analogarithms – student

My fellow students and I have of late been thinking upon an equivalence between the roots of rational numbers and an infinite dimensional rational vector space, which we have named -space, that we discovered whilst defining analogues of logarithms that were expressed purely in terms of rationals.
We were particularly intrigued by the possibility of defining functions of such numbers by applying linear algebra operations to their associated vectors, which we began with a brief consideration of that given by their magnitudes. We have subsequently spent some time further exploring its properties and it is upon our findings that I shall now report.

Main memory: the crucial component that vendors don’t mention

CPU performance hogs the limelight when people discuss the year-on-year increases in computing power that used to occur.

This focus on cpu performance was/is driven by marketing: the people with the money either don’t want customers thinking about the performance impact of main memory size or speed, or want them to treat the processor as the most important component of a computer. Vendors want processor performance to drive customer purchase decisions.

Hardware manufacturers used to entice new customers with low cost machines, containing minimal memory. Once a customer started to use their shiny new computer, they found that it did save them lots of time and money, but also that they needed more memory (which could only be bought from the manufacturer and was not cheap).

The plot below shows the prices IBM charged for System 360s, in 1966. Anti-trust investigations uncover all kinds of interesting data, like selling low-spec equipment at a loss to entice customers and make life difficult for competitors (code+data for all plots).

Profit margin on IBM 360s sold with various memory sizes

The plot below (data from the 19 Aug 1985 issue of ComputerWorld) shows how the price of computers increased as the minimum amount of memory they supported increased.

Yes, in 1985 top end computers came with over 50M of memory; but most customers thought themselves lucky if they had a few megabytes.

If the processor is slow, it just takes longer for programs to run. If the computer does not have enough memory, programs cannot run. For most applications memory requirements are addressed first, followed by processor performance; memory requirements are the number one issue. The optimizations that commercial compilers could perform were limited by the memory capacity of developer machines.

List price of computers, in 1985, supporting the given minimum amount of memory

Intel’s main line of business used to be selling memory chips, but these chips became commodity items as more companies entered the market; Intel bet the farm on selling processors and the rest is history. As a seller of a unique product it was/is in Intel’s interest to spend lots of money on marketing the benefits of processor performance; sellers of commodity items (such as memory chips) don’t have nearly as much to gain from generic product marketing, because customers may choose to buy from other sellers (in such markets sellers have to concentrate on marketing themselves).

Memory capacity/speed and cpu speed are two aspects of system performance; they need to be balanced to meet the requirements of customer applications. The plot below shows the SPEC cpu integer performance of 4,332 systems running at various clock rates; the colors denote the different peak memory transfer rates of the memory chips in these systems (code+data).

SPEC cpu integer performance vs. cpu clock rate

These days (and perhaps in the past, I don’t have any data), memory performance is a much better predictor of system performance, but vendors don’t have an incentive to market this fact.

Introduction to Laravel Workshop



When: Wednesday, 20 June 2018 - 9:00am to 4:45pm

Where: Kings Centre, King Street, Norwich, NR1 1PH

How much: £85

RSVP: https://www.meetup.com/Norfolk-Developers-NorDev/events/250343445/

If you would like to start using the world’s most popular PHP framework, this is the workshop for you.

We will cover everything from installing Laravel to building a basic application. Everyone will learn as we go from a freshly installed Laravel application up to a basic web app that interacts with a database.

Course Details

We will aim to cover the following (subject to change and class experience)

  • Routing
  • Using Blade (templating)
  • Controllers
  • Validation
  • Config/Env
  • Using Eloquent (database ORM)
  • Laravel inbuilt Auth
  • Artisan (CLI commands)
  • File Storage

Instructor

Simon Bennett is a Software Consultant who works with Laravel daily, helping clients update their development practices and coding software for startups. He also runs his own SaaS for backing up DigitalOcean servers, which of course is all built with Laravel.

Prerequisites

This beginners workshop will run through simple code as we learn about Laravel.

Basic knowledge of PHP is required; I would recommend you know basic OOP.

To save time on the day, you will be contacted directly before the workshop with a guide on setting up Docker. The reason we use Docker is to make sure we are using identical environments on Windows/Mac/Linux.

Since Laravel is (mostly) used to build web-based applications, it would be helpful to have a basic understanding of developing websites in PHP and MySQL.

  • Bring your Laptop
  • Have Docker installed
  • A good IDE like PHPStorm is strongly recommended
  • Working command terminal
  • A GIT client installed

TechNorwich: The End. A Story & New beginnings



Whitespace is sadly now having to close. Started in 2013, it was the first and largest co-working space in Norwich. It’s been home to many great & successful businesses, plus home to Barclays Eagle Labs and birthplace of TechVelocity, the first accelerator for the region. Come along to this event to have a drink to remember some of the history, catch up with some of the great businesses that have lived here and discuss the future of the amazing digital creative & tech sector in our fine city.

Get your ticket: https://technorwich.eventbrite.co.uk

Function Template Partial Ordering: Worked Examples

C++ function overloading rules are complex. C++ template rules are complex. Put the two together, and you unfortunately do not get something simple; you get a hideous monster of standardese which requires great patience and knowledge to overcome. However, since C++ is mostly corner-cases, it can pay to understand how the rules apply for those times where you just can’t work out why your code won’t compile. This post will present a few step-by-step examples of how partial ordering of function templates works in order to arm you for these times of need.

Partial ordering of function templates is a step of overload resolution. It occurs when you call a function template which is overloaded and the compiler needs to decide which one is more specialized than the other. Consider this code:

template<class T> void f(T);        //(1)
template<class T> void f(T const*); //(2)

int const* p = nullptr;
f(p);

We expect f(p) to call (2), because p is an int const*. In order to decide that (2) is more specialized than (1), the compiler needs to follow the function template partial ordering rules. Let’s see what the standard has to say.

Partial ordering selects which of two function templates is more specialized than the other by transforming each template in turn (see next paragraph) and performing template argument deduction using the function type.

This is not so complicated, even if the terms may be unfamiliar. Unfortunately, the “next paragraph” and the sections which it references are almost impossible to parse without a lot of background knowledge and re-readings, so I shall step through the algorithm rather than pasting the rules for you to cry over.

There are four steps which we have to work through:

  1. Transforming (1).
  2. Performing deduction on (2) with the transformed template from step 1.
  3. Transforming (2).
  4. Performing deduction on (1) with the transformed template from step 3.

If one and only one of the deductions succeeds, then the template with which the deduction was performed is more specialized than the other.

Step 1: The rules state that for each template parameter, we create some unique type to use in its stead [1]. Let’s call this unique type type_0. You can pretend that this was defined somewhere like class type_0{};. Now we take our function template template <class T> void f(T) and substitute in type_0 for T. This gives us void f(type_0). The transformation is complete.

Step 2: Now that we have transformed template <class T> void f(T) into void f(type_0), we will perform deduction on (2) using the transformed function type. To do this, we imagine a call to (2) where the arguments have the type of the parameters for (1). Concretely, it would look like this:

template <class T> void func_2(T const*);
func_2(type_0{}); //derived from void f(type_0)

Would this call succeed? We can put it into our compiler to find out. GCC 8.1 says:

<source>: In function 'int main()':
<source>:4:18: error: no matching function for call to 'func_2(type_0)'
   func_2(type_0{});
                  ^
<source>:1:25: note: candidate: 'template<class T> void func_2(const T*)'
 template <class T> void func_2(T const*);
                         ^~~~~~
<source>:1:25: note:   template argument deduction/substitution failed:
<source>:4:18: note:   mismatched types 'const T*' and 'type_0'
   func_2(type_0{});
                  ^   

So deduction from (1) to (2) fails, because the invented type type_0 cannot be used to deduce const T*.

Step 3: Let’s try from (2) to (1). Again, we’ll transform (2) from template <class T> void f(T const*) to void f(type_0 const*).

Step 4: Now we attempt deduction:

template <class T> void func_1(T);
type_0 const* arg = nullptr;
func_1(arg);

This succeeds because a type_0 const* can be used to deduce T. Since deduction from (1) to (2) fails, but deduction from (2) to (1) succeeds, (2) is more specialized than (1) and will be chosen by overload resolution.
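
If you want to see the result for yourself, here is a small, self-contained sketch (the printed tags are just for illustration and are not part of the original example):

#include <iostream>

template<class T> void f(T)        { std::cout << "(1) f(T)\n"; }        //(1)
template<class T> void f(T const*) { std::cout << "(2) f(T const*)\n"; } //(2)

int main() {
    int const* p = nullptr;
    f(p); // prints "(2) f(T const*)" because (2) is more specialized
}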


Let’s try a different example. How about:

template<class T> void g(T);  //(1)
template<class T> void g(T&); //(2)
int i = 0;
g(i);

(1) transforms to void g(type_0). Before we try deduction, we need to apply one of the numerous additional rules from the standard, which says we need to replace references with the type being referred to. So template <class T> void g(T&) becomes template <class T> void g(T). Deduction time:

template<class T> void func_2(T);
func_2(type_0{});

This succeeds.

Now the other direction. template<class T> void g(T&) transforms to void g(type_0&), then we remove the reference to get void g(type_0). Our second deduction:

template<class T> void func_1(T);
func_1(type_0{});

This is effectively identical to the previous one, so of course it succeeds.

Since deduction succeeded in both directions, the call is ambiguous. Sure enough, GCC diagnoses:

<source>: In function 'int main()':
<source>:5:8: error: call of overloaded 'g(int&)' is ambiguous
     g(i);
        ^
<source>:1:24: note: candidate: 'void g(T) [with T = int]'
 template<class T> void g(T);  //(1)
                        ^
<source>:2:24: note: candidate: 'void g(T&) [with T = int]'
 template<class T> void g(T&); //(2)
                        ^ 

This is why the algorithm is a partial ordering: sometimes two function templates are not ordered.


I’ll give one more example. This one has multiple parameters and is a bit more subtle.

template<class T>struct identity { using type = T; };
template<class T>struct A{};

template<class T, class U> void h(typename identity<T>::type, U); //(1)
template<class T, class U> void h(T, A<U>);                       //(2)
h<int>(0,A<void>{});

identity here just evaluates to its template argument, but the important thing to note is that typename identity<T>::type is a non-deduced context, so T cannot be deduced from the argument for that parameter.
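
If non-deduced contexts are unfamiliar, this standalone snippet (using a hypothetical function name, separate from the example being worked through) shows the effect:

template<class T> struct identity { using type = T; };

// T appears only in a non-deduced context, so it can never be deduced here.
template<class T> void takes_identity(typename identity<T>::type) {}

int main() {
    // takes_identity(42);   // error: couldn't deduce template parameter 'T'
    takes_identity<int>(42); // fine: T is supplied explicitly, no deduction needed
}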

(1) transforms to void h(typename identity<type_0>::type, type_0), which is void h(type_0, type_0). Attempt deduction on (2):

template<class T, class U> void func_2(T, A<U>);
func_2(type_0{}, type_0{});

This fails because we can’t match type_0 against A<U>.

(2) transforms to void h(type_0, A<type_1>). Try deduction against (1):

template<class T, class U> void func_1(typename identity<T>::type, U);
func_1(type_0{}, A<type_1>{});

This fails because typename identity<T>::type is a non-deduced context, so we can’t deduce T.

In the example from the last section deduction succeeded both ways so the call was ambiguous. In this example, deduction fails both ways, which is also an ambiguous call.


That’s the last of the examples. Of course, there are a bunch of rules which I didn’t cover here, like Concepts, parameter packs, non-type/template template parameters, and cases where both the argument and parameter types are references. Hopefully you now have enough of an intuition that you can understand what the standard says when you inevitably hit those corner cases. If you have any partial ordering conundrums, drop them down in the comments below or send them to me on Twitter.


  1. I’ll ignore non-type template parameters and template template parameters for simplicity, but the rules are essentially the same. 

Because your “competitors have it” IS NOT STRATEGY


“We need a product that does X because our competitors have a product that does X”
“Our product needs feature Y because our competitors’ product has feature Y.”

It makes me want to cry.

Let me be clear: building something because your competitors have it IS NOT A STRATEGY.

Neither is it a particularly good tactic.

Stop obsessing about your competitors and think about your customers.

I don’t doubt that your people are being told that customers are buying the competitor product because it has X or Y, and I don’t doubt that some of your people feel that if you only matched the competitors feature for feature you would win, but I just can’t see it myself.

For a start, is feature Y really the only thing losing the sale? Are the products so well balanced that this one small thing is it? And is there really nothing that your product does better?

Try this simple experiment: tell the customer that feature Y will be delivered next month and see if they decide to buy yours there and then or find something else that makes the competition better.

Now let’s suppose you decide to build Y. Before you make any plans ask yourself:

While you are building feature Y what are your competitors going to be doing?
Will they stand still or will they be adding feature Z?
And once they have feature Z will you need to play catch up?

Chances are that tomorrow you get to where you want to be (where your competitors are today) only to find your competitors have something else you don’t have either.

I’ll agree this is a good strategy if you have deliberately chosen to be a Fast Follower – you can play Android to your competitors iOS. Just make sure you know why your customers will choose your Android over the competitor iOS.

Will you be cheaper?
Or better?
Or will you bundle some other goodies with it?

Before you run to where your competitors are today ask yourself: where will your competitors be tomorrow?

If you still insist on building this feature you need to

  • Make sure you do a much better job (easier to use, more intuitive, faster to produce results, better quality results, or some such)
  • OR you need to do it fast and cheap so you can spend your precious resources on building something the competitor doesn’t have
  • OR you bring overwhelming resources to the table so that you stand a chance. Every day you delay the competitor gets further ahead, so don’t try half measures

A better approach is to find out what your customers actually need. Stop looking at the features, go back to first principles: what is the problem your customers face? what is the job they are attempting to make progress with?

How can you help your customers with this job?
How can you make them faster?
How can you help them achieve their work more cheaply? Or at better quality? – in fact, what do “better” and “quality” look like to them?

Someone – I honestly forget who – told me earlier this year that they wanted to catch-up with their competitor and overtake them.

One small flaw there: if you build features to match your competitors you can never overtake them because you won’t know what to build once you reach parity.

Put it another way, you add all the features they have today, and all the features they add while you are catching up. What do you build next? Until they build their next version (and recapture the lead) you don’t know what to build. And if you build something different you just lost feature parity.

So, go back and examine what your customers are using your tool for. Look at the job to be done, look at how your customers are doing their job and using your tool and work out for yourself how you can help customers do a better job.

Celebrate the difference, explain why you are better.

And please forget about matching the competition.

I’m old enough to remember the days when WordStar was fighting WordPerfect, AmiPro was fighting them both, and all were better than Microsoft Word. Adverts and magazine reviews would compare them feature to feature. Someone somewhere thought people bought word processors based on the number of features.

Then Microsoft launched Windows and everybody went over to Microsoft Word for Windows almost overnight.

Don’t focus on your competitors. Focus on your customers. Unfortunately that requires more work and some original thinking.


Shooting with Flash (And Motofest 2018)

Talking about shooting with an on-camera flash at Coventry Motofest 2018.

I’ve never really used my flashgun before, mostly because on the odd occasion that I have used it at a shoot I ended up with under, or over exposed images and thus never wanted to risk wasting time and shots just to practice.

Last Saturday I decided to seriously try out my flash at an event where my photos were purely for myself, rather than with the intent to sell them.
I was shocked to find my results to be fantastic!

“The prettiest girl is riding in the ‘Stang!”

The day was probably quite appropriate for shooting with an on camera flash.
It was a little overcast and although bright-ish, things just were not popping very much.
So adding some light from a flash worked well.

The subjects seemed to work for flash too, shiny cars!

A row of MGs

I found that to get the exposure right I had to dial up the shutter speed or aperture so my camera was showing a couple of 0.1 stops of over-exposure, otherwise the image would look too dark, despite using i-TTL mode.
I wonder if this is because, with a diffuser fitted and the flash angled at about 60 degrees, some of the light was directed upwards and did not hit the subject.

Using the flash without a diffuser was terrible: all images just had hugely over-exposed sections where the flash hit.
Instead, I used a diffuser I’d got from Amazon a few days before.
The regular hard-plastic diffuser that came with my flash is ‘fine’ — but the new one is certainly better. It results in a softer light that is much more directed towards the subject than the plastic one.

Flash comparison — Both same shutter, aperture and ISO. The left is lit with the new diffuser, the right with the hard plastic one.

From the above comparison it is clear to see that the new diffuser provides more light more widely spread across the image.

Example of a portrait photo where the flash is being bounced towards the subject well by rotating the diffuser.

The added benefit of this item is that it has a reflector on the inside of the back — allowing me to easily shoot portrait with on-camera flash and just rotate the diffuser so that light is still being bounced toward the subject from the front, rather than the light ending up hitting the left or right side of the subject.

Below are a bunch of my favourite images from the event. But I got about 100 good ones.

It was a lot of fun being able to try out new equipment without the pressure of having to produce good images.

I now feel confident to get good photos with an on-camera flash.

Shooting a different subject for once was also fun! When shooting static subjects one has a lot more time to choose the composition of an image. Although I note now that most of my images have the subject square in the centre of the frame.

60% of the time, works every time
Whatever level of zip ties you’re on, you’re not on this level of zip ties.
Lego block engine cover in a Nissan Cube
An Aston Martin V8 Vantage in Gulf livery
More Gulf livery, this time on a VW Golf.
Jaguar
I also got to try out my 100mm macro lens — although it is certainly too long a focal length for shooting cars.
Ford V8 with Holley carburettor.
I call this one “I like to chop up pedestrians” — also that’s the smallest number plate I’ve ever seen.
More lights = Better
Skele says Hi.

Free books and other news


Many of you are reading this because you signed up for a free copy of my Xanpan book.

Thank you so much! – I hope you are enjoying my thoughts, reflections and tips.

Now, can I ask a favour, please? – a few minutes of your time.

If you have a copy of Xanpan would you mind writing me an Amazon review? (that’s .com; Amazon UK has a separate list of reviews – yes, it is a pain).

Please, please, please 🙂

Amazon reviews make a big difference to sales and I’d be most grateful. (Even more so for 5-star reviews!)

And I will happily give a free review copy of “Little Book of Requirements and User Stories” to anyone who would like to review that book too – mail me, allan@allankelly.net. (And if you already have a copy of Little Book please suggest some other way I can thank you for your review.)

I’m also working on getting Continuous Digital and Project Myopia onto Amazon. Both will get new professional covers and a proper copy edit.

Finally, as some of you know, I’ve started writing a companion to Little Book: Product Ownership. Again I’m using the LeanPub system, so you can buy the book now and get free updates as I add to and edit the book. And I am most grateful to those of you who have already bought Product Ownership.


EDG and Github are both logical purchases for Microsoft

It looks like my prediction that Microsoft buys Github may be about to come true.

Microsoft has been sluggish in integrating their LinkedIn purchase into their identity management system. Lots of sites have “verify identity using Github” options (or at least the kind of sites I visit do), so perhaps LinkedIn identity will be trialed via Github.

A Github purchase will also allow Microsoft to directly connect lots of developers to Azure. Being able to easily build and execute Github code on Azure is the bait, customer data is where the money is; making Github more data friendly is an obvious first priority for new owners.

Who else should Microsoft buy? As a protective move, I think they should snap up Edison Design Group (EDG) before somebody else does. Readers outside of the compiler/static analysis/C++ standards world are unlikely to have heard of EDG. They sell C/C++ front ends (plus other languages) that support all the historical features/warts supported by other C/C++ compilers. The features only found in Microsoft’s compilers are what make it very costly/time-consuming for many companies to port their applications to other platforms; developer use of Microsoft compiler dependent features is a moat that makes it difficult for many companies to leave the Microsoft ecosystem. EDG have been in the business a long time and have built up an extensive knowledge of vendor specific compiler features; the kind of knowledge that can only be obtained by having customers tell you what language constructs they are using that your current product does not handle (and what those constructs actually mean).

What would happen if a very large company bought EDG, and open sourced its code (to make it easier for Windows developers to switch platforms, not to make any money off compiler related tools)? Somebody would have to bolt on a back-end, to generate code; but that would not be hard (EDG have designed their product to make this easy). A freely available compiler, supporting all/most of the foibles of the Microsoft C++ compiler, would tempt many Windows only developers to give it a go. A free compiler removes management from the loop; developers can try things out as a side project, without having to get management approval to spend money on a compiler (from practical experience I know how hard it is to sell compatible compiler products, i.e., there is no real money to be made by anybody doing this commercially).

Is this risk, to Microsoft, really worth the (relatively) low cost of buying EDG? The EDG guys are not getting any younger, so why wouldn’t they be willing to sell?

Chalk The Lines – a.k.

Given a set of points (xi,yi), a common problem in numerical analysis is trying to estimate values of y for values of x that aren't in the set. The simplest scheme is linear interpolation, which connects points with consecutive values of x with straight lines and then uses them to calculate values of y for values of x that lie between those of their endpoints.
On the face of it implementing this would seem to be a pretty trivial business, but doing so both accurately and efficiently is a surprisingly tricky affair, as we shall see in this post.
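
For the basic idea in code, here is a minimal sketch of piecewise linear interpolation (my own illustration; it deliberately ignores the accuracy and efficiency concerns that the full post explores):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <vector>

// Interpolate y at x, given points (xs[i], ys[i]) sorted by ascending x;
// values of x outside the range of the data are clamped to the end points.
double interpolate(const std::vector<double> &xs,
                   const std::vector<double> &ys, double x)
{
    if (xs.size() < 2 || xs.size() != ys.size())
        throw std::invalid_argument("need at least two (x,y) points");
    if (x <= xs.front()) return ys.front();
    if (x >= xs.back())  return ys.back();

    // The first x strictly greater than the requested value and its
    // predecessor are the end points of the line segment we use.
    const auto hi = std::upper_bound(xs.begin(), xs.end(), x);
    const std::size_t i = static_cast<std::size_t>(hi - xs.begin());
    const double t = (x - xs[i - 1]) / (xs[i] - xs[i - 1]);
    return ys[i - 1] + t * (ys[i] - ys[i - 1]);
}

int main() {
    const std::vector<double> xs{0.0, 1.0, 2.0};
    const std::vector<double> ys{0.0, 10.0, 40.0};
    std::cout << interpolate(xs, ys, 1.5) << "\n"; // half way along the second segment: 25
}

Even this toy version has to make choices the post digs into, such as what to do outside the range of the data and how rounding error creeps into the arithmetic.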

Gitlab certificates

On Ubuntu, cloning a repo from a machine you don't have a certificate for will give the error:

fatal: unable to access 'https://servername': server certificate verification failed. CAfile: /etc/ssl/certs/your_filename CRLfile: None

You can work around this by telling git not to verify server certificates, e.g.

git config --system http.sslverify false


which is asking for trouble. However, you can install the certificate, so you don't need to keep doing this. 
Using an answer here: https://stackoverflow.com/questions/21181231/server-certificate-verification-failed-cafile-etc-ssl-certs-ca-certificates-c  looks to have worked, by trying things one step at a time:
hostname=gitlab.city.ac.uk
port=443
trust_cert_file_location=`curl-config --ca`
sudo bash -c "echo -n | openssl s_client -showcerts -connect $hostname:$port \
    2>/dev/null  | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'  \
    >> $trust_cert_file_location"
I did try this first – so errors don’t end up in /dev/null:

openssl s_client -showcerts -connect $hostname:$port


Also, I first got the error sed: unrecognised option '--ca'
It took a moment to realise the --ca came from curl-config, which I needed to install.

The age of the Algorithm is long gone

I date the age of the Algorithm from roughly the 1960s to the late 1980s.

During the age of the Algorithm, developers spent a lot of time figuring out the best algorithm to use and writing code to implement algorithms.

Knuth’s The Art of Computer Programming (TAOCP) was the book that everybody consulted to find an algorithm to solve their current problem (wafer thin paper, containing tiny handwritten corrections and updates, was glued into the library copies of TAOCP held by my undergraduate university; updates to Knuth was news).

Two developments caused the decline of the age of the Algorithm (and the rise of the age of the Ecosystem and the age of the Platform; topics for future posts).

  • The rise of Open Source (it was not called this for a while), meant it became less and less necessary to spend lots of time dealing with algorithms; an implementation of something that was good enough, was available. TAOCP is something that developers suggest other people read, while they search for a package that does something close enough to what they want.
  • Software systems kept getting larger, driving down the percentage of time developers spent working on algorithms (the bulk of the code in commercially viable systems deals with error handling and the user interface). Algorithms are still essential (like the bolts holding a bridge together), but don’t take up a lot of developer time.

Algorithms are still being invented, and some developers spend most of their time working with algorithms, but peak Algorithm is long gone.

Perhaps academic researchers in software engineering would do more relevant work if they did not spend so much time studying algorithms. But, as several researchers have told me, algorithms are what people in their own and other departments think computing related research is all about. They remain shackled to the past.

Emacs 26.1 has been released (and it’s already on Homebrew)

Saw the announcement on on the GNU Emacs mailing list this morning. Much to my surprise, it’s also already available on homebrew. So my Mac is now sporting a new fetching version of Emacs as well :). I’ve been running the release candidate on several Linux machines already and was very happy with it, so […]


DSLinux on a DSLite with an M3DS Real card and SuperCard SD

DSLinux running on a Nintendo DSLite

I recently bought a gorgeous pink Nintendo DSLite with the sole purpose of running DSLinux on it.
When I posted about my success on Mastodon, someone helpfully asked “Has it have any use tho?”.
Let’s answer that right away: Running Linux on a Nintendo DSLite is at best a few hours’ entertainment for the masochistic technologist, and at worst a waste of your time.


But I do rather enjoy running Linux on things that should not be running Linux, or at least attempting to do so. So here’s what I did!

Hardware:

  • Nintendo DSLite
  • SuperCard SD (Slot 2)
  • M3DS Real (Slot 1)
  • R4 Card (Knockoff, says R4 SDHC Revolution for DS on the card)

DSLinux runs on a bunch of devices, luckily we had some R4 cards and an M3DS Real around the place which are both supported by DSLinux.
I purchased a SuperCard SD from eBay to provide some extra RAM, which apparently is quite useful, since the DSLite has only 2MB of it on its own. The SuperCard SD I bought had 32MB of extra RAM, bringing the total up to some 34MB, wowee.

R4 Cards

The first cards I tried were the R4 cards we had.
They’re popular and supported by DSLinux. Unfortunately, it seems the ones we’ve got are knockoffs, which made it challenging to find firmware for them.
I spent a long while searching around the internet and trying various firmwares for R4 cards — none of those I tried did anything except show the Menu? screen on boot.

Finally, finding this post on GBATemp.net from a user with a card that looks exactly the same as mine led me to give up on the R4 card and move on to the M3DS Real. Although the post did prove useful later.

It should be noted that the R4 card I had had never been tested anyway, so it might never have worked.

M3DS Real

Another card listed as supported on the DSLinux site, so it seemed a good one to try.
We had a Micro-SD card in the M3DS Real anyway, with the M3 Sakura firmware on it, so it seemed reasonable to just jump in there.

I copied the firmware onto another SD card (because we didn’t want to lose the data on the original card). It was only three folders, SYSTEM, NDS and SKINS, in the root of the card, with the NDS folder containing the ‘games’.

In this case, I put the DSLinux files (dslinux.nds, dslinuxm.nds and ‘linux’, a folder) into the NDS folder and stuck it in my DSLite.

After selecting DSLinux from the menu, I got the joy of….a blank screen.

Starting DSLinux from M3 Sakura results in a white screen

Some forum posts which are the first results when searching the issue on DuckDuckGo suggest that something called DLDI is the issue.

The DSLinux ‘Running DSLinux’ page does mention patching the ‘dslinux.nds’ file with DLDI if the device one is using doesn’t support auto-DLDI. At the time this was all meaningless jargon to me, since I’ve never done any Nintendo DS homebrew before.

Turns out, DLDI is a library that allows programs to “read and write files on the memory card inserted into one of the system’s slots”.
Homebrew games must be ‘patched’ for whatever device you’re using to allow them to read/write to the storage device.
Most of the links on the DSLinux page to DLDI were broken, but we discovered the new home of DLDI and its associated tools to be www.chishm.com/DLDI/ .

I patched the dslinux.nds file using the linux command line tool and saw no change to the behaviour of the DSLite, still white screens.

Upon reading the DSLinux wiki page for devices a little closer, I noticed that the listing for the M3DS Real notes that one should ‘Use loader V2.7d or V2.8’.

What is a loader??
It means the card’s firmware/menu.

Where do I find it?
On the manufacturer’s website, or, bringing back the post mentioned earlier with the R4 card user on GBATemp.net, one can find lots of firmware’s for lots of different cards here: http://www.linfoxdomain.com/nintendo/ds/

Under the listing on the above site for ‘M3/G6 DS Real and M3i Zero’ one can find a link to firmware versions V2.7d and V2.8 listed as ‘M3G6_DS_Real_v2.8_E15_EuropeUSAMulti.zip’.

Upon installing this firmware to the SD card (by copying the ‘SYSTEM’ folder to the root of a FAT32-formatted card), I extracted the DSLinux files again (thus, without the DLDI patching I’d done earlier) and placed the files ‘dslinux.nds’, ‘dslinuxm.nds’ and the folder ‘linux’ into an ‘NDS’ folder, also in the root of the drive.
This is INCORRECT.
Upon loading the dslinux.nds file through the M3DS Real menu it did indeed boot Linux, but dropped me into single-user mode, with essentially no binaries in the PATH.
This is consistent with the Linux kernel having booted successfully, but not being able to find any userland. Hence the single-user mode and lack of programs.

Progress at least!

I re-read the DSLinux instructions and caught the clear statement that ‘Both of these must be extracted to the root directory of the CF or SD card.’ when talking about the DSLinux files.

Upon moving the DSLinux files to the root of the card and starting ‘dslinux.nds’ from the M3DS Real menu I had a working Linux system!!

I type `uname -a` on DSLinux. It’s running kernel 2.6.

Notice the ‘DLDI compatible’ that pops up when starting DSLinux — That means that the M3DS Real auto-patches binaries when it runs them. Nice.

What Next?

Probably trying to compile a newer kernel and userspace to start with.
Kernel 2.6, at time of writing, is 2 major versions out of date.

After that, I’d like to understand how DSLinux is handling the multiple screens and multiple processors.
The DS has an ARM7 and an ARM9 processor and two screens, which I think are not connected to the same processor; the buttons are split between the chips too.

Lastly, I’d like to write something for linux on the DS.
Probably something silly, but I’d like to give it a try!

Don’t ask me questions about DSLinux; I don’t really know anything more than what I’ve mentioned here. I just read some wikis, solved some problems and did some searching.

Thanks to the developers of DSLinux and DLDI for making this silliness possible.

Test the Code, Not the Mock

About 18 months or so ago I wrote a post about how I’d seen tests written that were self-reinforcing (“Tautologies in Tests”). The premise was about the use of the same production code to verify the test outcome as that which was supposedly under test. As such any break in the production code would likely not get picked up because the test behaviour would naturally change too.

It’s also possible to see the opposite kind of effect where the test code really becomes the behaviour under test rather than the production code. The use of mocking within tests is a magnet for this kind of situation as a developer mistakenly believes they can save time [1] by writing a more fully featured mock [2] that can be reused across tests. This is a false economy.

Example - Database Querying

I recently saw an example of this in some database access code. The client code (under test) first configured a filter where it calculated an upper and lower bound based on timestamps, e.g.

// non-trivial time based calculations
var minTime = ...
var maxTime = ...

query.Filter["MinTime"] = minTime;
query.Filter["MaxTime"] = maxTime;

The client code then executed the query and performed some additional processing on the results which were finally returned.

The test fixture created some test data in the form of a simple list with a couple of items, presumably with one that lies inside the filter and another that lies outside, e.g.

var orders = new[]
{
  new Order { ..., Timestamp = "2016-05-12 18:00:00" },
  new Order { ..., Timestamp = "2018-05-17 02:15:00" },
};

The mocked out database read method then implemented a proper filter to apply the various criteria to the list of test data, e.g.

{
  var result = orders;

  if (filter["MinTime"])
    ...
  if (filter["MaxTime"])
    ...
  if (filter[...])
    ...

  return result;
}

As you can imagine this starts out quite simple for the first test case, but as the production code behaviour gets more complex, so do the mock and the test data. Adding new test data to cater for the new scenarios will likely break the existing tests as they all share a single set, and therefore you will need to go back and understand them to ensure each test still exercises the behaviour it used to. Ultimately you’re starting to test whether you can actually implement a mock that satisfies all the tests, rather than write individual tests which independently validate the expected behaviours.

Shared test data (not just placeholder constants like AnyCustomerId) is rarely a good idea as it’s often not obvious which piece of data is relevant to which test. The moment you start adding comments to annotate the test data you have truly lost sight of the goal. Tests are not just about verifying behaviour either; they are a form of documentation too.

Roll Back

If we reconsider the feature under test we can see that there are a few different behaviours that we want to explore:

  • Is the filter correctly formed?
  • Are the query results correctly post-processed?

Luckily the external dependency (i.e. the mock) provides us with a seam which allows us to directly verify the filter configuration and also to control the results which are returned for post-processing. Consequently rather than having one test that tries to do everything, or a few tests that try and cover both aspects together, we can separate them out, perhaps even into separate test fixtures based around the different themes, e.g.

public static class reading_orders 
{
  [TestFixture]
  public class filter_configuration    
  ...    
  [TestFixture]
  public class post_processing    
  ...
}

The first test fixture now focuses on the logic used to build the underlying query filter by asserting the filter state when presented to the database. It then returns, say, an empty result set as we wish to ignore what happens later (by invoking as little code as possible to avoid false positives).

The following example attempts to define what “yesterday” means in terms of filtering:

[Test]
public void filter_for_yesterday_is_midnight_to_midnight()
{
  DateTime? minTime = null;
  DateTime? maxTime = null;

  var mockDatabase = CreateMockDatabase((filter) =>
  {
    minTime = filter["MinTime"];
    maxTime = filter["MaxTime"];
  });
  var reader = new OrderReader(mockDatabase);
  var now = new DateTime(2001, 2, 3, 9, 32, 47);

  reader.FindYesterdaysOrders(now);

  Assert.That(minTime, Is.EqualTo(
                new DateTime(2001, 2, 2, 0, 0, 0)));
  Assert.That(maxTime, Is.EqualTo(
                new DateTime(2001, 2, 3, 0, 0, 0)));
}

As you can hopefully see the mock in this test is only configured to extract the filter state which we then verify later. The mock configuration is done inside the test to make it clear that the only point of interest is the filter’s eventual state. We don’t even bother capturing the final output as it’s superfluous to this test.

If we had a number of tests to write which all did the same mock configuration we could extract it into a common [SetUp] method, but only if we’ve already grouped the tests into separate fixtures which all focus on exactly the same underlying behaviour. The Single Responsibility Principle applies to the design of tests as much as it does the production code.

One different approach here might be to use the filter object itself as a seam and sense the calls into that instead. Personally I’m very wary of getting too specific about how an outcome is achieved. Way back in 2011 I wrote “Mock To Test the Outcome, Not the Implementation” which showed where this rabbit hole can lead, i.e. to brittle tests that focus too much on the “how” and not enough on the “what”.

Mock Results

With the filtering side taken care of we’re now in a position to look at the post-processing of the results. Once again we only want code and data that is salient to our test and as long as the post-processing is largely independent of the filtering logic we can pass in any inputs we like and focus on the final output instead:

[Test]
public void upgrade_objects_to_latest_schema_version()
{
  var anyTime = DateTime.Now;
  var mockDatabase = CreateMockDatabase(() =>
  {
    return new[]
    {
      new Order { ..., Version = 1, ... },
      new Order { ..., Version = 2, ... },
    };
  });
  var reader = new OrderReader(mockDatabase);

  var orders = reader.FindYesterdaysOrders(anyTime);

  Assert.That(orders.Count, Is.EqualTo(2));
  Assert.That(orders.Count(o => o.Version == 3),
              Is.EqualTo(2));
}

Our (simplistic) post-processing example here ensures that all re-hydrated objects have been upgraded to the latest schema version. Our test data is specific to verifying that one outcome. If we expect other processing to occur we use different data more suitable to that scenario and only use it in that test. Of course in reality we’ll probably have a set of “builders” that we’ll use across tests to reduce the burden of creating and maintaining test data objects as the data models grow over time.

Refactoring

While reading this post you may have noticed that certain things have been suggested, such as splitting out the tests into separate fixtures. You may have also noticed that I discovered “independence” between the pre and post phases of the method around the dependency being mocked which allows us to simplify our test setup in some cases.

Your reaction to all this may well be to suggest refactoring the method by splitting it into two separate pieces which can then be tested independently. The current method then just becomes a simple composition of the two new pieces. Additionally you might have realised that the simplified test setup probably implies unnecessary coupling between the two pieces of code.

For me those kind of thoughts are the reason why I spend so much effort on trying to write good tests; it’s the essence of Test Driven Design.

 

[1] My ACCU 2017 talk “A Test of Strength” (shorter version) shows my own misguided attempts to optimise the writing of tests.

[2] There is a place for “heavier” mocks (which I still need to write up) but it’s not in unit tests.

Visual Lint 6.5.2.295 has been released

This is a recommended maintenance update for Visual Lint 6.5. The following changes are included:
  • Added basic support for Qt Creator projects (.pro/.pro.user files). Note that the implementation does not yet support subprojects or read preprocessor and include folder properties. As such, if the analysis tool you are using requires preprocessor or include folders to be defined (as PC-lint and PC-lint Plus do) for the time being they must be defined manually (e.g. written as -D and -i directives within a PC-lint/PC-lint Plus std.lnt indirect file).
  • The "Analysis Tool" Options page now recognises PC-lint Plus installations containing only a 64 bit executable if the "Use a 64 bit version of PC-lint if available" option is set.
  • When the PC-lint Plus installation folder is selected in the "Analysis Tool" Options page the PC-lint Plus manual (<installation folder>/doc/manual.pdf) is now correctly configured.
  • Added a workaround to the Eclipse plug-in for an issue identified with some Code Composer Studio installations which source plug-in startup and shutdown events in different threads.
  • Fixed a crash which affected some machines when the "Analysis Tool" Options page was activated when PC-lint was the active analysis tool.
  • Fixed a bug which caused the Visual Studio plug-in to be incorrectly configured in Visual Studio 2017 v15.7.
  • Fixed a bug which could cause the PC-lint/PC-lint Plus environment file to reset to "Defined in std.lnt".
  • Updated the "Example PC-lint/PC-lint Plus project.lnt file" help topic and those relating to supported project types.
Download Visual Lint 6.5.2.295

Windows batch files

I've been writing a batch file to run some mathematical models over a set of inputs.
The models are software reliability growth models, described here.

We are using
  • du: Duane
  • go: Goel and Okumoto
  • jm: Jelinski and Moranda
  • kl: Keiller and Littlewood
  • lm: Littlewood model
  • lnhpp: Littlewood non-homogeneous Poisson process
  • lv: Littlewood and Verrall
  • mo: Musa and Okumoto
Littlewood appears many times: he founded the group where I currently work. 

So, far too much background. I have one executable for each model, after making a make file; yet another story. And a folder of input files, named as f3[some dataset]du.dat, f3[some dataset]go.dat,... f3[some dataset]mo.dat. I also have some corresponding output files someone else produced a while ago, so in theory I can check I get the same numbers. I don't, but that's going to be yet another story.

You can also use the original file and generated file to recalibrate, giving yet another file. Which I have previously generated results from. Which also don't match. 

I wanted to be able to run this on Ubuntu and Windows, and managed to make a bash script easily enough. Then I tried to make a Windows batch file to do the same thing. I'll just put my final result here, and point out the things I tripped up on several times.


ECHO OFF
setlocal EnableDelayedExpansion
setlocal 


for %%m in (du go jm kl lm lnhpp lv mo) do (
  echo %%m
  for %%f in (*_%%m.dat) do (
    echo %%~nf
    set var=%%~nf
    echo var: !var!
    set var=!var:~2!
    echo var now: !var!

    swrelpred\%%m.exe %%~nf.dat "f4!var!"
    swrelpred\%%mcal.exe %%~nf.dat "f4!var!" "f9!var!"
  )
)


1. First, turn the echo off because there's way too much noise otherwise.
2. Next, enable delayed expansion, otherwise things in blocks get expanded on sight and therefore don't change in the loop: "Delayed expansion causes variables delimited by exclamation marks (!) to be evaluated on execution" from Stack Exchange's Super User site
3. Corollary: Use ! in the variables in the block not % for delayed expansion.
4.  But we're getting ahead of ourselves. The setlocal at the top means I don't set the variables back at my prompt. Without this, as I changed my script to fix mistakes it did something different between two runs, since a variable I had previously set might end up being empty when I broke stuff.
5. "Echo is off" spewed to the prompt means I was trying to echo empty variables, so the var: etc tells me which line something is coming from.
6. !var:~2! gives me everything from the second character, so I could drop the f3 at the start of the filename and make f4 and f9 files to try a diff on afterwards. Again pling for delayed expansion.




I suspect I could improve this, but it's six important things to remember another time.

Writing this in Python might have been easier. Or perhaps I should learn PowerShell one day.


Dialogue sheets update – translation & Amazon

SprintRetroA1V5medium-2018-05-24-10-52.jpg

It is six years now since I introduced Retrospective Dialogue sheets to the world and I continue to get great feedback about the sheets. Now I’m running a little MVP with the sheets via Amazon, but first…

In the last few months Alan Baldo has translated the planning sheet to Portuguese and Sun Yuan-Yuan, with help from David Tanzer, has translated two of the retrospective sheets to German.

Thank you very much Alan, Sun and David!

I also updated the Sprint Retrospective sheet (above): version 5 has removed all references to software development. While it can still be used by software teams it is now more general. Actually, the sheet was largely domain neutral already, which explains why it has been used in a Swedish kindergarten for retrospectives.

In the meantime I’ve been busy with an MVP experiment of my own – which has taken a surprising amount of work to get up and running – and which you can help with.

I have made printed versions of the latest Sprint Retrospective sheet available on Amazon to buy. The sheets are still available as a free download to print yourself but I want to see if I can reach a broader audience by offering the sheets on Amazon. Plus I know some teams have trouble getting the sheets printed.

Right now this is a market test: the printed sheets are only available in the UK and I only have a few in stock, so this is a “Buy now while stocks last” offer.

If you are outside the UK (sorry) and want a printed sheet, or find stocks have run out, or want a different printed sheet, please contact me and I’ll do my best.

Assuming this is a success then I’ll get more sheets printed, arrange to sell outside the UK, add more of the sheets to Amazon and make a renewed effort on translations. Pheww!

So now I need to ask for your help.

If you have used the sheets and find them good please write a review on Amazon – there are a few but there cannot be too many.

Conversely, if you have never tried a Dialogue Sheet retrospective please do so and let me know how it goes: I am always seeking feedback. Download and print for yourself or go over to Amazon and buy today – you could be the first buyer!

The post Dialogue sheets update – translation & Amazon appeared first on Allan Kelly Associates.

Event: Find out what makes Python so appealing!

A Tour of Python
Burkhard Kloss

Wednesday, 6th June
The Priory Centre, Priory Plain, Great Yarmouth

Find out what makes Python so appealing!

Burkhard will offer a brief tour of the Python language, and some of the features that make it so expressive, easy to use, and appealing in a wide range of fields. After that, he'll look at examples of Python usage in practice, from really small computers (micro:bits) to clouds, from database to web development, and data science to machine learning.

Burkhard Kloss

I only came to England to walk the Pennine Way… 25 years later I still haven’t done it. I did, though, get round to starting an AI company (spectacularly unsuccessful), joining another startup long before it was cool, learning C++, and spending a lot of time on trading floors building systems for complex derivatives. Sometimes hands on, sometimes managing people. Somewhere along the way I realised you can do cool stuff quickly in Python, and I’ve never lost my fascination with making machines smarter.

RSVP: https://www.meetup.com/Norfolk-Developers-NorDev/events/249290648/

Premium mediocrity is software engineering’s demographic

Software engineering is one of the skills needed to write software, but outside of student coursework is rarely an end in itself. Software is written to do something and the person writing the code needs to know about the something.

If enough people are involved in something, a job title gets created by inserting the appropriate application domain name before ‘software engineer’, e.g., the something software engineer; systems software engineering was one of the first recorded uses of ‘software engineering’, ’embedded software engineer’ is a common usage and more recently ‘research software engineer’ has been trending.

Customers want the software systems they use to fulfill their needs. Implementing a software system involves figuring out what the needs are, how best to implement them using the available resources and producing usable software; all within a given amount of time and money.

How much software engineering knowledge and skill does a something software engineer need? The obvious answer is: enough to get the something done. Ok, how much is needed to get the something done?

There are only so many hours in a day: what percentage of available time is best spent learning about software engineering, what percentage learning about the something, and what percentage doing rather than learning?

The only data I have for answering this question is my own experience of talking to people, from a wide range of business and application areas, whose job includes writing software. My background is compilers (from C to Cobol) and static analysis, my knowledge of end-user application domains is derived from talking to the developers who were using the compilers or static analysis tools I was working on at the time.

I have always been struck by the minimalist knowledge of most developers, when it comes to the programming language they were using. It took a while, but eventually I accepted the obvious: most developers don’t need to know much about the language they are using to get their job done.

By a process that resembles incentivized trial and error, people learn how to write code that does what they want; the compiler does not complain and the output looks ok. For some languages, I used to be able to work out which books a relatively new developer had used to guide their learning, by matching a book’s example code snippets with the code they had written.

This minimalist knowledge approach to programming languages is cost effective because most code is simple and has a short lifetime; the cost of learning lots of language details does not provide enough benefit to be worthwhile.

I am a minimalist language Python developer. Why would I spend time learning more about the semantics of Python than I need to?

What are the benefits of being a language expert? Compiler writers get paid to learn the ins and outs of a language and I know a few people who became language experts without being compiler writers (they got hooked on knowing the language). I have found it useful for keeping my code simple (I am not tempted to write complicated code, or use obscure constructs, in the mistaken belief that they are better than the simple stuff), it is also useful for figuring out other people’s complicated or obscure usage (created intentionally or accidentally).

These benefits are not enough to convince me to learn more about Python, the language. I am content to wait until I need to learn more.

I have occasionally taught advanced programming courses, aimed at developers with a few years experience working in industry. These courses had to include the word ‘advanced’ in their title, otherwise developers with a few years experience would never have signed-up; ‘advanced’ is a necessary marketing signal (others who have run such courses report the same behavior). The course contents were essentially a review of basic material, with lots of examples; most of those attending did not know enough to follow real advanced material. The courses were really about uncovering and correcting bad habits that attendees had picked up over time (often, a technique was discovered to fix a problem and then subsequently adopted for more general use).

What about general software engineering skills? A minimalist knowledge approach to software engineering is cost effective because most code does not exist long enough to make it worthwhile investing in reducing future maintenance costs. Yes, it is more expensive for those that survive to become commonly used, but think of all the savings from not investing in those that did not survive. Software engineering decisions should not be driven by survivorship bias.

The first requirement of any commercial software system is to attract paying customers. In a rapidly changing market, being first with a saleable product can be the difference between life and death. Minimizing software engineering effort saves time and money (in the short term). If the product is a success, there will be money to pay for what needs to be done, if the product fails nobody cares. I have seen a lot of software systems that are a commercial success and a complete software engineering mess; successful, well engineered software is less common (or perhaps they just don’t need me to help them out).

Software engineering mediocrity is not only viable, for most people it’s the outcome of making a cost/benefit decision to invest their learning time in the application domain, not software engineering (or computer language).

Of course, nobody wants to be seen as being mediocre (for some people, mediocre overstates their skill level); their behavior is premium mediocre.

There are a few application areas where software engineering skills are needed, e.g., safety critical software and warehouse scale computing. A few high profile cases are hiding the reality that whatever works is cost effective for most software solutions.

Blockade – baron m.

Good heavens Sir R----- you look quite pallid! Come take a seat and let me fetch you a measure of rum to restore your humors.
To further improve your sanguinity might I suggest a small wager?

Splendid fellow!

I have in mind a game invented to commemorate my successfully quashing the Caribbean zombie uprising some few several years ago. Now, as I'm sure you well know, zombies have ever been a persistent, if sporadic, scourge of those islands. On that occasion, however, there arose a formidable leader from amongst their number; the zombie Lord J------ the Insensate.

It Compiles, Ship It!

The method was pretty simple and a fairly bog standard affair, it just attempted to look something up in a map and return the associated result, e.g.

public string LookupName(string key)
{
  string name;

  if (!customers.TryGetValue(key, out name))
    throw new Exception("Customer not found");

  return name;
}

The use of an exception here to signal failure implied to me that this really shouldn’t happen in practice unless the data structure is screwed up or some input validation was missed further upstream. Either way you know (from looking at the implementation) that the outcome of calling the method is either the value you’re after or an exception will be thrown.

So I was more than a little surprised when I saw the implementation of the method suddenly change to this:

public string LookupName(string key)
{
  string name;

  if (!customers.TryGetValue(key, out name))
    return null;

  return name;
}

The method no longer threw an exception on failure; it now returned a null string reference.

This wouldn’t be quite so surprising if all the call sites that used this method had also been fixed-up to account for this change in behaviour. In fact what initially piqued my interest wasn’t that this method had changed (although we’ll see in a moment that it could have been expressed better) but how the calling logic would have changed.

Wishful Thinking

I always approach a change from a position of uncertainty. I’m invariably wrong or have something to learn, either from a patterns perspective or a business logic one. Hence my initial assumption was that I now needed to think differently about what happens when I need to “lookup a name” and that lookup fails. Where before it was truly exceptional and should never occur in practice (perhaps indicating a bug somewhere else) it’s now more likely and something to be formally considered, and resolving the failure needs to be handled on a case-by-case basis.

Of course that wasn’t the case at all. The method had been changed to return a null reference because it was now an implementation detail of another new method which didn’t want to use catching an exception for flow control. Instead they now simply check for null and act accordingly.

As none of the original call sites had been changed to handle the new semantics, a rich exception thrown early had now been traded for (at best) a NullReferenceException later or (worst case) no error at all and an incorrect result calculated based on bad input data [1].

The TryXxx Pattern

Coming back to reality it’s easy to see that what the author really wanted here was another method that allowed them to attempt a lookup on a name, knowing that in their scenario it could possibly fail but that’s okay because they have a back-up plan. In C# this is a very common pattern that looks like this:

public bool TryLookupName(string key, out string name)

Success or failure is indicated by the return value and the result of the lookup returned via the final argument. (Personally I’ve tended to favour using ref over out for the return value [2].)
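
For completeness, here is a minimal sketch of how such a method might be implemented over the same customers dictionary used in the earlier snippets (my illustration, not code from the original system):

public bool TryLookupName(string key, out string name)
{
  // Mirror Dictionary.TryGetValue(): report success via the return value
  // and hand the result back through the out parameter.
  return customers.TryGetValue(key, out name);
}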

The Optional Approach

While statically typed languages are great at catching all sorts of type related errors at compile time, they cannot catch problems when you smuggle an optional value through a reference type in languages like C# and Java by using a null reference. Any reference-type value in C# can inherently be null and therefore the compiler is at a loss to help you.

JetBrains’ ReSharper has some useful annotations which you can use to help its static analyser point out mistakes or elide unnecessary checks, but you have to add noisy attributes everywhere. Still, expressing your intent in code is the goal and it’s one valid and very useful approach.
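
For illustration only, an annotated signature might look something like this (assuming the JetBrains.Annotations attributes; not code from the original post):

using JetBrains.Annotations;

[CanBeNull]                                     // the result may legitimately be null
public string LookupName([NotNull] string key)  // the key must not be null
{
  string name;
  return customers.TryGetValue(key, out name) ? name : null;
}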

Winding the clock into the future we have the new “optional reference” feature to look forward to in C# (currently in preview). Rather than bury their heads in the sand the C# designers have worked hard to try and right an old wrong and reduce the impact of Sir Tony Hoare’s billion dollar mistake by making null references type unsafe.

In the meantime, and for those of us working with older C# compilers, we still have the ability to invent our own generic Optional<> type that we can use instead. This is something I’ve been dragging into C# codebases for many years (whilst standing on my soapbox [3]) in an effort to tame at least one aspect of complexity. Using one of these would have changed the signature of the method in question to:

public Optional<string> LookupName(string key)

Now all the call sites would have failed to compile and the author would have been forced to address the effects of their change. (If there had been any tests you would have hoped they would have triggered the alarm too.)
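
The post doesn’t show the Optional<> type itself; a bare-bones version might look something like the following (a sketch of the general idea, not the author’s actual implementation):

public struct Optional<T>
{
  private readonly bool _hasValue;
  private readonly T _value;

  public Optional(T value)
  {
    _value = value;
    _hasValue = true;
  }

  public bool HasValue { get { return _hasValue; } }

  // Throws if no value is present, much like Nullable<T>.Value does.
  public T Value
  {
    get
    {
      if (!_hasValue)
        throw new System.InvalidOperationException("Optional has no value");
      return _value;
    }
  }
}

public Optional<string> LookupName(string key)
{
  string name;
  return customers.TryGetValue(key, out name) ? new Optional<string>(name)
                                              : default(Optional<string>);
}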

Fix the Design, Not the Compiler

Either of these two approaches allows you to “lean on the compiler” and leverage the power of a statically typed language. This is a useful feature to have but only if it’s put to good use and you know where the limitations are in the language.

While I would like to think that people listen to the compiler I often don’t think they hear it [4]. Too often the compiler is treated as something to be placated, or negotiated with. For example if the Optional<string> approach had been taken the call sites would all have failed to compile. However this calling code:

var name = LookupName(key);

...could easily be “fixed” by simply doing this to silence the compiler:

var name = LookupName(key).Value;

For my own Optional<> type we’d just have switched from a possible NullReferenceException on lookup failure to an InvalidOperationException. Granted this is better as we have at least avoided the chance of the null reference silently making its way further down the path but it doesn’t feel like we’ve addressed the underlying problem (if indeed there has even been a change in the way we should treat lookup failures).
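
A fix that actually engaged with the new semantics would make the decision explicit at each call site instead, e.g. (again a sketch, using the hypothetical Optional<> above):

var maybeName = LookupName(key);
if (!maybeName.HasValue)
  throw new Exception("Customer not found");  // or handle the miss appropriately

var name = maybeName.Value;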

Embracing Change

While the Optional<> approach is perhaps more composable the TryXxx pattern is more invasive and that probably has value in itself. Changing the signature and breaking compilation is supposed to put a speed bump in your way so that you consider the effects of your potential actions. In this sense the more invasive the workaround the more you are challenged to solve the underlying tension with the design.

At least that’s the way I like to think about it but I’m afraid I’m probably just being naïve. The reality, I suspect, is that anyone who could make such a change as switching an exception for a null reference is more concerned with getting their change completed rather than stopping to ponder the wider effects of what any compiler might be trying to tell them.

 

[1] See Postel’s Law and  consider how well that worked out for HTML.

[2] See “Out vs Ref For TryXxx Style Methods”.

[3] C# already has a “Nullable” type for optional values so I find it odd that C# developers find the equivalent type for reference-type values so peculiar. Yes it’s not integrated into the language but I find it’s usually a disconnect at the conceptual level, not a syntactic one.

[4] A passing nod to the conversation between Woody Harrelson and Wesley Snipes discussing Jimi Hendrix in White Men Can’t Jump.

Estimation, planning, teams and money, some data

PlannedMay17-2018-05-17-11-46.jpg

When I deliver Agile training for teams I run an exercise called “The Extended XP Game”. It is based on the old “XP Game” but over the years I’ve enhanced it and added to it. We have a lot of fun, people are laughing and they still talk about it years later. The game illustrates a lot of agile concepts: iteration, business value, velocity, learning by doing, specification by example, quality is free, risk, the role of probability and some more.

When I run the exercise I divide the trainees into several teams, usually three or four people to a team. I show them I have some tasks written on cards which they will do in a two minute iteration. They do two minutes of work, review, retrospect, then do another two minutes of work – and possibly repeat a third time.

The first thing is for teams to Get Ready: I hand out the tasks and ask them to estimate, in seconds, how long it will take to do each task: fold a paper airplane that will fly, inflate a balloon, deflate a balloon, roll a single six on a dice, roll a double six on two dice, find a two in a pack of cards and find all the twos in the pack of cards. Strictly speaking, this estimate is a prospective estimate: “how long will it take to do this in future?”

Once they have estimated how long each task will take someone is appointed product owner and they have to plan the tasks to be done (with the team).

What I do not tell the teams is that I’m timing them at this stage. I let the teams take as long as they like to get ready: estimate and plan. But I time how long the estimation takes and how long the following planning takes.

Once all the teams are “ready” I ask the teams: “how long did that take?”

At this point I am asking for a retrospective estimate: how long did it take. The teams have perfect estimation conditions: they have just done it, no time has elapsed and no events have intervened.

Typically the answers are 5 or 6 minutes, maybe less, maybe more. Occasionally someone gets the right number and they are then frequently dismissed by their colleagues.

Although I’ve been running this exercise for nearly 10 years, and have been timing teams for about half that time, I’ve only been recording the data for the last couple of years. Still, it comes from over 65 teams and is consistent.

The total time to get ready to do 2 minutes of work is close to 13 minutes – the fastest team took just 5.75 minutes but the slowest took a whopping 21.25 minutes.

The average time spent estimating the tasks is 7 minutes. The fastest team took 2.75 minutes and the slowest 14 minutes.

The average time planning once all tasks are estimated is just short of 6 minutes. One team took a further 13.5 minutes to plan after estimating while another took just 16 seconds. While I assume the latter had pretty much planned while estimating, it is also interesting to note that that team contained several people who had done the exercise a few years before.

(For statistics nuts the mean and median are pretty close together and I don’t think the mode makes much sense in this scenario.)

So what conclusions can we draw from this data?

1) Teams take longer to estimate than to do

Everyone taking part in the exercise has been told – several times – that they are preparing to do a 2 minute iteration. Yet on average teams spend 12.75 minutes preparing – estimating and planning – to do 2 minutes of work!

Or to put it another way: teams typically spend six times longer to plan work than to do work.

The slowest team ever took over 10 times longer to plan than to do.

In the years I’ve been running this exercise no team has ever done a complete dry run. They sometimes do little exercises and time themselves but even teams which do this spend a lot of time planning.

This has parallels in real life too: many participants tell me their organization spend a long time debating what should be done, planning and only belatedly executing. One company I met had a project that had been in planning for five years.

TeamSize-2018-05-17-11-46.jpg

2) Larger teams take longer to estimate than small teams

My second graph shows there is a clear correlation between team size and the time it takes to estimate and plan. I think this is no surprise, one would expect this. In fact, this is another piece of evidence supporting Diseconomies of Scale: the bigger the team the longer it will take to get ready.

This is one reason why some people prefer to have an “expert” make the estimate – it saves the time of other people. However this itself is a problem for several reasons.

Anyone who has read my notes on estimation research (and the later more notes on estimation research) may remember that research shows that those with expert knowledge or in a position of authority underestimate by more than those who do the work. So having an expert estimate isn’t a cure.

But, those same notes include research that shows that people are better at estimating time for other people than they are at estimating time for themselves, so maybe this isn’t all bad.

However, this approach just isn’t fair, especially when someone is expected to work within an estimate. One might also argue that it is not an effective use of time because the first person – the estimator – has to understand the task in sufficient detail to estimate it, but rather than reuse this learning the task is then given to someone else who has to learn it all over again.

PlanningDelta-2018-05-17-11-46.jpg

3) Post estimation planning is pretty constant

This graph shows the planning delta, that is: after the estimates are finished how long does it take teams to plan the work?

It turns out that the amount of time it takes to estimate the tasks has little bearing on how long the subsequent planning takes. So whether you estimate fast or slow, on average it will take six more minutes to plan the work.

Perhaps this isn’t that surprising.

(If I’ve told you about this data in person I might have said something different here. In preparing the data for this blog I found an error in my Excel graphs which I can only attribute to a bug in Excel’s scatter chart algorithm.)

4) Vierordt’s Law holds

People underestimate longer periods of time (typically anything over 10 minutes), and overestimate short periods of time (typically things less than two minutes).

Not only do trainees consistently underestimate how long it has taken them to get ready – which is over 10 minutes – but teams which record how long it takes to actually do each task find that their estimates are much higher than the actual time it takes. Even when teams don’t time themselves observation shows that they do the work far faster than they thought they would.

TimeVMoney-2018-05-17-11-46.jpg

5) Less planning makes more money

One of my extensions to the original game is to introduce money: teams have to deliver value, measured in money. This graph shows teams which spend less time planning go on to make more money.

I can’t be as sure about this last finding as the earlier ones because I’ve not been recording this data for so long. To complicate matters a lot happens between the initial planning and the final money making, I introduce some money and teams get to plan for subsequent iterations.

Still, there are lessons here.

The first lesson is simply this: more planning does not lead to more money.

That is pretty significant in its own right but there is still the question: why do teams which spend less time planning make more money?

I have two possible explanations.

I normally play three rounds of the game. When time is tight I sometimes stop the game after two rounds. In general teams usually score more money in each successive round. Therefore, teams who spend longer in planning are less likely to get to the third round so their score comes from the second round. If they had time to play a third round they would probably score higher than in round two.

This has a parallel in real life: if extra planning time delays the date a product enters the market it is likely to make less money. Delivering something smaller sooner is worth more.

This perfectly demonstrates that doing creates more learning than planning: teams learn more (and therefore increase their score) from spending 2 minutes doing than spending an extra 2 minutes planning.

The second possible explanation is that the more planning a team does the more difficult they might find it to rethink and change the way they are working.

The $1,600 shown was recorded by a Dutch team this year but the record is held by a team in Australia who scored over $2,000: to break into these high scores teams need to reinterpret the rules of the game.

One of the points of the game is to learn by doing. I suspect that teams who spend longer in planning find it harder to break away from their original interpretation of the rules. How can you think outside the box when you’ve spent a lot of time thinking about the box?

In one training session in Brisbane last year the teams weren’t making the breakthrough to the big money. Although I’d dropped hints of how to do this nobody had made the connection so I said: “You know, a team in Perth once scored over $2,000.” That caused one of the players to rethink his approach and score $1,141.

I’ve since repeated the quote and discovered that simply telling people that such high scores are possible causes them to discover how to score higher.

* * *

I’m sure there is more I could read into all this data and I will carry on collecting the data. Although now I have two problems…

First, having shared this data I might find people coming on my agile software training who change their behaviour because they have read this far.

Second: I need more teams to do this to gather data! If you would like to do this exercise – either as part of a full agile training course or as a stand alone exercise – please call (+44 20 3286 4292) or mail me, contact@allankelly.net, my rates are quite reasonable!

Want to receive these posts by e-mail? – join the newsletter today and receive a free eBook: Xanpan: Team Centric Agile Software Development

The post Estimation, planning, teams and money, some data appeared first on Allan Kelly Associates.

Product Ownership book – a work in progress

PrdOwnership-2018-05-17-11-04.jpg

A quick update: most of my recent blog posts about the product owner role, together with some new material, are now available in book form from LeanPub – https://leanpub.com/productownership.

I’m surprised to find I’ve written over 60 pages so far! Still, this is very much a work in progress, there are a few more chapters to add to part 1: The Product Owner role.

But it is part 2 which I’m itching to start writing: the tools of the trade.

For those who don’t know, the beauty of LeanPub is that you can buy my unfinished book now and you will receive updates – to your iPad, Kindle, PC, whatever – as they are produced.

That means three things to me.

Firstly I can receive your feedback – what do you like? What did I get wrong? What else should be in there?

Second, money is feedback, the more of you who buy the book the more motivated I am to write it – I like seeing sales, it tells me people want this book. And if you don’t buy… well maybe I should pivot and abandon it.

Third, it gives me a little beer money.

The bad news is: you also get my dyslexic spelling and grammar.

The post Product Ownership book – a work in progress appeared first on Allan Kelly Associates.

std::accumulate vs. std::reduce

std::accumulate has been a part of the standard library since C++98. It provides a way to fold a binary operation (such as addition) over an iterator range, resulting in a single value. std::reduce was added in C++17 and looks remarkably similar. This post will explain the difference between the two and when to use one or the other.

Let’s start by looking at their interfaces, beginning with std::accumulate.

template< class InputIt, class T >
T accumulate( InputIt first, InputIt last, T init );

template< class InputIt, class T, class BinaryOperation >
T accumulate( InputIt first, InputIt last, T init,
              BinaryOperation op );

std::accumulate takes an iterator range and an initial value for the accumulation. You can optionally give it a binary operation to do the reduction, which will default to addition. It will call this operation on the initial value and the first element of the range, then on the result and the second element of the range, etc. Here are two equivalent calls:

auto sum1 = std::accumulate(begin(vec), end(vec), 0);
auto sum2 = std::accumulate(begin(vec), end(vec), 0, std::plus<>{});

std::reduce has a fair few more overloads to get your head round, but has a very similar interface once you understand them:

template<class InputIt>
typename std::iterator_traits<InputIt>::value_type reduce(
    InputIt first, InputIt last);

template<class ExecutionPolicy, class ForwardIt>
typename std::iterator_traits<ForwardIt>::value_type reduce(
    ExecutionPolicy&& policy,
    ForwardIt first, ForwardIt last);

template<class InputIt, class T>
T reduce(InputIt first, InputIt last, T init);

template<class ExecutionPolicy, class ForwardIt, class T>
T reduce(ExecutionPolicy&& policy,
         ForwardIt first, ForwardIt last, T init);

template<class InputIt, class T, class BinaryOp>
T reduce(InputIt first, InputIt last, T init, BinaryOp binary_op);

template<class ExecutionPolicy, class ForwardIt, class T, class BinaryOp>
T reduce(ExecutionPolicy&& policy,
         ForwardIt first, ForwardIt last, T init, BinaryOp binary_op);

The differences here are:

  • overloads which take an execution policy to allow parallelisation;
  • overloads which take forward iterators rather than just input iterators;
  • an optional initial element, which is default-constructed if you omit it.

I’ll talk about these points in turn.

Execution policies

Execution policies are a C++17 feature which allows programmers to ask for algorithms to be parallelised. There are three execution policies in C++17:

  • std::execution::seq – do not parallelise
  • std::execution::par – parallelise
  • std::execution::par_unseq – parallelise and vectorise (requires that the operation can be interleaved, so no acquiring mutexes and such)

The idea behind execution policies is that you can change a serial algorithm to a parallel algorithm simply by passing an additional argument to the function:

auto sum1 = std::reduce(begin(vec), end(vec));                      //sequential
auto sum2 = std::reduce(std::execution::seq, begin(vec), end(vec)); //sequential
auto sum3 = std::reduce(std::execution::par, begin(vec), end(vec)); //parallel

Allowing parallelisation is the main reason for the addition of std::reduce. Let’s look at an example where we want to sum up all the elements in an array. With std::accumulate it looks like this:

[diagram: std::accumulate with std::plus – each element added to the running total in strict left-to-right order]

Note that each step of the computation relies on the previous computation, i.e. this algorithm will execute serially and we make no use of hardware parallel processing capabilities. If we use std::reduce with the std::execution::par policy then it could look like this:

[diagram: std::reduce with std::plus – adjacent elements summed independently, then the partial sums combined]

This is a trivial amount of data for processing in parallel, but the benefit gained when the data size is scaled up should be clear: some of the operations can be executed independently of others, so they can be done in parallel.

A common question is: why do we need an entirely new algorithm for this? Why can’t we just overload std::accumulate? For an example of why we can’t do this, let’s use std::minus<> instead of std::plus<> as our reduction operation. With std::accumulate we get this:

[diagram: std::accumulate with std::minus – elements subtracted from the running result in strict left-to-right order]

However, if we try to use std::reduce, we could get something like:

[diagram: std::reduce with std::minus – the reordered subtractions produce a different, incorrect result]

Uh oh. We just broke our code.

We got the wrong answer because of the mathematical properties of subtraction. You can’t arbitrarily reorder the operands, or compute the operations out of order when doing subtraction. This is formalised in the properties of commutativity and associativity.

A binary operation ∗ on a set S is associative if the following equation holds for all x, y, and z in S:

(x ∗ y) ∗ z = x ∗ (y ∗ z)

An operation is commutative if:

x ∗ y = y ∗ x
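
Subtraction, for example, has neither property: (1 − 2) − 3 = −4 but 1 − (2 − 3) = 2, and 1 − 2 ≠ 2 − 1.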

Associativity lets the algorithm compute reduction steps on arbitrary adjacent pairs of elements. Commutativity allows carrying out the operation on intermediate results in whatever order they are produced in, rather than having to preserve the original ordering. Some people (e.g. here and here) find that commutativity is too strong a requirement on std::reduce, because it denies use of common fold operations like string concatenation, which is associative but not commutative. I agree that it’s a shame we don’t have a step between std::accumulate and std::reduce which only requires associativity, but maybe in the future!

We can understand how associativity and commutativity affect our algorithm as much as we want, but there’s no way for the compiler to reliably check this. As such, we’re stuck with having std::reduce as a separate algorithm. In concepts speak, these are called axioms [1]: requirements imposed on semantics which cannot generally be statically verified.

Input vs. Forward Iterators

The forward iterator overloads allow the implementation to chunk up the data and dispatch these subranges to different threads. The idea is that input iterators are single-pass, whereas forward iterators can be iterated through multiple times. An algorithm couldn’t chunk up data indexed by input iterators because by the time it had gone through the range to work out the sub-range boundaries, the iterators would have been invalidated and couldn’t be passed on.

For more information on iterator types and parallel algorithms, see p0467r2.

Optional Initial Element

std::reduce lets you not bother passing an initial element, in which case it will default-construct one using typename std::iterator_traits<InputIt>::value_type{}. I think this was a mistake. A default-constructed value is not always the identity element (such as in multiplication), and it’s very easy for a programmer to miss out the initial element. The code will still compile, it will just give the wrong answer. I suspect that this choice will result in some hard-to-find bugs when this interface comes into heavier use.

Finishing Up

That covers the differences between std::reduce and std::accumulate. My three point guide to std::reduce is:

  • Use std::reduce when you want your accumulation to run in parallel
  • Ensure that the operation you want to use is both associative and commutative
  • Remember that the default initial value is produced by default construction, and that this may not be correct for your operation

Now you know how and when to use std::reduce over std::accumulate. More generally, the differences show some of the technical aspects you need to consider when parallelising any kind of algorithm. Keep in mind how your operations act with respect to common mathematical properties and you might save yourself some debugging down the line.

Acknowledgements

Thanks to Christopher Di Bella for reviewing this post and linking me to p0467r2. Thanks to Ben Steffan, Ben Deane, and TemplateRex for discussion about commutativity.


  1. For more about axioms and algorithms, see this post by Christopher Di Bella

Replicating results using research software

The reproducibility of results, from scientific studies, has always been an important issue. Over the last few years software has become a hot topic in reproducibility circles; many researchers have an expectation that if they run the original researcher’s software, they will replicate the results. Reality has not lived up to their expectations and there is lots of flapping around looking for a solution. There is a solution, but first, why does the problem exist?

I have spent a lot of time porting software to different compilers (when I was in the compiler business, I wanted everybody to port their applications to the compiler I was working on), different hardware (oh, the days when every major vendor had at least one distinct cpu; not like today where it’s x86, ARM, or embedded), different operating systems (umpteen flavors of Unix, all with slightly different header file contents and library behavior; the Unix wars were good for those in the porting business) and every now and again different languages (by translating).

The Wintel alliance wiped out variation in cpus and operating systems (they can still be found lurking in dark corners) and open source compilers created a near monoculture of compilers for the major languages.

The major software portability problems of 30 years ago have become rather minor. But software portability problems that once tended to be minor (at least for scientific software), have grown to become a major headache. Today’s major portability problems center around evolution of the libraries/packages being used, and longer term the evolution of the language(s) used.

Evolution has created development ecosystems where there are rampant dependencies on specific, or earlier than, or later than versions of libraries/packages. I have been out of the porting business for several decades, but talking to those doing it today, the story is the same; experience in porting from A to B is everything, second best is talking to somebody else who has gone in that direction and third best are the on-line forums such as Stack Overflow.

Researchers are doing research on who-knows-what and probably have need-to-know knowledge of the software and libraries they are using; the researchers receiving a copy of the original software might know less. What is the probability that the originating and receiving researchers have exactly the same versions of the libraries installed? The receiving researcher may not have any of the needed libraries installed, and promptly install the latest version (which may well be more recent than the one used by the original researcher).

A solution is available; distribute a duplicate of the researcher’s complete system as a container, e.g., a Docker image.

Containers solve the replication problem. But these days people want more; they actually think it should be possible to take research software and modify it to suit their own needs. Good luck with that.

Research software is written to solve a problem, often by people writing their first non-trivial programs (i.e., they are novices), with no incentive to produce something that is easy for others to use. When software is written by experienced developers, who have an incentive to build something that is easy for others to work with, multiple reimplementations are often still required to achieve something of decent quality. Creating robust software, that others can use, is very hard.

The problem with software is its invisibility; the difficulties are not visible. When the internal operations are visible, the difficulties of making changes are easier to see.

James Albert Bonsack's cigarette rolling machine

James Albert Bonsack’s cigarette rolling machine (from Wikipedia).

The Perils of DateTime.Parse()

The error message was somewhat flummoxing, largely because it was so generic, but also because the data all came from a database extract rather than manual input:

Input string was not in a correct format.

Naturally I looked carefully at all the various decimal and date values as I knew this was the kind of message you get when parsing those kinds of values when they’re incorrectly formed, but none of them appeared to be at fault. The DateTime error message is actually slightly different [1] but I’d forgotten that at the time and so I eyeballed the dates as well as the decimal values just in case.

Then I remembered that empty string values also caused this error, but lo-and-behold I was not missing any optional decimals or dates in my table either. Time to hit the debugger and see what was going on here [2].

The Plot Thickens

I changed the settings for the FormatException error type to break on throw, sent in my data to the service, and waited for it to trip. It didn’t take long before the debugger fired into life and I could see that the code was trying to parse a decimal value as a double but the string value was “0100/04/01”, i.e. the 1st April in the year 100. WTF!

I immediately went back to my table and checked my data again, aware that a date like this would have stood out a mile first time around, but I was happy to assume that I could have missed it. This time I used some regular expressions just to be sure my eyes were not deceiving me.

The thing was I knew what column the parser thought the value was in but I didn’t entirely trust that I hadn’t mucked up the file structure and added or removed an errant comma in the CSV input file. I didn’t appear to have done that and so the value that appeared to be causing this problem was the decimal number “100.04”, but how?

None of this made any sense and so I decided to debug the client code, right from reading in the CSV data file through to sending it across the wire to the service, to see what was happening. The service was invoked via a fairly simple WCF client assembly and as I stepped into that code I came across a method called NormaliseDate()...

The Mist Clears

What this method did was to attempt to parse the input string value as a date and if it was successful it would rewrite it in an unusual (to me) “universal” format – YYYY/MM/DD [3].

The first two parsing attempts it did were very specific, i.e. it used DateTime.ParseExact() to match the intended output format and the “sane” local time format of DD/MM/YYYY. So far, so good.

However the third and last attempt, for whatever reason, just used DateTime.Parse() in its no-frills form and that was happy to take a decimal number like “100.04” and treat it as a date in the format YYY.MM! At first I wondered if it was treating it as a serial or OLE date of some kind but I think it’s just more liberal in its choice of separators than the author of our method intended [4].
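
For contrast, a stricter version of that final fallback would whitelist the formats it is prepared to accept rather than dropping down to the anything-goes parse. A sketch (my illustration, not the actual NormaliseDate() code):

using System;
using System.Globalization;

static class DateNormaliser
{
  static readonly string[] AcceptedFormats =
  {
    "yyyy/MM/dd",   // the intended "universal" output format
    "dd/MM/yyyy",   // the sane local input format
  };

  // Returns false for anything else, e.g. the decimal value "100.04".
  public static bool TryNormaliseDate(string input, out string normalised)
  {
    DateTime date;
    if (DateTime.TryParseExact(input, AcceptedFormats, CultureInfo.InvariantCulture,
                               DateTimeStyles.None, out date))
    {
      normalised = date.ToString("yyyy/MM/dd", CultureInfo.InvariantCulture);
      return true;
    }

    normalised = input;
    return false;
  }
}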

Naturally there are no unit tests for this code or any type of regression test suite that shows what kind of scenarios this method was intended to support. Due to lack of knowledge around deployment and use in the wild of the client library I was forced to pad the values in the input file with trailing zeroes in the short term to work around the issue, yuck! [5]

JSON Parsers

This isn’t the first time I’ve had a run-in with a date parser. When I was working on REST APIs I always got frustrated by how permissive the JSON parser would be in attempting to coerce a string value into a date (and time). All we ever wanted was to keep it simple and only allow ISO-8601 format timestamps in UTC unless there was a genuine need to support other formats.

Every time I started writing the acceptance tests for timestamp validation, though, I’d find that I could never quite configure the JSON parser to reject everything but the desired format. In the earlier days of my time with ASP.Net even getting it to stop accepting local times was a struggle and it even caused us a problem as we discovered a US/UK date format confusion error which the parser was hiding from us.

In the end we resorted to creating our own Iso8601DateTime type which used the .Net DateTimeOffset type under the covers but effectively allowed us to use our own custom JSON serializer methods to only support the exact format we wanted.
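
The type itself isn’t shown in the post, but the core of such a wrapper is fairly small. A sketch of the idea (hypothetical code; the real type also plugged into the JSON serializer):

using System;
using System.Globalization;

public struct Iso8601DateTime
{
  const string Format = "yyyy-MM-dd'T'HH:mm:ss'Z'";

  public DateTimeOffset Value { get; private set; }

  // Accept only UTC ISO-8601 timestamps in exactly one shape; reject everything else.
  public static bool TryParse(string text, out Iso8601DateTime result)
  {
    DateTimeOffset value;
    bool ok = DateTimeOffset.TryParseExact(text, Format, CultureInfo.InvariantCulture,
                                           DateTimeStyles.AssumeUniversal, out value);
    result = new Iso8601DateTime { Value = value };
    return ok;
  }

  public override string ToString()
  {
    return Value.UtcDateTime.ToString(Format, CultureInfo.InvariantCulture);
  }
}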

More recently JSON.Net has gotten better at letting you control the format and parsing of dates but it’s still not perfect and there are unit tests in past codebases that show variants that would unexpectedly pass, despite using the strictest settings. I wouldn’t be surprised if our Iso8601DateTime type was still in use as I can only assume everyone else is far less pedantic about the validation of datetimes and those that are have taken a similar route to ensure they control parsing.

A Dangerous Game

One should not lose sight though of the real issue here, which is the attempt to classify string values by attempting to parse them. Even if you limit yourself to a single locale you might get away with it, but when you try and do that across arbitrary locales you’re just asking for trouble.

 

[1] “String was not recognized as a valid DateTime.”

[2] This whole fiasco falls squarely in the territory I’ve covered before in my Overload article “Terse Exception Messages”. Fixing this went to the top of my backlog, especially after I discovered it was a problem for our users too.

[3] Why they didn’t just pick THE universal format of ISO-8601 is anyone’s guess.

[4] I still need to go back and read the documentation for this method because it clearly caters for scenarios I just don’t see in my normal locale or user base.

[5] That’s what happens with tactical solutions, no one ever quite gets around to documenting anything because they never think it’ll survive for very long...

Closing the Product Owner mini-series: they are all different!

StopStart-2018-05-9-09-45.jpg
With some final words I’d like to draw this mini-series on the Product Owner to a close and open a new chapter with a new book. I’ve written six blog posts in the last two months and I have drafts for more but there are other things I want to blog about.

I have drafts for more posts and ideas for even more. So it’s time to make this into another book: Product Ownership. This is on the LeanPub site now and you can buy it. So far it just contains a new prologue story but I’ll soon add these posts as the first chapters.

Ever since I wrote Little Book of User Stories I’ve thought there should be a companion volume: “Little Book of Product Ownership”. The intention is for the first part of the new book to discuss the product owner role – and whether it should even exist – and then quickly get into the tools of Product Ownership.

Now some closing words…

While I’ve suggested a lot of things that a Product Owner should do, and a few that they should not do, there are really no hard and fast rules about what a Product Owner should or should not do.

In the language of business schools: there is no contingent way of being a product owner, every product owner and organization is different and they need to find their own path. I cannot give you a flow chart for what a product owner does or should do, nor can I give you a set of rules to say “When the customer says Foo the Product Owner should do Bar.”

Every Product Owner has to work out what is right for them because every organization is different. And every organization will – rightly or wrongly – expect different things from the people it christens Product Owner.

Additionally every team is different and contains different skills and experience. As a result every team will differ in what it needs from the Product Owner(s) and how the team members can support the Product Owner and share the work.

And every Product Owner is themselves different and brings different skills, experience and insights to the role.

Job #1 for a newly appointed Product Owner is to sit down and decide what type of Product Owner they are expected to be and what type of Product Owner they want to be:

  • They may be a Backlog Administrator taking instructions from others.
  • They may be a Subject Matter Expert using their expert knowledge of the domain to decide what the right product to build is and help other team members understand the details of what is being built.
  • They may need to analyse internal process and business lines using the skills of Business Analysis.
  • They may need to get out on the road to meet customers – and potential customers – to understand the market and where the opportunities are using the skills of Product Management.
  • They may need to call on skills from other fields too: Project Management, Consulting and Entrepreneurship, to name a few.

But a Product Owner is not some other things:

  • If they were a developer they need to accept they will not be coding any more. There simply isn’t time and anyway, they need to trust the team.
  • If they were a Project Manager, Development or Line Manager they need to resist any urge to tell people what to do or look too far into the future. They need to re-focus on value not time, and recognise that their authority comes from their competence not from a position on a chart.
  • Product Owners from a Business Analysis background need to look beyond Business Analysis, specifically they need to immerse themselves in the world of Product Management.
  • While Product Owners who were Product Managers probably have the easiest ride they too need to change, they need to think more about internal stakeholders, processes and delivery.

Every Product Owner and everyone working with Product Owners needs to read and reflect on the role. Hopefully some of the words in my recent posts – and the new book – will help with that – and hopefully some of you might like to hire me for advice or a training course – just call!

Finally, I sincerely believe there are better Product Owners and not-so-good Product Owners, that some organizations (teams, companies, enterprises) offer a better environment for Product Ownership, and equally that there are those which are downright hostile to it.

Want to receive these posts by e-mail? – join the newsletter today and receive a free eBook: Xanpan: Team Centric Agile Software Development

The post Closing the Product Owner mini-series: they are all different! appeared first on Allan Kelly Associates.

Do you still get the buzz? I do!


Whatever else I do to earn a living, I am a software engineer at the core. Outside of work other things give me a reason to smile - heavy metal bands, science fiction books or my family - but when it comes to work, writing software is what gives me the biggest buzz. Even after 33 years!

Recently I spent the weekend writing some software for a client. They have an app, which we built, that allows them to take photos and complete a questionnaire for installations so that they can record compliance. The software I wrote receives the photos and questionnaire responses from the app, generates a PDF document detailing the responses and attaches it to an email, along with the photos, to send to the client. Not a particularly exciting process most would agree.

It’s a straightforward piece of software (despite the security concerns and image processing which took a little while to get just right) which delivers exactly what the client needs, but we wanted to be doubly sure. So, in the early days of the software running for real (i.e. the client is using it, not just us testing it) we got copies of the emails the app generated so we could check everything was working as it should. And that’s where the buzz of being a software developer begins.

As developers, we’re not always able to monitor in what way or how frequently the software we write is being used by our clients. There are confidentiality issues to consider, as well as the practical aspects and cost concerns of implementing a suitable monitoring process. This means a lot of the time we rely on anecdotal responses from our clients, and of course feedback when something goes wrong (which thankfully, isn’t too often).

With this particular client we knew each and every time they used the software as an email would appear and we could see how the app was working until we, and they, were satisfied with the process. Even though it was such a simple thing, every time an email pinged through from the app I got a twinge of excitement and a flush of pride. To see something I’d created from scratch work successfully and be used by someone was a small but genuine reward for me and reminded me why I love doing what I do. The buzz of seeing software work.

What gives you that buzz every day and keeps you doing what you’re doing?

Type compatibility: name vs structural equivalence

What are the rules for deciding when two types are the same, or compatible?

This question needs to be answered to decide whether an object of type T1 can be assigned to an object of type T2, whether they can be compared, added together, etc.

A wide collection of rules have been combined together, by various languages, for type compatibility of scalar types (e.g., integer, character, etc), but for aggregate types two rules dominate: name equivalence, and structural equivalence, or some combination.

With name equivalence, two types are the same if they are declared using the same name (e.g., the name of the tag for a union type, in C).

With structural equivalence, two types are compatible if they have a compatible structure, i.e., their internal contents are type compatible (this requires walking over each field/member checking that it is compatible). For instance, an object declared to have an aggregate type containing three integers is compatible with another aggregate type containing three integers (assuming any type modifiers, such as const’ness or mutability, are the same).
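
To make the distinction concrete outside of C, here is a small C# illustration (my example, not from the original post): C# uses name equivalence for classes and structs, while tuple types behave structurally – two tuples with the same element types are the same type regardless of the names chosen for the elements.

struct PointA { public int X; public int Y; }
struct PointB { public int X; public int Y; }

class Demo
{
  static void Main()
  {
    var a = new PointA { X = 1, Y = 2 };
    // PointB b = a;              // error: same structure, different name, different type

    (int X, int Y) t1 = (1, 2);
    (int Left, int Right) t2 = t1; // fine: both are System.ValueTuple<int, int>
    System.Console.WriteLine(t2.Left);
  }
}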

Structural compatibility becomes interesting when pointer types are involved; the pointed to types need to be checked and loops can occur, e.g., type S1 contains a field that has a type pointer to S2, which contains a field that has type pointer to S1.

While most types are easily checked for structural compatibility, every now and again aggregate types connect together in a way that makes it non-trivial to figure out which types are structurally compatible (dot file; needs graphviz):

C struct types in complicated cyclic relationship

Handling the edge cases requires maintaining a stack of information about which pairs of types are currently being compared.

In C, type compatibility is a combination of name equivalence (for aggregate types in the same translation unit) and structural equivalence (for function types and aggregate types across translation units).

Function types have to use structural equivalence because the type in a function definition is anonymous (the function name that appears in the definition has this anonymous type); there is no name to compare.

Cycles cannot appear in function types (in C), because the identifier being defined in a typedef is not in scope until just after the completion of its declarator. It is not possible to refer to the identifier being defined inside its own definition (e.g., it is not possible to define a function that takes its own type as a parameter; in typedef int (*f)(f); the second f is a redundant parameter name, the scope of the type denoted by the first f begins just before the semicolon).

Structural equivalence across translation units is a hangover from the early days of C, when developers were sloppy about using (or not using) tag names (with different people having different rules for upper/lower case tag names); developers knew what the layout in memory was and created the necessary types for their use of this data.

Type compatibility via name equivalence is easy to explain and makes it explicit when developers are bending the rules (i.e., pointer to struct casts appear in the code).

Type compatibility via structural equivalence is the wild west, which still exists in some development environments.

Enjoy a summer feast with the hottest tech community in The East


Join the Norwich tech community on Friday 27 July 2018 for a tasty BBQ to celebrate #NorfolkDay.

SyncNorwich, Norfolk Developers and Hot Source have got together to organise a delicious BBQ at the lovely Unthank Arms pub. (Don’t worry, we’re leaving it to the professionals to do the cooking.) You will be able to enjoy your meal in the garden – or the covered courtyard (if you need shelter or shade).

We look forward to seeing you for what is sure to be a fun evening from 18:00 to 22:00.

There will be a paying bar.

If your business would like to sponsor this event (particularly the drinks), please get in touch with the organisers.

Tickets cost just £18.92 per person – including transaction fees – but excluding drinks. In return, you’ll be able to choose from the following freshly cooked BBQ food.

Choose a main from: Juicy homemade burger, Salmon parcels with lemon and herb butter, Archer’s award winning sausages, Lemon and thyme marinated chicken fillet, or Halloumi and bbq vegetables

Choose a side from: Homemade coleslaw, New potato and spring onion salad, Tomato and red onion salad, or Cucumber and yoghurt raita.

Book now: https://summer-bbq-2018.eventbrite.co.uk

Breakfast with Peter Brady – CEO and Founder of Orbital Media



What: Breakfast with Peter Brady – CEO and Founder of Orbital Media
When: Tuesday 5th June
Where: The Maids Head Hotel, Tombland, Norwich, NR3 1LB
How much: £13.95
RSVP: https://www.meetup.com/Norfolk-Developers-NorDev/events/qqwhznyxjbhb/

Peter Brady is CEO and founder of Orbital Media, one of the UK’s most innovative full service Digital Agencies. Founded in 2003, Orbital Media has worked with some of the world’s biggest organisations on national and global accounts (including Aviva, NHS, Sanofi, Nestle and Mitsubishi).

Although Orbital Media’s traditional focus was on projecting brand engagement and awareness through social and digital channels, over the last 10 years it has developed a strong innovation and technology focus, becoming the UK’s leading supplier of gamification apps to the healthcare industry.

Peter will talk about Orbital Media’s journey and how it has moved into large scale tech projects (including a collaborative Artificial Intelligence project with an NHS body and the University of Essex, to reduce the burden of minor ailments in primary care). Peter will also tell us about how Orbital Media is developing and exporting virtual / augmented / mixed reality projects in healthcare, education and pain therapy sectors into global markets.

A Measure Of Borel Weight – a.k.

In the last few posts we have implemented a type to represent Borel sets of the real numbers, which are the subsets of them that can be created with countable unions of intervals with closed or open lower and upper bounds. Whilst I would argue that doing so was a worthwhile exercise in its own right, you may be forgiven for wondering what Borel sets are actually for and so in this post I shall try to justify the effort that we have spent on them.
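
In symbols (a notational restatement of that description, not taken from the series), the sets in question are those of the form

S = I_1 ∪ I_2 ∪ I_3 ∪ …

where each I_i is an interval whose lower and upper bounds may independently be open or closed.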

The Product Owner refactored: the SPO/TPO model

POrefactored-2018-05-2-16-02.jpg

Surprisingly, I’ve never blogged about the Strategic Product Owner / Tactical Product Owner model. This is surprising because it is a model I both find again and again and advocate again and again.

I find lots of companies who have a version of this model in place; they have created the model to deal with their own situation. But few of these companies realise that this is a recurring solution and is quite legitimate. (I should write it up as a Pattern but I haven’t written any patterns for a while.)

More importantly, I find that many companies and individuals faced with problems around Product Owners benefit from adopting this model. Specifically, as I’ve already mentioned, there is a lot of work for a Product Owner to do and one way of coping is to share the load.

If I were to write this up as a pattern the thumbnail version would say something like:

The Product Owner lacks the time – and sometimes skill – to fill the role fully therefore split the role in two. One person, the SPO (Strategic Product Owner), looks long term, they focus on customers and strategy. The other, the TPO (Tactical Product Owner), focuses on the near term (this sprint, the next sprint, the next quarter). The TPO spends most of their time with the delivery team while the SPO spends most of their time with customers and senior stakeholders.

Sometimes the Product Owner lacks time simply because – as I’ve said before – there is so much work the Product Owner should be doing they simply don’t have time.

Sometimes they lack time because the team is large, or the team lack domain knowledge (and therefore need to ask the PO lots of questions). Sometimes POs need to travel a lot to meet customers and even the most talented PO can’t be in two places at once.

They may also lack time because they have another job to do. While I think the Product Owner role is a full time job sometimes the person who is the right person to hold the role – usually because they command authority – needs to combine the work with another role.

For example, on a trading desk the Product Owner should probably be a senior trader who both knows the domain and has the authority to say Yes and No to features. But by definition such a person lacks time. Normally I’d want a dedicated Product Owner in place but sometimes the only way to have the necessary authority is to have another job.

And sometimes the person who should be Product Owner – think of our trader again – lacks the skills and experience to do the role. So again they need help.

The key thing about the SPO/TPO model is that the two people who hold the role need to speak with one voice. If they do not then the model will fail. Ideally the SPO will stand in when the TPO is unavailable and vice versa.

There is another occasion when the SPO/TPO model can be useful: big teams.

SPOManyTPO-2018-05-2-16-02.jpg

Ideally there is one product owner, one team and one stream of work. But sometimes there are several products, teams and streams. Here you might have an SPO who looks at the long term and several TPOs each of whom works with one team on one stream.

Now, like all good patterns this one is not without its downsides…

I’ve heard Scrum-advocates argue against this model: One True Product Owner they say. And they have a point… putting more people between the delivery team and the customer does detract from communication.

One of the problems software development faces is when multiple people think they have the right to say what is built next. Another problem occurs when the customer is remote from the development team and multiple people mediate what is asked for.

Ideally developers can talk to customers directly but that is often not possible or desirable – I won’t go into the reasons right now. So a good solution is One True Product Owner.

But then the One True Product Owner becomes a bottleneck, so we split the role into SPO/TPO. Yet every time we introduce another link – another person – between the coders and the customer, we increase the propensity to introduce problems. So it becomes a balancing act.

Nobody in between is the ideal.

One person can make it better.

Two people can be an improvement over one.

Three… I need some convincing this is an improvement over two.

Four… I find it hard to believe that having four people mediate the voice of the customer is an improvement… unless of course you previously had five!


The post The Product Owner refactored: the SPO/TPO model appeared first on Allan Kelly Associates.

Replication: not always worth the effort

Replication is the means by which mistakes get corrected in science. A researcher does an experiment and gets a particular result, but unknown to them one or more unmeasured factors (or just chance) had a significant impact. Another researcher does the same experiment and fails to get the same results, and eventually many experiments later people have figured out what is going on and what the actual answer is.

In practice replication has become a low status activity; journals want to publish papers containing new results, not papers backing up or refuting the results of previously published papers. The dearth of replication has led to questions being raised about large swathes of published results. Most journals only publish papers that contain positive results, i.e., something was shown to some level of statistical significance; only publishing positive results produces publication bias (there have been calls for journals that publish negative results).

Sometimes, repeating an experiment does not seem worth the effort. One such example is: An Explicit Strategy to Scaffold Novice Program Tracing. It looks like the authors ran a proper experiment and did everything they are supposed to do; but, I think the reason it got a positive result was luck.

The experiment involved 24 subjects, who were randomly assigned to one of two groups. Looking at the results (figures 4 and 5), it appears that two of the subjects had much lower ability than the other subjects (the authors did discuss the performance of these two subjects). Both of these subjects were assigned to the control group (each had an even chance of landing in either group, so there is a 25% chance of both ending up in the control group, but nobody knew what the situation was until the experiment was run), pulling down the average of the control group and making the other (strategy) group appear to show an improvement (i.e., the teaching strategy improved student performance).

Had one, or both, low performers been assigned to the other (strategy) group, no experimental effect would have shown up in the results, significantly reducing the probability that the paper would have been accepted for publication.

Why did the authors submit the paper for publication? Well, academic performance is based on papers published (quality of journal they appear in, number of citations, etc), a positive result is reason enough to submit for publication. The researchers did what they have been incentivized to do.

I hope the authors of the paper continue with their experiments. Life is full of chance effects and the only way to get a solid result is to keep on trying.

ACCU Conference 2018

We had an absolute blast at this year's ACCU Conference, and if you were there we imagine you did too. For us the highlight had to be the launch of #include <C++>, a new global, inclusive, and diverse community for developers interested in C++.
#include C++ logo
This is an initiative that has been brewing for a while, and we're very happy to be a part of. Above all else #include <C++> is designed to be a safe place for developers irrespective of their background, ethnicity, gender identity or sexuality. The group runs a Discord server which is moderated to ensure that it remains a safe space and which you are welcome to join.
On the technical front, one unexpected highlight of the conference was Benjamin Missel's wonderful short talk on writing a C compiler for a BBC Micro, during which he demonstrated SSHing into a BBC Model B through the serial port! Most conference sessions were recorded so even if you weren't there you can still watch them. See you at ACCU 2019!

ResOrg 2.0.7.27 has been released

ResOrg 2.0.7.27 has just been released. This is a maintenance update for ResOrg 2.0, and is compatible with all ResOrg 2.0 licence keys. The following changes are included:
  • Fixed a bug in the Symbols Display which could cause some "OK" symbols to be incorrectly shown in the "Problem Symbols Only" view.
  • Corrected the upper range limit for control symbols from 28671 (0x6FFF) to 57343 (0xDFFF).
  • ResOrg binaries are now dual signed with both SHA1 and SHA256.
  • Added support for Visual Studio 2017.
  • Corrected the File Save Dialog filters used by the ResOrgApp "File | Export" command.
  • The ResOrgApp "File | Export", "File | Save", "File | Save As" and "File | Properties" commands (which apply only to symbol file views) are now disabled when the active view is a report.
  • Fixed a crash in the Symbol File Properties Dialog.
  • Fixed a typo on the Symbol File "Next Values" page.
  • Various minor improvements to the installer.
Download ResOrg 2.0.7.27

The Product Owner is dead, long live the Product Owner!

3ProductOwners-2018-04-26-17-33.jpg

For years I have been using this picture to describe the Product Owner role. For years I have been saying:

“The title Product Owner is really an alias. Nobody should have Product Owner on their business cards. Product Owner is a Scrum defined role which is usually filled by a Product Manager or Business Analyst, sometimes it is filled by a Domain Expert (also known as a Subject Matter Expert, SME) and sometimes by someone else.”

Easy right?

In telling us about the Product Owner Scrum tells us what one of these people will be doing within the Scrum setting. Scrum doesn’t tell us how the Product Owner knows what they need to know to make those decisions – that comes by virtue of the fact that underneath they are really a Product Manager, BA or expert in the field.

In the early descriptions of Scrum there was a tangible feel that the Product Owner really had the authority to make decisions – they were the OWNER. I still hope that is true but more often than not these days the person playing Product Owner is more likely to be a proxy for one or more real customers.

I go on to say:

“In a software company, like Microsoft or Adobe, Product Managers normally fill the role of Product Owner. The defining feature of the Product Manager role is that their customers are not in the building. The first task facing a new Product Manager is to work out who their customers are – or should be – and then get out to meet them. By definition customers are external.”

“Conversely in a corporate setting, like HSBC, Lufthansa, Procter and Gamble, a Product Owner is probably a Business Analyst. Their job is to analyse some aspect of the business and make it better. By definition their customers are in the building.”

With me so far?

Next I point out that, having set up this nice model, these roles are increasingly confused because software product companies increasingly sell their software as a service. And corporates now interact with their customers online, which means customer contact is now through the computer.

Consider the airline industry: twenty years ago the only people who interacted with airline systems from United, BA, Lufthansa, etc. were airline employees. If you wanted to book a flight you went to a travel agent and a nice lady used a green screen to tell you what was available.

Today, whether you book with Lufthansa, SouthWest or Norwegian may well come down to which has the best online booking system.

Business Analysts need to be able to think like Product Managers and Product Managers need to be able to think like Business Analysts.

I regularly see online posts proclaiming “Product Managers are not Product Owners” or “Business Analysts are not Product Owners.” I’ve joined in with this; my alias argument says “they might be, but there is so much more to those roles.”

It makes me sad to see the Product Manager role reduced to a Product Owner: the Product Owner role as defined by Scrum is a mere shadow of what a good Product Manager should be.

But the world has moved on, things have changed.

The world has decided that Product Owner is the role: the person who deals with the demand side, the person who decides what is needed and what is to be built.

I think it’s time to change my model. The collision between the world of Business Analysts and Product Managers is now complete. The result is an even bigger mess and a new role has appeared: “Digital Business Analyst” – the illegitimate love child of Business Analysis and Product Management.

The Product Owner is now a superset of Product Manager and Business Analyst.

ProductOwnerSkills-2018-04-26-17-33-1.jpg

Product Owners today may well need the skills of business analysis. They are even more likely to need the skills of Product Management. And they are frequently expected to know about the domain.

Today’s Product Owner may well have a Subject Matter Expert background, in which case they quickly need to learn about Product Ownership, Product Management and Business Analysis.

Or they may have a Business Analysis background and need to absorb Product Management skills. Conversely, Product Owners may come from a Product Management background and may quickly need to learn some Business Analysis. In either case they will learn about the domain but they may want to bring in a Subject Matter Expert too.

To make things harder, exactly which skills they need, and which skills are most important is going to vary from team to team and role to role.

The post The Product Owner is dead, long live the Product Owner! appeared first on Allan Kelly Associates.

Another way to use Emacs to convert DOS/Unix line endings

I’ve previously blogged about using Emacs to convert line endings and use it as an alternative to the dos2unix/unix2dos tools. Using set-buffer-file-coding-system works well and has been my go-to conversion method. That said, there is another way to do the same conversion by using M-x recode-region. As the name implies, recode-region works on a region. […]

The post Another way to use Emacs to convert DOS/Unix line endings appeared first on The Lone C++ Coder's Blog.

Custom Alias Analysis in LLVM

At Codeplay I currently work on a compiler backend for an embedded accelerator. The backend is the part of the compiler which takes some representation of the source code and translates it to machine code. In my case I’m working with LLVM, so this representation is LLVM IR.

It’s very common for accelerators like GPUs to have multiple regions of addressable memory – each with distinct properties. One important optimisation I’ve implemented recently is extending LLVM’s alias analysis functionality to handle the different address spaces for our target architecture.

This article will give an overview of the aim of alias analysis – along with an example based around address spaces – and show how custom alias analyses can be expressed in LLVM. You should be able to understand this article even if you don’t work with compilers or LLVM; hopefully it gives some insight into what compiler developers do to make the tools you use generate better code. LLVM developers may find the implementation section helpful, but may want to read the documentation or examples linked at the bottom for more details.

What is alias analysis

Alias analysis is the process of determining whether two pointers can point to the same object (alias) or not. This is very important for some valuable optimisations.

Take this C function as an example:

int foo (int __attribute__((address_space(0)))* a,
         int __attribute__((address_space(1)))* b) {
    *a = 42;
    *b = 20;
    return *a;
}

Those __attribute__s specify that a points to an int in address space 0, and b points to an int in address space 1. An important detail of the target architecture for this code is that address spaces 0 and 1 are completely distinct: modifying memory in address space 0 can never affect memory in address space 1. Here’s some LLVM IR which could be generated from this function:

define i32 @foo(i32 addrspace(0)* %a, i32 addrspace(1)* %b) #0 {
entry:
  store i32 42, i32 addrspace(0)* %a, align 4
  store i32 20, i32 addrspace(1)* %b, align 4
  %0 = load i32, i32* %a, align 4
  ret i32 %0
}

For those unfamiliar with LLVM IR, the first store is storing 42 into *a, the second storing 20 into *b. The %0 = ... line is like loading *a into a temporary variable, which is then returned in the final line.

Optimising foo

Now we want foo to be optimised. Can you see an optimisation which could be made?

What we really want is for that load from a (the line beginning %0 = ...) to be removed and for the final statement to instead return 42. We want the optimised code to look like this:

define i32 @foo(i32 addrspace(0)* %a, i32 addrspace(1)* %b) #0 {
entry:
  store i32 42, i32 addrspace(0)* %a, align 4
  store i32 20, i32 addrspace(1)* %b, align 4
  ret i32 42
}

However, we have to be very careful, because this optimisation is only valid if a and b do not alias, i.e. they must not point at the same object. Forgetting about the address spaces for a second, consider this call to foo where we pass pointers which do alias:

int i = 0;
int result = foo(&i, &i);

Inside the unoptimised version of foo, i will be set to 42, then to 20, then 20 will be returned. However, if we carry out the desired optimisation then the two stores will still occur, but 42 will be returned instead of 20. We’ve just broken the behaviour of our function.

The only way that a compiler can reasonably carry out the above optimisation is if it can prove that the two pointers cannot possibly alias. This reasoning is carried out through alias analysis.

Custom alias analysis in LLVM

As I mentioned above, address spaces 0 and 1 for our target architecture are distinct. However, this may not hold for some systems, so LLVM cannot assume that it holds in general: we need to make it explicit.

One way to achieve this is to inherit from llvm::AAResultBase. If our target is called TAR then we can create a class called TARAAResult which inherits from AAResultBase<TARAAResult>1:

class TARAAResult : public AAResultBase<TARAAResult> {
public:
  explicit TARAAResult() : AAResultBase() {}
  TARAAResult(TARAAResult &&Arg) : AAResultBase(std::move(Arg)) {}

  AliasResult alias(const MemoryLocation &LocA, const MemoryLocation &LocB);
};

The magic happens in the alias member function, which takes two MemoryLocations and returns an AliasResult. The result indicates whether the locations can never alias, may alias, partially alias, or precisely alias. We want our analysis to say “If the address spaces for the two memory locations are different, then they can never alias”. The resulting code is surprisingly close to this English description:

AliasResult TARAAResult::alias(const MemoryLocation &LocA,
                               const MemoryLocation &LocB) {
  auto AsA = LocA.Ptr->getType()->getPointerAddressSpace();
  auto AsB = LocB.Ptr->getType()->getPointerAddressSpace();

  if (AsA != AsB) {
    return NoAlias;
  }

  // Forward the query to the next analysis.
  return AAResultBase::alias(LocA, LocB);
}

Alongside this you need a bunch of boilerplate for creating a pass out of this analysis (I’ll link to a full example at the end), but after that’s done you just register the pass and ensure that the results of it are tracked:

void TARPassConfig::addIRPasses() {
  addPass(createTARAAWrapperPass());
  auto AnalysisCallback = [](Pass &P, Function &, AAResults &AAR) {
    if (auto *WrapperPass = P.getAnalysisIfAvailable<TARAAWrapper>()) {
      AAR.addAAResult(WrapperPass->getResult());
    }
  }; 
  addPass(createExternalAAWrapperPass(AnalysisCallback));
  TargetPassConfig::addIRPasses();
}

We also want to ensure that there is an optimisation pass which will remove unnecessary loads and stores which our new analysis will find. One example is Global Value Numbering.

  addPass(createNewGVNPass());

After all this is done, running the optimiser on the LLVM IR from the start of the post will eliminate the unnecessary load, and the generated code will be faster as a result.

You can see an example with all of the necessary boilerplate in LLVM’s test suite. The AMDGPU target has a full implementation which is essentially what I presented above extended for more address spaces.

Hopefully you have got a taste of what alias analysis does and the kinds of work involved in writing compilers, even if I have not gone into a whole lot of detail. I may write some more short posts on other areas of compiler development if readers find this interesting. Let me know on Twitter or in the comments!


  1. This pattern of inheriting from a class and passing the derived class as a template argument is known as the Curiously Recurring Template Pattern

A Python Data Science hackathon

I was at the Man AHL Hackathon this weekend. The theme was improving the Python Data Science ecosystem. Around 15, or so, project titles had been distributed around the tables in the Man AHL cafeteria and the lead person for each project gave a brief presentation. Stable laws in SciPy sounded interesting to me and their room location included comfy seating (avoiding a numb bum is an underappreciated aspect of choosing a hackathon team, and wooden bench seating becomes numbing after a while).

Team Stable laws consisted of Andrea, Rishabh, Toby and yours truly. Our aim was to implement the Stable distribution as a Python module, to be included in the next release of SciPy (the availability had been announced a while back and there has been one attempt at an implementation {which seems to contain a few mistakes}).

We were well-fed and watered by Man AHL, including fancy cream buns and late night sushi.

A probability distribution is stable if a linear combination of independent variables with that distribution has the same distribution; the Gaussian, or Normal, distribution is the most well-known stable distribution, and the central limit theorem leads many to think that that is that.
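
For reference, the usual textbook statement of that property (my wording, not the post’s): a random variable X with independent copies X_1 and X_2 is stable if, for any constants a, b > 0, there exist c > 0 and d such that

a·X_1 + b·X_2 =_d c·X + d

where =_d denotes equality in distribution.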

Two other, named, stable distributions are the Cauchy distribution and, most interestingly (from my perspective this weekend), the Lévy distribution. Both distributions have very fat tails; the mean and variance of the Cauchy distribution are undefined (i.e., the values jump around as the sample size increases, never converging to a fixed value), while they are both infinite for the Lévy distribution.

Analytic expressions exist for various characteristics of the Stable distribution (e.g., probability distribution function), with the Gaussian, Cauchy and Lévy distributions as special cases. While solutions for implementing these expressions have been proposed, care is required; the expressions are ill-behaved in different ways over some intervals of their parameter values.

Andrea has spent several years studying the Stable distribution analytically and was keen to create an implementation. My approach for complicated stuff is to find an existing implementation and adopt it. While everybody else worked their way through the copious papers that Andrea had brought along, I searched for existing implementations.

I found several implementations, but they all suffered from using approaches that delivered discontinuities and poor accuracies over some range of parameter values.

Eventually I got lucky and found a paper by Royuela-del-Val, Simmross-Wattenberg and Alberola-López, which described their implementation in C: Libstable (licensed under the GPL, perfect for SciPy); they also provided lots of replication material from their evaluation. An R package was available, but no Python support.

No other implementations were found. Team Stable laws decided to create a new implementation in Python and to create a Python module to interface to the C code in libstable (the bit I got to do). Two implementations would allow performance and accuracy to be compared (accuracy checks really need three implementations to get some idea of which might be incorrect, when output differs).

One small fix was needed to build libstable under OS X (change Makefile to link against .so library, rather than .a) and a different fix was needed to install the R package under OS X (R patch; Windows and generic Unix were fine).

Python’s ctypes module looked after loading the C shared library I built, along with converting the NumPy arrays. My PyStable module will not win any Python beauty contest; it is a means of supporting the comparison of multiple implementations.

Some progress was made towards creating a new implementation, more than 24 hours is obviously needed (libstable contains over 4,000 lines of code). I had my own problems with an exception being raised in calls to stable_pdf; libstable used the GNU Scientific Library and I tracked the problem down to a call into GSL, but did not get any further.

We all worked overnight, my first 24-hour hack in a very long time (I got about 4-hours sleep).

After Sunday lunch around 10 teams presented and after a quick deliberation, Team Stable laws were announced as the winners; yea!

Hopefully, over the coming weeks a usable implementation will come into being.

On Quaker’s Dozen – student

The Baron's latest wager set Sir R----- the task of rolling a higher score with two dice than the Baron should with one twelve sided die, giving him a prize of the difference between them should he have done so. Sir R-----'s first roll of the dice would cost him two coins and twelve cents and he could elect to roll them again as many times as he desired for a further cost of one coin and twelve cents each time, after which the Baron would roll his.
The simplest way to reckon the fairness of this wager is to re-frame its terms; to wit, that Sir R----- should pay the Baron one coin to play and thereafter one coin and twelve cents for each roll of his dice, including the first. The consequence of this is that before each roll of the dice Sir R----- could have expected to receive the same bounty, provided that he wrote off any losses that he had made beforehand.

Emacs 26.1-RC1 on the Windows Subsystem for Linux

As posted in a few places, Emacs 26.1-RC1 has been released. Following up my previous experiments with running Emacs on the Windows Subsystem for Linux, I naturally had to see how the latest version would work out. For that, I built the RC1 on an up-to-date Ubuntu WSL. I actually built it twice – once […]

The post Emacs 26.1-RC1 on the Windows Subsystem for Linux appeared first on The Lone C++ Coder's Blog.

Influential philosophers of source code

Who is the most important/influential philosopher of source code? Source code, as far as I know, is not a subject that philosophers claim to be studying; but, the study of logic, language and the mind is the study of source code.

For many, Ludwig Wittgenstein would probably be the philosopher that springs to mind. Wittgenstein became famous as the world’s first Perl programmer, with statements such as: “If a lion could talk, we could not understand him.” and “Whereof one cannot speak, thereof one must be silent.”

Noam Chomsky, a linguist, might be another choice, based on his specification of the Chomsky hierarchy (which neatly categorizes grammars). But generative grammars (for which he is famous in linguistics) are about generating language, not understanding what has been said/written.

My choice for the most important/influential philosopher of source code is Paul Grice. A name, I suspect, that is new to most readers. The book to quote (and to read if you enjoy the kind of books philosophers write) is “Studies in the Way of Words”.

Grice’s maxims provide a powerful model for human communication; the tldr:

  • Maxim of quality: Try to make your contribution one that is true.
  • Maxim of quantity: Make your contribution as informative as is required.
  • Maxim of relation: Be relevant.

But source code is about human/computer communication, you say. Yes, but so many developers seem to behave as-if they were involved in human/human communication.

Source code rarely expresses what the developer means; source code is evidence of what the developer means.

The source code chapter of my empirical software engineering book is Gricean, with a Relevance theory accent.

More easily digestible books on Grice’s work (for me at least) are: “Relevance: Communication and Cognition” by Sperber and Wilson, and the more recent “Meaning and Relevance” by Wilson and Sperber.

What Product Owners should not do

Noproductowners-2018-04-18-11-27.jpg

Last time I set out some of the things a Product Owner should be doing – or at least considering doing. Even a quick look at that list will tell you the Product Owner is going to be a busy person.

So in this post I’d like to suggest some things Product Owners should NOT be doing.

Product Owners should NOT be cutting code

Having a former coder in the Product Owner role can be a great boon. Not only do they know how to talk with the technical team and (hopefully) can command their respect, but they can also see how technology can apply.

But to be an effective Product Owner they need to step away from the keyboard and stop writing code.

Two reasons.

One: time.
Product Owners add value by ensuring that the code which is written addresses the most valuable opportunities in the smallest, most elegant, most delightful way possible.

Every minute spent coding is a minute not doing that.

Second: Product Owners need to empathise with the customer, with the business users, they need to eat-sleep-and-breath customers.

Being a good coder – let alone someone called an architect – is to empathise with code, the system, the mechanics of how a system works.

Importantly, the requirements view and the code view need to come together: the people holding them should discuss what they see and find a way to bring the two – sometimes opposing – views together. It is a lot easier to have that discussion if the two sides are represented by different people.

Asking one person to divide their brain in two and discuss opposing views with themselves is unlikely to bring about the best result and is probably a recipe for confusion and stress.

That’s not to say both sides shouldn’t appreciate the other. As I said before, former coders have a great advantage in being a Product Owner. And I want the technical team to meet customers. But I want discussions to be between two (or more) people.

(I might allow an exception here for Minimally Viable Teams but once the team moves beyond the MVT stage the PO should stop coding.)

Product Owners should NOT be line managers

OK, senior Product Owners might line manage junior Product Owners, but they certainly should not be line managing anyone else. Most certainly they should not be line managing the technical team.

Product Owner authority comes not from a line on an organization chart, or the ability to award (or deny) a pay rise or bonus. Product Owner authority stems from their specialist knowledge of what customers want from a product and what the organization considers valuable.

If the Product Owner cannot demonstrate their specialist knowledge in this way then either they should learn fast or they should consider if they are in the right role.

Product Owners need to trust the technical team and the technical team need to trust the Product Owner. Authority complicates this relationship because one side is allowed to issue orders when trust is absent and the other side has to obey.

And again, Product Owners simply don’t have the time to line manage anyone.

Being a good line manager requires empathy with employees and time to spend observing and talking to employees, helping them develop themselves, helping them with problems and so on.

Product Owners should not Make Promises for Other People to keep

Specifically, that means they should not issue “Roadmaps” which list features with delivery dates based on effort estimates. The whole issue of estimation is a minefield: very few teams are in a position to estimate accurately and most humans are atrocious at time estimation anyway, so any plans based on effort estimation are a fantasy. But even putting that to one side…

Issuing such plans commits other people to keep promises. That is just unfair.

Product Owners can create and share scenario plans about how the product – and world – might unfold in the future.

Product Owners can co-create and share capacity plans which show how an organization intends (strategically) to allocate resources. And Product Owners can work with teams in executing against those capacity plans in order to deliver functionality the Product Owner thinks should be delivered by a date the Product Owner thinks is necessary.

In other words: provided a Product Owner is making the promise that they intend to keep themselves (i.e. they have skin in the game) then they might issue some kind of forward plan.

Product Owners should dump outbound marketing at the first opportunity

Outbound marketing, e.g. advertising, press releases, public relations and product evangelism, often ends up on the Product Owner’s plate – particularly when the Product Owner is a Product Manager. And in a small company (think early stage start-up) this just needs to be accepted.

However, in a larger organization, or a growing start-up, Product Owners should seek to pass this work to a dedicated Product Marketing specialist as soon as possible. Both roles deserve enough time to do the job properly.

The Marketing Specialist and Product Owner will work closely together – they are after all two sides of the same coin, the Marketing coin. The Marketing Specialist handles outbound marketing (telling people about the product) and the Product Owner handles inbound marketing (what do people want from the product?). (Again, in organizations with established Product Management this is usually easier to see.)

Product Owners should dump pre-sales at the first opportunity

As with outbound marketing Product Owners often get dragged in as pre-sales support to account managers. And again this is more common in small companies and early stage start-ups.

There are some advantages to playing second fiddle to a sales person. The Product Owner might get actual customer contact (sales people too often block Product people from meeting customers.) And Product Owners should be exposed to some of the commercial pressures that sales people – and customers – encounter.

But doing pre-sales is very time consuming. And being wheeled in to help deliver a sale will distort the Product Owner’s view of the market – just ‘cos this customer wants the product in Orange doesn’t mean other customers want Orange.

And again, pre-sales is more effectively done by specialist staff as soon as the company can afford them.


The post What Product Owners should not do appeared first on Allan Kelly Associates.

The C++ committee has taken off its ball and chain

A step change in the approach to updates and additions to the C++ Standard occurred at the recent WG21 meeting, or rather a change that has been kind of going on for a few meetings was documented and discussed. Two bullet points at the start of “C++ Stability, Velocity, and Deployment Plans [R2]” grab the reader’s attention:

● Is C++ a language of exciting new features?
● Is C++ a language known for great stability over a long period?

followed by the proposal (which was agreed at the meeting): “The Committee should be willing to consider the design / quality of proposals even if they may cause a change in behavior or failure to compile for existing code.”

We have had 30 years of C++/C compatibility (ok, there has been some nibbling around the edges over the last 15 years). A remarkable achievement, thanks to Bjarne Stroustrup over 30+ years and 64 full-week standards’ meetings (also, Tom Plum and Bill Plauger were engaged in shuttle diplomacy between WG14 and WG21).

The C/C++ superset/different issue has a long history.

In the late 1980s SC22 (the top-level ISO committee for programming languages) asked WG14 (the C committee) whether a standard should be created for C++, and if so did WG14 want to create it. WG14 considered the matter at its April 1989 meeting, and replied that in its view a standard for C++ was worth considering, but that the C committee were not the people to do it.

In 1990, SC22 started a study group to look into whether a working group for C++ should be created, and in the U.S. X3 (the ANSI committee responsible for Information processing systems) set up X3J16. The showdown meeting of what would become WG21 was held in London in March 1992 (the only ISO C++ meeting I have attended).

The X3J16 people were in London for the ISO meeting, which was heated at times. The two public positions were: 1) work should start on a standard for C++, 2) C++ was not yet mature enough for work to start on a standard.

The, not so public, reason given for wanting to start work on a standard was to stop, or at least slow down, changes to the language. New releases, rumored and/or actual, of Cfront were frequent (in a pre-Internet sense of frequent). Writing large applications in a version of C++ that was replaced with something slightly different six months later had developers in large companies pulling their hair out.

You might have thought that compiler vendors would be happy for the language to be changing on a regular basis; changes provide an incentive for users to pay for compiler upgrades. In practice the changes were so significant that major rework was needed by somebody who knew what they were doing, i.e., expensive people had to be paid; vendors were more used to putting effort into marketing minor updates. It was claimed that implementing a C++ compiler required seven times the effort of implementing a C compiler. I have no idea how true this claim might have been (it might have been one vendor’s approximate experience). In the 1980s everybody and his dog had their own C compiler and most of those who had tried, had run into a brick wall trying to implement a C++ compiler.

The stop/slow down changing C++ vs. let C++ “fulfill its destiny” (a rallying call from the AT&T rep, which the whole room cheered) finally got voted on; the study group became a WG (I cannot tell you the numbers; the meeting minutes are not online and I cannot find a paper copy {we had those until the mid/late-90s}).

The creation of WG21 did not have the intended effect (slowing down changes to the language); Stroustrup joined the committee and C++ evolution continued apace. However, from the developers’ perspective language change did slow down; Cfront changes stopped because its code was collapsing under its own evolutionary weight and usable C++ compilers became available from other vendors (in the early days, Zortech C++ was a major boost to the spread of usage).

The last WG21 meeting had 140 people on the attendance list; they were not all bored consultants looking for a creative outlet (i.e., exciting new features), but I’m sure many would be happy to drop the ball-and-chain (otherwise known as C compatibility).

I think there will be lots of proposals that will break C compatibility in one way or another, and some will make it into a published standard. The claim will be that the changes will make life easier for future C++ developers (a claim made by proponents of every language, for which there is zero empirical evidence). The only way of finding out whether a change has long term benefit is to wait a long time and see what happens.

The interesting question is how C++ compiler vendors will react to breaking changes in the language standard. There are not many production compilers out there these days, i.e., not a lot of competition. What incentive does a compiler vendor have to release a version of their compiler that will likely break existing code? Compiler validation, against a standard, is now history.

If WG21 make too many breaking changes, they could find C++ vendors ignoring them and developers asking whether the ISO C++ standards’ committee is past its sell by date.

"The answer was always ‘yes’ with Naked Element"


Sometimes it is difficult to explain exactly how we are different from our competitors and what working with us is like, so we put together a short video that says it all!

Some of our lovely clients have kindly shared how they felt about working with Naked Element, the results they saw and the impact our software had on their business. In less than three minutes it is clear why we get such good feedback from our clients. Our specialised way of working and personal approach makes a big difference, and our understanding of each client's needs is obvious from the finished product.

As CEO Paul Grenyer says, we are driven by our clients, and that, combined with our years of development experience, means that we have helped companies large and small overcome processing issues.

But don't take our word for it! Take a look at our video and what our clients have to say about us, and get in touch if you think we can help you!

Busy busy busy: What Product Owners do

HeadacheiStock_000014496990Small-2018-04-10-10-18.jpg

If you hadn’t noticed, I’m building a blog mini-series on the Product Owner role. It’s a role I’ve long felt didn’t get the attention it should have. Frankly, in a Scrum setting, I think the Scrum Master gets too much attention and the Product Owner not enough.

One aspect in particular of the Product Owner role really annoys me: they have so much work to do.

Or rather, a Product Owner who is doing their job properly – as opposed to simply administering the backlog – has so many things they should potentially be doing.

So a few days ago I started to make a list…

Backlog administration: writing stories, reviewing and discussing suggested stories, splitting stories, weeding the backlog (throwing stories away), improving stories, putting value on stories, writing acceptance criteria

Working with the team: talking to the stories, reviewing work in progress, reviewing “completed” work, potentially signing-off or formally accepting stories, participating in 3-Amigos meetings with testers and developers, helping to improve the development processes

UXD: working even more closely with UXD specialists because the two roles overlap, and possibly substituting for UXD specialists where they are absent.

Meetings: prioritisation pre-planning meetings, planning meetings themselves, stand-up meetings, retrospectives, show & tell demonstrations (potentially delivering the show & tell themselves)

Interfacing to the wider organization: reporting and listening to internal stakeholders in authority, attending Governance and/or Portfolio review meetings, aligning product strategy and plans with company strategy and plans, plus feeding back to company strategy about their own product strategy and plans.

Planning: participating in Sprint planning with the team, planning for upcoming iterations (the rolling quarter plan as I like to call it), longer term planning which might take the form of a roadmap, a capacity plan, a scenario plan or all three

Customers 1: identifying customers and potential customers, segmenting the customer base, creating customer profiles and personas.

Customers 2: visiting customers, observing customers, talking to customers about stories and potential future work, reflecting on customer comments and feeding back to the team and other stakeholders.

Customers 3: similar activities to #2 for people and organizations who are not currently customers but who are potential customers (because potential customers who have unmet needs represent growth).

I’m sure some of you are saying: “But we don’t have external customers, we have internal (captive) users”. And you’re right, if you have such “customers” then you have a subset of these activities. But then again, shouldn’t you be thinking about how your product is used by internal users to service the needs of external customers? And how you could improve that experience (for the customers) and improve the process (for the users)?

Marketing: inbound marketing – the items just mentioned under customers – plus market scanning (checking out the competitors) and potentially outbound marketing (advertising, PR, trade shows, etc.)

Sharing expert knowledge: providing knowledge about the domain and subject of development to the development team, supporting sales calls, demonstrating the product at shows. (And when the company is small helping the training and support teams.)

The offering: using the information gained in all these activities to refine the product/service offering to satisfy customers or improve business processes; Is it the right offering? Are you targeting the right customer segment? Should you be offering something else?

Close the loop: evaluating the effect on customers and/or process: Are the features being used? Are non-feature improvements making a difference? What shouldn’t have been done? What arises from the changes that have been made? More software changes? Process changes?

Money: is all this making money? Is the continued existence of the team positive for ROI?

Coincidentally, while I was preparing this blog, Marty Cagan published a blog entitled “CEO of the Product Revisited” in which he offered a list of all the discussions a Product Manager can expect to be involved with. That is no short list either. And as anyone who follows my writing already knows, I see the Product Owner role as a kind-of Product Manager – more on that in a future blog.

This is not to say that all Product Owners should be doing all of these things. Asking one person to take all this on is probably setting them up to fail. Every product owner should recognise every item on this list. If they aren’t doing any of these items themselves then I expect they can either cross it off (doesn’t need doing where they work), or name the person who is doing it.

And I also expect every product owner can add some things to this list which I have overlooked.

In future blog posts I intend to discuss (again) the Product Owner as a Product Manager and how Product Owners can reduce their work load.


The post Busy busy busy: What Product Owners do appeared first on Allan Kelly Associates.

A Borel Universe – a.k.

Last time we took a look at Borel sets of real numbers, which are subsets of the real numbers that can be represented as unions of countable sets of intervals I_i. We got as far as implementing the ak.borelInterval type to represent an interval as a pair of ak.borelBound objects holding its lower and upper bounds.
With these in place we're ready to implement a type to represent Borel sets and we shall do exactly that in this post.

East End Functions

There has been a recent stirring of attention, in the C++ community, for the practice of always placing the const modifier to the right of the thing it modifies. The practice has even been gifted a catchy name: East Const (which, I think, is what has stirred up the interest).

As purely a matter of style it's fascinating that it seems to have split the community so strongly! There are cases for and against, but both sides seem to revolve around the idea of "consistency". For the East Const believers the consistency is in the sense that you can always apply one, simple, rule about what const means and where it goes. For the West Consters the consistency is with the majority of existing code out there - as well as the Core Guidelines recommendation!
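
For readers who have not met the terms, here is a small illustration of the two placements (my example, not from the post); the compiler treats both spellings identically, the argument is purely about where the eye finds the const:

#include <string>

const std::string greeting  = "hello";   // West Const
std::string const greeting2 = "hello";   // East Const

// With pointers, the East style keeps one simple rule: const always
// modifies whatever is immediately to its left.
const char* const p1 = "x";              // West: the first const modifies char
char const* const p2 = "x";              // East: reads uniformly right-to-left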

Personally I've been an East Const advocate for many years (although not by that name, of course) - and converted the entire Catch codebase over to East Const quite early on.

But there's another style choice that I've not seen discussed quite as much, but has a number of parallels.

As with East vs West Const this is purely a matter of style (it doesn't change what the compiler generates), and one of the arguments in favour is consistency across application (there are some cases where you must do it this way) - but the main argument against is also consistency - with most existing code. Sound familiar? But what is it?

The issue is about where to specify return types on function signatures. For most of C++'s history the only choice has been to write the type before the name of the function (along with any modifiers). But since C++11 we've been able to write the type at the end of the function signature (after a -> - and the function must be prefixed with the keyword auto).

auto someFunc( int i ) -> std::string;
// instead of
std::string someFunc( int i );

So why would you prefer this style? Well, first there's that consistency argument. It's the only way to specify return types for lambdas. You're also required to use trailing return types if the type is a decltype that is dependent on the name of one of the function's arguments. Indeed, that's the motivating case for adding the syntax in the first place. e.g.:

template <typename Lhs, typename Rhs>
auto add( Lhs const& lhs, Rhs const& rhs ) -> decltype( lhs + rhs ) {
    return lhs + rhs;
}

A Foolish Consistency?

Given those cases where it is required, using the same syntax in all other cases would seem to be more consistent.

I'm not sure the consistency argument is as strong here as it is with East Const - there was never much confusion over what the return applied to, after all. But I think it's worth keeping in mind.

The next reason for is consistency with other languages. Many languages, especially functional programming languages, exclusively use the trailing syntax for return types. Quite a few, e.g. Swift, use the same -> syntax.

It's not a strong reason on its own, but combined with the internal consistency argument I think there's something there.

However, for me at least, the most compelling rationale is for readability. Why do I think it's more readable? There are actually two parts to this:

  1. Function declarations tend to line up. Certain qualifiers might spoil this effect, although one approach might be to group similarly qualified functions (e.g. all virtuals) together. This makes glancing through the list of function names much easier.

  2. The name of the function is usually the most important thing when you're browsing the code. If you're more interested in the return type it's usually because you already know which function you're interested in. So making the name the first thing you read (after the auto introducer) seems fitting.

auto doesItBlend() -> bool;
auto whatsYourFavouriteNumber() -> int;
auto add( double a, double b ) -> double;
void setTheControls();

(note that many who prefer this form, including myself, tend to still put void first)

For me the arguments for are compelling. The arguments against really boil down to the same argument against East Const - inconsistency with older code. As Jon Kalb deliberated on in A Foolish Consistency, this sort of thinking can hold us back.

I've been favouring this style for more than a couple of years now. In fact I tracked down a post to the ACCU mailing list (linked here, but I believe you have to be a subscriber to read it) where I talked about it - and made all the same points I'm making here. My opinion since then has not changed much. Other than feeling more confident that it's The Right Thing.

So I think it's time we gave it a catchy name. Unlike East Const it already has a name, "trailing return types". It's not especially galvanising, though. Given the parallels to East vs West Const - and the fact that it, also, relates to the thing in question being placed to the left or the right, I propose East End Functions (vs West End Functions).

What about the redundant auto keyword?

Think of auto, here, as the "function introducer". In other languages it might be spelt fun or func. If it makes you feel better you could always:

#define func auto

... actually don't. The point is, in languages that introduce a function with func, then have a trailing return type, nobody gives it a second thought. auto is the same number of characters as func. It's a shame it's not quite as expressive - but that's the price of legacy. It shouldn't mean we "can't have nice things".

GDPR has a huge impact on empirical software engineering research

The EU’s General Data Protection Regulation (GDPR) is going to have a huge impact on empirical software engineering research. After 25 May 2018, analyzing source code will never be the same again.

I am not a lawyer and nothing qualifies me to talk about the GDPR.

People put their name in source code, bug tracking databases and discussion forums; this is personal identifying information.

Researchers use personal names to obtain information about a wide variety of activities, e.g., how much code did individuals write, how many bug reports did they process, contributions in discussions of one sort or another.

Open source licenses give others all kinds of rights (e.g., ability to use and modify source code), but they do not contain any provisions for processing personal data.

Adding a “I hereby give permission for anybody to process information about my name in any way they see fit.” clause to licenses is not going to help.

The GDPR requires (article 5: Principles relating to processing of personal data):

“Personal data shall be: … collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes;”

That is, personal data can only be processed for the specific reason it was collected, i.e., if you come up with another bright idea for analysis of data that has just been collected, it may be necessary to obtain consent, from those whose personal data it is, before trying out the bright idea.

It is not possible to obtain blanket permission (article 6, Lawfulness of processing):

“…the data subject has given consent to the processing of his or her personal data for one or more specific purposes;”, i.e., consent has to be obtained from the data subject for each specific purpose.

Github’s Global Privacy Practices shows that Github are intent on meeting the GDPR requirements; these include: “GitHub provides clear methods of unambiguous, informed consent at the time of data collection, when we do collect your personal data.” Processing personal information, about an EU citizen, contained in source code appears to be a violation of Github’s terms of service.

The GDPR has many other requirements, e.g., right to obtain information on what information is held and right to be forgotten. But, the upfront killer is not being able to cheaply collect lots of code and then use personal information to help with the analysis.

There are exceptions for: Processing for archiving, scientific or historical research or statistical purposes. Can somebody who blogs and is writing a book claim to be doing scientific research? People who know more about these exceptions than me, tell me that there could be a fair amount of paperwork involved when making use of the exception, i.e., being able to show that privacy safeguards are in place.

Then, there is the issue of what constitutes personal information. Git’s hashing algorithm makes use of the committer’s name and/or email address. Is a git hash personal identifying information?

A good introduction to the GDPR for developers.

Can you get a deadlock with a single lock and an IO operation?

Quite a while ago, I answered a question about the basic deadlock scenario on Stack Overflow. More recently, I got an interesting comment on it. The poster asked if it was possible to get a deadlock with a single lock and an I/O operation. My first gut reaction was “no, not really”, but it got […]

The post Can you get a deadlock with a single lock and an IO operation? appeared first on The Lone C++ Coder's Blog.

Reliability chapter added to “Empirical software engineering using R”

The Reliability chapter of my Empirical software engineering book has been added to the draft pdf (download here).

I have been working on this draft for four months and it still needs lots of work; time to move on and let it stew for a while. Part of the problem is lack of public data; cost and schedule overruns can be rather public (projects chapter), but reliability problems are easier to keep quiet.

Originally there was a chapter covering reliability and another one covering faults. As time passed, these merged into one. The material kept evaporating in front of my eyes (around a third of the initial draft, collected over the years, was deleted); I have already written about why most fault prediction research is a waste of time. If it had not been for Rome I would not have had much to write about.

Perhaps what will jump out at people most is that I distinguish between mistakes in code and what I call a fault experience: fault_experience = mistake_in_code + particular_input. Most fault researchers have been completely ignoring half of what goes into every fault experience, the input profile (if the user does not notice a fault, I do not consider it experienced). It’s incredibly difficult to figure out anything about the input profile, so it has been quietly ignored (one of the reasons why research papers on reported faults are such a waste of time).

I’m also missing an ‘interesting’ figure on the opening page of the chapter. Suggestions welcome.

I have not said much about source code characteristics. There is a chapter covering source code, perhaps some of this material will migrate to reliability.

All sorts of interesting bits and pieces have been added to earlier chapters. The Ecosystems chapter keeps growing, and in years to come somebody will write a multi-volume tome on software ecosystems.

I have been promised all sorts of data. Hopefully some of it will arrive.

As always, if you know of any interesting software engineering data, please tell me.

Source code chapter next.

No more pointers

One of the major changes at the most recent C++ standards meeting in Jacksonville was the decision to deprecate raw pointers in C++20, moving to remove them completely in C++23. This came as a surprise to many, with a lot of discussion as to how we’ll get by without this fundamental utility available any more. In this post I’ll look at how we can replace some of the main use-cases of raw pointers in C++20.

Three of the main reasons people use raw pointers are:

  • Dynamic allocation & runtime polymorphism
  • Nullable references
  • Avoiding copies

I’ll deal with these points in turn, but first, an answer to the main question people ask about this change.

The elephant in the room

What about legacy code? Don’t worry, the committee have come up with a way to boldly move the language forward without breaking all the millions of lines of C++ which people have written over the years: opt-in extensions.

If you want to opt-in to C++20’s no-pointers feature, you use #feature.

#feature <no_pointers> //opt-in to no pointers
#feature <cpp20>       //opt-in to all C++20 features

This is a really cool new direction for the language. Hopefully with this we can slowly remove features like std::initializer_list so that new code isn’t bogged down with legacy as much as it is today.

Dynamic allocation & runtime polymorphism

I’m sure most of you already know the answer to this one: smart pointers. If you need to dynamically allocate some resource, that resource’s lifetime should be managed by a smart pointer, such as std::unique_ptr or std::shared_ptr. These types are now special compiler-built-in types rather than normal standard library types. In fact, std::is_fundamental<std::unique_ptr<int>>::value now evaluates to true!
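
As a rough sketch (my example, not code from the post), an owning raw pointer with a manual delete becomes:

#include <memory>

struct Shape { virtual ~Shape() = default; virtual double area() const = 0; };
struct Circle : Shape {
    double r;
    explicit Circle( double r ) : r( r ) {}
    double area() const override { return 3.14159 * r * r; }
};

// The returned unique_ptr owns the Circle: no delete required,
// and virtual dispatch through the pointer works as before.
std::unique_ptr<Shape> make_circle( double r ) {
    return std::make_unique<Circle>( r );
}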

Nullable references

Since references cannot be rebound and cannot be null, pointers are often used where a nullable, rebindable reference is needed. However, with C++20, we have a new type to fulfil this purpose: std::optional<T&>. std::optional was first introduced in C++17, but was plagued with no support for references and no monadic interface. C++20 has fixed both of these, so now we have a much more usable std::optional type which can fill the gap that raw pointers have left behind.

Avoiding copies

Some people like to use raw pointers to avoid copies at interface boundaries, such as returning some resource from a function. Fortunately, we have much better options, such as (Named) Return Value Optimization. C++17 made some forms of copy elision mandatory, which gives us even more guarantees for the performance of our code.
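
As a sketch of the general idea (mine, not code from the post), returning by value and letting copy elision do the work is simpler and, since C++17, guaranteed not to copy the returned temporary:

#include <string>
#include <vector>

std::vector<std::string> load_names() {
    std::vector<std::string> names;
    names.emplace_back( "Ada" );
    names.emplace_back( "Grace" );
    return names; // NRVO: in practice the local is constructed in place
}

auto names = load_names(); // C++17: the returned prvalue initialises 'names' directly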

Wrapping up

Of course there are more use-cases for raw pointers, but this covers three of the most common ones. Personally, I think this is a great direction to see the language going in, and I look forward to seeing other ways we can slowly make C++ into a simpler, better language.

Using tuned.conf to disable mongod startup warnings on RHEL/CentOS 7

RHEL 7 – and CentOS 7, which I used for this test – use tuned.conf to set a lot of system settings. Several of the tuned settings affect MongoDB’s performance; some are important enough that mongod actually triggers startup warnings. The main setting is transparent huge pages, which is a setting that does not work […]

The post Using tuned.conf to disable mongod startup warnings on RHEL/CentOS 7 appeared first on The Lone C++ Coder's Blog.

McCabe’s cyclomatic complexity and accounting fraud

The paper in which McCabe proposed what has become known as McCabe’s cyclomatic complexity did not contain any references to source code measurements, it was a pure ego and bluster paper.

Fast forward 10 years and cyclomatic complexity, complexity metric, McCabe’s complexity (and other permutations of these words plus “metric”) has become one of the two major magical omens of code quality/safety/reliability (Halstead’s is the other).

It’s not hard to show that McCabe’s complexity is a rather weak measure of program complexity (it’s about as useful as counting lines of code).

Just as it is possible to reduce the number of lines of code in a function (by putting all the code on one line), it’s possible to restructure existing code to reduce the value of McCabe’s complexity (which is measured for individual functions).

There are 16 possible paths through the following function; its McCabe complexity (the number of decisions plus one) is 5:

int main(void)
{
if (W) a(); else b();
if (X) c(); else d();
if (Y) e(); else f();
if (Z) g(); else h();
}

each if/else contains two paths and there are four in series, giving 2*2*2*2 paths.

Restructuring the code, as below, removes the multiplication of paths caused by the sequence of if/else:

void a_b(void) {if (W) a(); else b();}

void c_d(void) {if (X) c(); else d();}

void e_f(void) {if (Y) e(); else f();}

void g_h(void) {if (Z) g(); else h();}

int main(void)
{
a_b();
c_d();
e_f();
g_h();
}

This reduces main’s McCabe complexity to 1, and the four new functions each have a McCabe complexity of two.

Where has the ‘missing’ complexity gone? It now ‘exists’ in the relationship between the functions, a relationship that is not included in the McCabe complexity calculation.

The number of paths that can be traversed, by a call to main, has not changed, but main’s McCabe complexity has dropped from 5 to 1, with the remainder hidden in the four new functions.

Various recommended practice documents suggest McCabe’s complexity as one of the metrics to consider (but don’t suggest any upper limit), while others go as far as to claim that it’s bad practice for functions to have a McCabe’s complexity above some value (e.g., 10) or that “Cyclomatic complexity may be considered a broad measure of soundness and confidence for a program“.

Consultants in the code quality/safety/security business need something to complain about, that is not too hard or expensive for the client to fix.

If a consultant suggested that you reduced the number of lines in a function by joining existing lines, to bring the count under some recommended limit, would you take them seriously?

What about, if a consultant highlighted a function that had an allegedly high McCabe’s complexity? Should what they say be taken seriously, or are they essentially encouraging developers to commit the software equivalent of accounting fraud?

Visual Lint 6.5.1.294 has been released

Visual Lint 6.5.1.294 has just been released. This is a maintenance update for Visual Lint 6.5, and is compatible with all Visual Lint 6.0 and 6.5 licence keys. The following changes are included:
  • Built-in compiler preprocessor symbols are now automatically included in the analysis configuration for Atmel Studio projects using ARM toolchains where possible.
  • Fixed a bug which caused a "project changed" event to be erroneously sourced if an external project file located in the same folder as a loaded project was changed.
  • The PC-lint raw analysis results parser will now raise a fatal error if a PC-lint Plus License Error is detected.
  • Fixed a bug in the "Analysis Tool" Options page which affected browsing for an analysis tool installation folder.
  • Modified a handful of prompts to refer to "PC-lint or PC-lint Plus" rather than just "PC-lint".
Download Visual Lint 6.5.1.294

Top, must-read paper on software fault analysis

What is the top, must read, paper on software fault analysis?

Software Reliability: Repetitive Run Experimentation and Modeling by Phyllis Nagel and James Skrivan is my choice (it’s actually a report, rather than a paper). Not only is this report full of interesting ideas and data, but it has multiple replications. Replication of experiments in software engineering is very rare; this work was replicated by the original authors, plus Scholz, and then replicated by Janet Dunham and John Pierce, and then again by Dunham and Lauterbach!

I suspect that most readers have never heard of this work, or of Phyllis Nagel or James Skrivan (I hadn’t until I read the report). Being published is rarely enough for work to become well-known; the authors need to proactively advertise the work. Nagel, Dunham & co worked in industry and so did not have any students to promote their work and did not spend time on the academic seminar circuit. Given enough effort it’s possible for even minor work to become widely known.

The study run by Nagel and Skrivan first had three experienced developers independently implement the same specification. Each of these three implementations was then tested, multiple times. The iteration sequence was: 1) run program until fault experienced, 2) fix fault, 3) if less than five faults experienced, goto step (1). The measurements recorded were fault identity and the number of inputs processed before the fault was experienced.

This process was repeated 50 times, always starting with the original (uncorrected) implementation; the replications varied this, along with the number of inputs used.

For a fault to be experienced, there has to be a mistake in the code and the ‘right’ input values have to be processed.

How many input values need to be processed, on average, before a particular fault is experienced? Does the average number of input values needed for a fault experience vary between faults, and if so by how much?

The plot below (code+data) shows the numbers of inputs processed, by one of the implementations, before individual faults were experienced, over 50 runs (sorted by number of inputs):

Number of inputs processed before particular fault experienced

Different faults have different probabilities of being experienced, with fault a being experienced on almost any input and fault e occurring much less frequently (a pattern seen in the replications). There is an order of magnitude variation in the number of inputs processed before particular faults are experienced (this pattern is seen in the replications).

Faults were fixed as soon as they were experienced, so the technique for estimating the total number of distinct faults, discussed in a previous post, cannot be used.

A plot of number of faults found against number of inputs processed is another possibility. More on that another time.

Suggestions for top, must read, paper on software faults, welcome (be warned, I think that most published fault research is a waste of time).

Digg Reader shuts down, and thoughts on organising my blog reading

Farewell, Digg Reader. Unfortunately, Digg announced that Digg Reader is shutting down tomorrow. While I never used Digg Reader as my main RSS feed reader – I’ve got a paid subscription to Feedly – I was very happy to use it as a backup reader for those feeds that weren’t always that great at adhering […]

The post Digg Reader shuts down, and thoughts on organising my blog reading appeared first on The Lone C++ Coder's Blog.

Product Owner or Backlog Administrator?

3337233_thumbnail-2018-03-20-18-08.jpg

In the official guides all Product Owners are equal. One size fits all.

In the world I live in some Product Owners are more equal than others and one size does not fit all.

The key variable here is the amount of Authority a Product Owner has. In my last post I said that Authority is one of the four things every product owner needs – the others being legitimacy, skills and time. However there is a class of Product Owner who largely lack authority and who I have taken to calling Backlog Administrators.

About the only thing a Backlog Administrator owns is their Jira login. They are at the beck and call of one or more people who tell them what should be in the backlog. Prioritisation is little more than an exercise in decibel management – he who shouts loudest gets what they want.

A Backlog Administrator rarely throws anything out of the backlog, they don’t feel they have the authority to do so. As a result their backlogs are constipated – lots of stories, many of little value. Fortunately Jira knows no limits, it is a bottomless pit – just don’t draw a CfD or Burn-Up chart!

If the team are lucky the Backlog Administrator can operate as a Tester, they can review work which is in progress or possibly “done.” They may be able to add acceptance criteria. If the team are unlucky the Backlog Administrator doesn’t know enough about the domain to do testing.

I would be the first to say that the Product Owner role can vary a great deal: different individuals working with different teams in different domains for different types of company mean that, apart from backlog administration, there is inherently a lot of variability in the role.

The Product Owner role should be capable of deciding what to build and/or change.

So Product Owners need to know what the most valuable thing to do is. Part of the job means finding out what is valuable. While Backlog Administration is part of the job the question one should ask is:

How does the Product Owner know what they need to know to do that?

Backlog Administrators are little more than gophers for more senior people.

True Product Owners take after full Product Managers and Senior Business Analysts – or a special version of Business Analysts sometimes called Business Partners.

Product Owners should be out meeting customers and observing users. They should be talking about technology options with the technical team and interface design options with UXD.

Product Owners should understand commercial pressures, how the product makes (or saves) money for the company. Product Owners are responsible for Product Strategy so they should both understand company strategy and input into company strategy. Product Strategy both supports company strategy and feeds into company strategy.

Product Owners may need to observe the competitor landscape and keep an eye on competitors and understand relevant technology trends. That probably means attending trade shows and even supporting sales people if asked.

Frequently Product Owners will require knowledge of the domain, i.e. the field in which your product is used. Sometimes – as in telecoms or surveying – that may require actual hands-on experience.

And apart from backlog administration there is a lot of work to do to deliver the things they want delivered: they need to work with the technical team to explain stories, to have the conversations behind the story, write acceptance criteria, attend planning meetings, perhaps help with interviewing new staff and sharing all the things they learn from meeting customers, analysing competitors, debating strategy, attending shows, etc. etc.

I’m sure there are many who would rush to call the Backlog Administrator an “anti-pattern” but since I don’t believe in anti-patterns I don’t. I just think Product Owners should be more than a Backlog Administrator.

The post Product Owner or Backlog Administrator? appeared first on Allan Kelly Associates.

Estimating the number of distinct faults in a program

In an earlier post I gave two reasons why most fault prediction research is a waste of time: 1) it ignores the usage (e.g., more heavily used software is likely to have more reported faults than rarely used software), and 2) the data in public bug repositories contains lots of noise (i.e., lots of cleaning needs to be done before any reliable analysis can be done).

Around a year ago I found out about a third reason why most estimates of number of faults remaining are nonsense; not enough signal in the data. Date/time of first discovery of a distinct fault does not contain enough information to distinguish between possible exponential order models (technical details; practically all models are derived from the exponential family of probability distributions); controlling for usage and cleaning the data is not enough. Having spent a lot of time, over the years, collecting exactly this kind of information, I was very annoyed.

The information required, to have any chance of making a reliable prediction about the likely total number of distinct faults, is a count of all fault experiences, i.e., multiple instances of the same fault need to be recorded.

The correct techniques to use are based on work that dates back to Turing’s work breaking the Enigma codes; people have probably heard of Good-Turing smoothing, but the slightly later work of Good and Toulmin is applicable here. The person whose name appears on nearly all the major (and many minor) papers on population estimation theory (in ecology) is Anne Chao.

The Chao1 model (as it is generally known) is based on a count of the number of distinct faults that occur once and twice (the Chao2 model applies when presence/absence information is available from independent sites, e.g., individuals reporting problems during a code review). The estimated lower bound on the number of distinct items in a closed population is:

S_{est} \ge S_{obs} + \frac{n-1}{n} \frac{f_1^2}{2 f_2}

and its standard deviation is:

S_{sd-est} = \sqrt{f_2 \left[ 0.25 k^2 \left(\frac{f_1}{f_2}\right)^4 + k^2 \left(\frac{f_1}{f_2}\right)^3 + 0.5 k \left(\frac{f_1}{f_2}\right)^2 \right]}

where: S_{est} is the estimated number of distinct faults, S_{obs} the observed number of distinct faults, n the total number of faults, f_1 the number of distinct faults that occurred once, f_2 the number of distinct faults that occurred twice, and k = \frac{n-1}{n}.
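
As a minimal sketch in C++ (mine; the function and variable names are made up, and it assumes f_2 > 0), the estimate and its standard deviation are straightforward to compute:

#include <cmath>

// Chao1 lower bound on the number of distinct faults.
// s_obs: distinct faults observed, n: total fault experiences,
// f1/f2: number of distinct faults experienced exactly once/twice (f2 > 0 assumed).
double chao1_estimate( double s_obs, double n, double f1, double f2 )
{
    double k = ( n - 1 ) / n;
    return s_obs + k * ( f1 * f1 ) / ( 2 * f2 );
}

double chao1_stddev( double n, double f1, double f2 )
{
    double k = ( n - 1 ) / n;
    double r = f1 / f2;
    return std::sqrt( f2 * ( 0.25 * k * k * std::pow( r, 4 )
                           + k * k * std::pow( r, 3 )
                           + 0.5 * k * r * r ) );
}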

A later improved model, known as iChao1, includes counts of distinct faults occurring three and four times.

Where can clean fault experience data, where the number of inputs has been controlled, be obtained? Fuzzing has become very popular during the last few years and many of the people doing this work have kept detailed data that is sometimes available for download (other times an email is required).

Kaminsky, Cecchetti and Eddington ran a very interesting fuzzing study, where they fuzzed three versions of Microsoft Office (plus various Open Source tools) and made their data available.

The faults of interest in this study were those that caused the program to crash. The plot below (code+data) shows the expected growth in the number of previously unseen faults in Microsoft Office 2003, 2007 and 2010, along with 95% confidence intervals; the x-axis is the number of faults experienced, the y-axis the number of distinct faults.

Predicted growth of unique faults experienced in Microsoft Office

The take-away point: if you are analyzing reported faults, the information needed to build models is contained in the number of times each distinct fault occurred.

April nor(DEV): A.I. and Cognitive Computing with Watson & Keep Secure and Under the Radar

What:  A.I. and Cognitive Computing with Watson & Keep Secure and Under the Radar

When: Wednesday 4th April, 6.30pm to 9pm.

Where: Whitespace, 2nd Floor, St James' Mill, Whitefriars, NR3 1TN

RSVP: https://www.meetup.com/Norfolk-Developers-NorDev/events/242231165/

A.I. and Cognitive Computing with Watson
Colin Mower

Artificial Intelligence and Cognitive Computing have become the latest buzzwords in the industry, with companies big and small rushing to work out how they can take advantage of this emerging technology.

In this discussion, we’ll look at the myths behind the hype, how mature the technology is and how IBM’s Watson has evolved from game show winner to one of the market leaders.

Colin works for IBM as a Technical Leader, crossing all the IBM technologies and services. Prior to Big Blue, he worked in Aviva for over 14 years and has contributed to nor(DEV):con and Norfolk Developer Meetups.

He still lives in Norfolk and apart from plenty of travel working for some of the big blue chip companies, he tries to get out in South Norfolk running and cycling in a vain attempt to lose weight and keep fit.


Keep Secure and Under the Radar
David Higgins

Some basic and some not so basic steps to keep you and your business safe in the on-line business arena.

David is an ex UK Gov contractor. He discusses simple steps you need to take to stay ahead of current data security legislation and keep yourself and your business secure.

On Natural Analogarithms – student

Last year my fellow students and I spent a goodly portion of our free time considering the similarities of the relationships between sequences and series and those between derivatives and integrals. During the course of our investigations we deduced a sequence form of the exponential function e^x, which stands alone in satisfying the equations

    D f = f
  f(0) = 1

where D is the differential operator, producing the derivative of the function to which it is applied.
This set us to wondering whether or not we might endeavour to find a discrete analogue of its inverse, the natural logarithm ln x, albeit in the sense of being expressed in terms of integers rather than being defined by equations involving sequences and series.

Linux & SQL Server at MigSolv a Review

We love the MigSolv data centre out at Bowthorpe in Norwich. This was nor(DEV):’s second visit and they always make us very welcome. Walking into what feels like a massive Blake’s 7 set and getting the tour, including the retina scanner and massive server hall, is incredible and seriously interesting (even though it’s my third time!).

The intimacy of the board room with the table down the centre and nor(DEV): members arranged each side is great for generating conversation! And when you have a humorous and huge personality like Mark Pryce-Maher it encourages the banter and the discussion even more! It’s safe to say this was one of the most interactive nor(DEV): evening presentations for some time.

Mark was there to tell us about how you can run Microsoft SQL Server on Linux (or is that “Lynux”?). Anyone would think Mark had been on the WINE, but no, you really can run SQL Server natively on Linux now. The first question though, has to be “why?”. The answer is simple. Microsoft are going after geeks, Oracle users and Linux houses who only run Windows to run SQL server.

The second question is “how?”. Developers at Microsoft discovered that, despite the vast number of methods available from the Win32 API, there are only a small number of methods which actually talk to the operating system. These are for allocating memory, disc storage, etc. A project called Drawbridge was developed to identify these methods and port them to Linux. SQL Server can then make use of those methods to run on Linux. Simples!

Mark did a live demo of installing and connecting to SQL Server. Unfortunately he hadn’t made sufficient sacrifices to the demo gods and things didn’t go precisely to plan. SQL Server can be run on an Ubuntu instance on Microsoft’s Azure from about £1/day (I’m intending to try it on a Digital Ocean droplet which is slightly cheaper). It’s incredibly easy to install. You just add the necessary repositories to Ubuntu’s package manager and tell it to install SQL Server. There’s also a pre-made Docker image (if Docker is your thing) which is even quicker.

Microsoft have developed an open source version of the client tools called Microsoft Operations Studio. It is also very easy to install (I did it on my Linux Mint laptop over 4G while Mark was speaking), but for some reason during the demo it just wouldn’t connect to SQL Server. However, Mark talks a great talk and I’m sure with a little bit more playing it would have!

We enjoyed being at MigSolv and hearing from Mark! MigSolv would like us to go back and we’re keen to do so in the future.

The next nor(DEV): is on 4th April and features “A.I. and Cognitive Computing with Watson” from Colin Mower of IBM and “Keep Secure and Under the Radar” from David Higgins. RSVP here: https://www.meetup.com/Norfolk-Developers-NorDev/events/242231165/

emBO++ 2018 Trip Report

emBO++ is a conference focused on C++ on embedded systems in Bochum, Germany. This was its second year of operation, but the first that I’ve been along to. It was a great conference, so I’m writing a short report to hopefully convince more of you to attend next year!

Format

The conference took place over four days: an evening of lightning talks and burgers, a workshop day, a day of talks (what you might call the conference proper), and finally an unofficial standards meeting for those interested in SG14. This made for a lot of variety, and each day was valuable.

Venue

One thing I really enjoyed about emBO++ was that the different tech and social events were dotted around the city. This meant that I actually got to see some of Bochum, get lost navigating the train system, walk around town at night, etc., which made a nice change from being cooped up in a hotel for a few days.

The main conference venue was at the Zentrum für IT-Sicherheit (Centre for IT Security). It was a spacious building with a lot of light and large social areas, so it suited the conference environment well. The only problem was that it used to be a military building and was lined with copper, making the thing into one huge Faraday cage. This meant that WiFi was an issue for the first few hours of the day, but it got sorted eventually.


Food and Drink

The catering at the main conference location was really excellent: a variety of tasty food with healthy options and large quantities. Even better was the selection of drinks available, which mostly consisted of interesting soft drinks which I’d never seen before, such as bottled Matcha with lime and a number of varieties of Mate. All the locations we went to for food and drinks were great – especially the speakers’ dinner. A lot of thought was obviously put into this area of the conference, and it showed.

Workshops

There were four workshops on the first day of the conference with two running in parallel. The two which I attended were very interesting and instructive, but I wish that they had been more hands-on.

Jörn Seger – Getting Started with Yocto

I was in two minds about attending this workshop. We need to use Yocto a little bit in my current project, so I could attend the workshop in order to gain more knowledge about it. On the other hand, I’d then be the most experienced member of my team in Yocto and would be forced to fix all the issues!

In the end I decided to go along, and it was definitely worthwhile. Previously I’d mostly muddled along without an understanding of the fundamentals of the system; this workshop provided those.

Kris Jusiak – Embedding a Compile-Time-State-Machine

Kris gave a workshop on Boost.SML, which is an embedded domain specific language (EDSL) for encoding expressive, high-performance state machines in C++. The library is very impressive, and it was valuable to see all the different use-cases it supports and how it supports switching out the frontend and backend of the system. I was particularly interested in this session as my talk the next day was on EDSLs, so it was an opportunity to steal some things to mention in my talk.

You can find Boost.SML here.
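
To give a flavour of the EDSL (this is my own minimal example in the style of the library's documentation, not one from the workshop), a Boost.SML state machine is just a transition table returned from a callable:

#include <boost/sml.hpp>

namespace sml = boost::sml;

struct play {}; // events
struct stop {};

struct player {
    auto operator()() const {
        using namespace sml;
        // '*' marks the initial state; transitions read "state + event = new state"
        return make_transition_table(
            *"idle"_s   + event<play> = "playing"_s,
            "playing"_s + event<stop> = "idle"_s
        );
    }
};

int main() {
    sml::sm<player> sm;
    sm.process_event( play{} ); // idle -> playing
    sm.process_event( stop{} ); // playing -> idle
}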

Talks

There were two tracks for most of the day, with the first and final ones being plenary sessions. There was a strong variety of talks, and I felt that my time was well-spent at all of them.

Simon Brand – Embedded DSLs for Embedded Programming

My talk! I think it went down well. I got some good engagement and questions from the audience, although not much feedback from the attendees later on in the day. I guess I’ll need to wait for it to get torn apart on YouTube.


Klemens Morgenstern – Developing high-performance Coroutines for ARMs

Klemens gave an excellent talk about an ARM coroutine library which he implemented. This talk has nothing to do with the C++ Coroutines TS, instead focusing on how coroutines can be implemented in a very low-overhead manner. In Klemens’ library, the user provides some memory to be used as the stack for the coroutine, then there are small pieces of ARM assembly which perform the context switch when you suspend or resume that coroutine. The talk went into the performance considerations, implementation, and use in just the right amount of detail, so I would definitely recommend watching if you want an overview of the ideas.

The library and presentation can be found here.

Emil Fresk – The compile-time, reactive scheduler: CRECT

CRECT is a task scheduler which carries out its work at compile time, therefore almost entirely disappearing from the generated assembly. Emil’s lofty goal for the talk was to present all of the necessary concepts such that those viewing the talk would feel like they could go off and implement something similar afterwards. I think he mostly succeeded in this – although a fair amount of metaprogramming skills would be required! He showed how to use the library to specify the jobs which need to be executed, the resources which they require, and when they should be scheduled to run. After we understood the fundamentals of the library, we learned how this actually gets executed at compile-time in order to produce the final scheduled output. Highly recommended for those who work with embedded systems and want a better way of scheduling their work.

You can find CRECT here.

Ben Craig – Standardizing an OS-less subset of C++

If you watch one talk from the conference it should be this one. C++ has had a “freestanding” variant for a long time, and it’s been neglected for the same amount of time. Ben talked about all the things which should not be available in freestanding mode but are, and those which should be but are not. He presented his vision for what should be standards-mandated facilities available in freestanding C++ implementations, and a tentative path to making this a reality. Particularly of interest were the odd edge cases which I hadn’t considered. For example, it turns out that std::array has to #include <string> somewhere down the line, because my_array.at(n) can throw an exception (std::out_of_range), and that exception has a constructor which takes std::string as an argument. These tiny issues will make getting a solid standard for freestanding difficult to pin down and agree on, but I think it’s a worthy cause.
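
A tiny illustration of that last point (my example, not from the talk):

#include <array>

int get( const std::array<int, 3>& a )
{
    // Out of range: throws std::out_of_range, whose constructors take
    // std::string (or const char*), which is how <string> gets dragged in.
    return a.at( 7 );
}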

Ben’s ISO C++ paper on a freestanding standard library can be found here.

Jacek Galowicz — Scalable test infrastructure for advanced bare-metal software development

In Jacek’s team, they have many different hardware versions to support. This creates the risk that a change introduces regressions in some versions and not others. This talk showed how they developed the testing infrastructure to enable them to test all hardware versions they needed on each merge request, to ensure that bad commits aren’t merged into the master branch. They wrote a simple testing framework in Haskell which was fine-tuned to their use case rather than using an existing solution like Jenkins (that’s what we use at Codeplay for solving the same problem). Jacek spoke about issues they faced and the creative solutions they put in place, such as putting a light detector over the CAPS LOCK button of a keyboard and making it blink in Morse code in order to communicate out from machines with no usable ports.

Odin Holmes – Bare-Metal-Roadmap

Odin’s talk summed up some current major problems that are facing the embedded community, and roped together all of the talks which had come before. It was cool to see the overlap in all the talks in areas of abstraction, EDSLs, making choices at compile time, etc.

Closing

I had a great time at emBO++ and would whole-heartedly recommend attending next year. The talks should be online in the next few months, so I look forward to watching those which I didn’t attend. The conference is mostly directed at embedded C++ developers, but I think it would be valuable to anyone doing low-latency programming on non-embedded systems, or those writing C/Rust/whatever for embedded platforms.

Thank you to Marie, Odin, Paul, Stephan, and Tabea for inviting me to talk and organising these great few days!


ACME DNS Validation

I was looking at modifying acme tiny to support DNS-01 validation with a custom PowerDNS backend just a few days ago (in my case to get certificates for an XMPP server where there isn't a corresponding HTTP server or the HTTP server is hosted on a different machine). This work is available from Subversion: pdns-acme-backend.

Interestingly, I am just reading that Let's Encrypt is now supporting wildcard certificates that need to be validated using the DNS-01 challenge type.

Historians of computing

Who are the historians of computing? The requirement I used for deciding who qualifies (for this post) is that the person has written multiple papers on the subject over a period that is much longer than their PhD thesis (several people have written history of some aspect of computing PhDs and then gone on to research other areas).

Maarten Bullynck. An academic who is a historian of mathematics and has become interested in software; use HAL to find his papers, e.g., What is an Operating System? A historical investigation (1954–1964).

Martin Campbell-Kelly. An academic who has spent his research career investigating computing history, primarily with a software orientation. Has written extensively on a wide variety of software topics. His book “From Airline Reservations to Sonic the Hedgehog: A History of the Software Industry” is still on my pile of books waiting to be read (but other historians cite it extensively). His thesis, “Foundations of computer programming in Britain, 1945-55”, can be freely downloaded from the British Library (registration required).

James W. Cortada. Ex-IBM (1974-2012) and now working at the Charles Babbage Institute. Written extensively on the history of computing, with more of a hardware than software orientation. Has written lots of detail-oriented books and must have pole position for the most extensive collection of material to cite (his end notes are very extensive). His “The Digital Flood: The Diffusion of Information Technology Across the U.S., Europe, and Asia” is likely to be the definitive work on the subject for some time to come. For me this book is spoiled by the author toeing the company line in his analysis of the IBM antitrust trial; my analysis of the work Cortada cites reaches the opposite conclusion.

Nathan Ensmenger. An academic; more of a people person than hardware/software. His paper Letting the Computer Boys Take Over contains many interesting insights. His book The Computer Boys Take Over: Computers, Programmers, and the Politics of Technical Expertise is a combination of topics that have been figured out and backed with references, and topics still being figured out (I wish he would not cite Datamation, a trade mag back in the day, so often).

Michael S. Mahoney. An academic who is sadly no longer with us. A historian of mathematics before becoming involved primarily with software.

Jeffrey R. Yost. An academic. I have only read his book “Making IT Work: A history of the computer services industry”, which was really a collection of vignettes about people, companies and events; it needs some analysis. Must try to track down some of his papers (which are, unfortunately, not available via his web page).

Who have I missed? This list is derived from papers/books I have encountered while working on a book, not an active search for historians. Suggestions welcome.

National Apprenticeship Week

Seeing as it’s been National Apprenticeship Week this week, we thought we would shine a light on our apprentices, past and present. Naked Element would be a duller place without them and the valuable work they do!

We’ve had three apprentices in total, Lewis, Rain and Jack and they’ve all been invaluable to our business. Lewis spent his year-long software development apprenticeship with us, before staying on a while longer as a full-time employee. He headed User Story workshops, held meetings with clients and even managed to join in with some of the social sides of Naked Element too! Lewis got a lot out of his time with us, saying "an apprenticeship is a great way to get your foot in the door of an industry, gain some excellent skills and first-hand experience in a job you may want to turn into a career". Lewis decided to be an apprentice because he felt that a more hands-on approach to learning would suit him better than studying full time. At the time he hoped he would be working in the US in the near future, but he has since decided to settle down at university and is due to begin a Computer Science degree at the UEA later this year to bolster his industry experience with a formal qualification.

Rain joined us as an administrative apprentice for just over a year, keeping us organised and the company running smoothly. Rain was an asset to Naked Element, as a natural networker and often the first face to greet clients, she helped start the conversation about software and business. From the professional presentation in her initial interview to managing conferences, she impressed us all. She took her experience with Naked Element and became the executive PA to the CEO of Apple Helicopters!

Our current apprentice is Jack, who is part-way through his software apprenticeship. We’ve been so impressed with Jack that we’re hoping he will stay on after his course has finished to be a full-time software developer! He’s a good problem solver, helping Naked Element deliver projects more cost effectively, and he is equally enthusiastic at tech events when he represents the company.

Our CEO Paul says "I believe that apprentices are an excellent way for the predominantly small tech companies in the TechEast region to grow and a way to help fill the skills gap we have here. They are also a great way to support young people in our region to get industry experience." Naked Element has found all three apprentices invaluable to supporting and growing our business and we’re very proud of how far they’ve come!

Visual Lint 6.5 has been released

The first public build of Visual Lint 6.5 has just been uploaded to our website. Visual Lint 6.5 is the second Visual Lint 6.x release, superseding Visual Lint 6.0. As a minor update, it will also accept existing per-user Visual Lint 6.0 licences; Visual Lint 1.x, 2.x, 3.x, 4.x and 5.x per-user licences must however be upgraded to work with this version. Full details of the changes in this version are as follows:
Host Environments:
  • Removed the (deprecated since Visual Lint 5.0) ability of the Visual Studio plug-in to load within Microsoft Visual Studio 6.0 and eMbedded Visual C++ 4.0. Projects for these environments can of course still be analysed in the standalone VisualLintGui and VisualLintConsole applications.
Analysis Tools:
  • Modifications to support PC-lint Plus PCH analysis, which creates object files (.lpph or .lpch) in the project working folder rather than (as was the case with PC-lint 9.0) in the folder containing the PCH header file. This should affect only projects where the PCH header file is contained in a different folder from the project file.
  • PC-lint project indirect (project.lnt) files are now automatically recreated if a different version of the analysis tool is in use.
Installation:
  • The installer now prompts for affected applications (Visual Studio, Atmel Studio, AVR Studio, Eclipse, VisualLintConsole and VisualLintGui) to be closed before installation can proceed.
  • The installer now installs VSIX extensions to Visual Studio 2017 and Atmel [AVR] Studio silently.
  • Revised the order of registration of the Visual Studio plug-in with each version of Visual Studio so that the newest versions are now registered first.
  • Uninstallation no longer incorrectly runs "Configuring Visual Studio..." steps if the VS plug-in is not selected for installation.
  • The "Installing Visual Lint" progress bar is now updated while Visual Studio, Atmel Studio and Eclipse installations are being registered.
  • Improved the logging of VSIX extension installation/uninstallation.
User Interface:
  • The Analysis Status View now supports text filters of the form "Project/File".
  • Added a new Window List Dialog to VisualLintGui to display details of the open MDI child windows, and allow selected windows to be activated, saved or closed as a group.
  • Widened the About Box slightly.
Reports:
  • Replaced the table sort code in generated HTML reports with a simpler, more robust implementation from https://www.kryogenix.org/code/browser/sorttable/.
  • Replaced the Teechart generated Issue Count by Category/ID charts in HTML reports with Javascript ones.
Bug Fixes:
Download Visual Lint 6.5.0.293

Building a regression model is easy and informative

Running an experiment is very time-consuming. I am always surprised that people put so much effort into gathering the data and then spend so little effort analyzing it.

The Computer Language Benchmarks Game looks like a fun benchmark; it compares the performance of 27 languages using various toy benchmarks (they could not be said to be representative of real programs). And, yes, lots of boxplots and tables of numbers; great eye-candy, but what do they all mean?

The authors, like good experimentalists, make all their data available. So, what analysis should they have done?

A regression model is the obvious choice and the following three lines of R (four lines if you count the blank line) build one, providing lots of interesting performance information:

cl=read.csv("Computer-Language_u64q.csv.bz2", as.is=TRUE)

cl_mod=glm(log(cpu.s.) ~ name+lang, data=cl)
summary(cl_mod)

The following is a cut down version of the output from the call to summary, which summarizes the model built by the call to glm.

                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         1.299246   0.176825   7.348 2.28e-13 ***
namechameneosredux  0.499162   0.149960   3.329 0.000878 ***
namefannkuchredux   1.407449   0.111391  12.635  < 2e-16 ***
namefasta           0.002456   0.106468   0.023 0.981595    
namemeteor         -2.083929   0.150525 -13.844  < 2e-16 ***

langclojure         1.209892   0.208456   5.804 6.79e-09 ***
langcsharpcore      0.524843   0.185627   2.827 0.004708 ** 
langdart            1.039288   0.248837   4.177 3.00e-05 ***
langgcc            -0.297268   0.187818  -1.583 0.113531 
langocaml          -0.892398   0.232203  -3.843 0.000123 *** 
  
    Null deviance: 29610  on 6283  degrees of freedom
Residual deviance: 22120  on 6238  degrees of freedom

What do all these numbers mean?

We start with glm's first argument, which is a specification of the regression model we are trying to fit: log(cpu.s.) ~ name+lang

cpu.s. is cpu time, name is the name of the program and lang is the language. I found these by looking at the column names in the data file. There are other columns in the data, but I am running in quick & simple mode. As a first stab, I thought cpu time would depend on the program and language. Why take the log of the cpu time? Well, the model fitted using cpu time was very poor; the values range over several orders of magnitude and logarithms are a way of compressing this range (and the fitted model was much better).

The model fitted is:

cpu.s. = e^{Intercept+name+lang}, or cpu.s. = e^{Intercept}*e^{name}*e^{lang}

Plugging in some numbers, to predict the cpu time used by say the program chameneosredux written in the language clojure, we get: cpu.s. = e^{1.3}*e^{0.5}*e^{1.2}=20.1 (values taken from the first column of numbers above).

This model assumes there is no interaction between program and language. In practice some languages might perform better/worse on some programs. Changing the first argument of glm to: log(cpu.s.) ~ name*lang, adds an interaction term, which does produce a better fitting model (but it's too complicated for a short blog post; another option is to build a mixed-model by using lmer from the lme4 package).

We can compare the relative cpu time used by different languages. The multiplication factor for clojure is e^{1.2}=3.3, while for ocaml it is e^{-0.9}=0.4. So clojure consumes 8.2 times as much cpu time as ocaml.

How accurate are these values, from the fitted regression model?

The second column of numbers in the summary output lists the estimated standard deviation of the values in the first column. So the clojure value is actually e^{1.2 ± (0.2*1.96)}, i.e., between 2.2 and 4.9 (the multiplication by 1.96 is used to give a 95% confidence interval); the ocaml values are e^{-0.9 ± (0.2*1.96)}, between 0.3 and 0.6.

The fourth column of numbers is the p-value for the fitted parameter. A value lower than 0.05 is a common criterion, so there are question marks over the fit for the program fasta and language gcc. In fact many of the compiled languages have high p-values; perhaps they ran so fast that a large percentage of start-up/close-down time got included in their numbers. Something for the people running the benchmark to investigate.

Isn't it easy to get interesting numbers by building a regression model? It took me 10 minutes, ok I spend a lot of time fitting models. After spending many hours/days gathering data, spending a little more time learning to build simple regression models is well worth the effort.

Product Owners need 4 things

iStock_000008515543Small-2018-03-5-16-09.jpg

To be an effective Product Owner – and that includes product managers and business analysts who are nominating work for teams to do – you need at least four things. You may well need more than these four but these are common across all teams and domains.

  1. Skills and experience

There is more to being a Product Owner than simply writing user stories and prioritising a backlog. Yes, you need to know how to work with a development team and how to work in an Agile-style process. Yes, you need to be able to write user stories and acceptance criteria, perhaps BDD-style cucumbers too; yes, you need to be able to manage a backlog, prioritise it and partake in planning meetings.

But how do you know what should be a priority?
How do you know what will deliver value? And please customers? Satisfy stakeholders?

Importantly Product Owners need to be able to do the work behind the backlog.

Product Owners need to meet people, have the conversations, do the analysis and thinking behind those things. Any idiot can pick random items from a backlog but it takes skills and experience to maximise value.

Product Owners need to be able to identify users, segment customers, interview people, understand their needs and jobs to be done. They need to know when to run experiments and when to turn to research journals and market studies. And that might mean they need data analysis skills too.

If the product is going to sell as a commercial product you will need wider product management skills, while if your product is for internal use you need more business analysis skills. Product managers will benefit from knowing about business analysis, and business analysts will benefit from knowing about product management.

You may also need specialist domain knowledge – you might need to be a subject matter expert in your own right, or you might become an SME in given time.

They also need some understanding of business strategy, finance, marketing, process analysis and design, user experience design and more.

Don’t underestimate the skills and experience you need to be an effective Product Owner.

  2. Authority

At the very least a Product Owner needs the authority to nominate the work the team are going to do for the next two weeks. They need the authority to choose items from a backlog and ask the team to do them. They need the authority not to have their decisions overridden on a regular basis. (OK, it happens occasionally.)

As a general rule the more authority the Product Owner has the more effective they are going to be in their role.

The organization may confer that authority but the team need to recognise and accept it too.

I’ve seen many Product Owners who, while they have the authority to nominate work for a team, don’t have the authority to throw things out of the backlog. When the only way for a story to leave the backlog is for it to be developed, it is very expensive. This leads to constipated backlogs that are stuffed full of worthless rubbish and where one can’t see the wood for the trees.

If the Product Owner doesn’t have sufficient authority then either they need to borrow some or there is going to be trouble.

  3. Legitimacy

Legitimacy is different from authority. Legitimacy is about being seen as the right person, the bonafide person to exercise authority and do the background work to find out what they need to find out in order to make those decisions.

Legitimacy means the Product Owner can go and meet customers if they want. And it means that they will get their expenses paid.

Legitimacy means that nobody else is trying to fill the Product Owner role or undermine them. In particular it means the team respect the Product Owner and trust them to make the right calls. Most of all they accept that once in a while – hopefully not too often – the Product Owner will have to say “I accept technologically X is the right thing but commercially it must be Y; full ahead and damn the torpedoes.”

It can be hard for a Product Owner to fill their role if the team believe a senior developer – or anyone else – should be managing the backlog and prioritising work to do.

  4. Time

Finally, and probably the most difficult… Product Owners need time to do their work.

They need time to meet customers and reflect on those encounters.

They need time to work-the-backlog, value stories, weed out expired or valueless stories, think about the product vision, talk to stakeholders and more senior people, and then ponder what happens next.

Time to evaluate what has been delivered and see if it is delivering the expected value. Time to understand whether that which has been delivered is generating more or less value than expected. Time to feedback those findings into future work: to recalibrate expected values and priorities, generate more work or invalidate other work.

Product Owners need time to look at competitor products and consider alternatives – if only to steal ideas!

They need time to work with the technical team: have conversations about stories, expand on acceptance criteria, review work in progress, perhaps test completed features and socialise with the team.

They also need time to enhance their own skills and learn more about the domain.

And if they don’t have the time to do this?

Without time they will rush into planning meetings and say “I’ve been so busy, I haven’t looked at the backlog this week, just bear with me while I choose some stories…”

More often than not they will wing-it, they substitute opinion and guesswork instead of solid analysis, facts and data. They overlook competition and fail to listen to the team and other managers.

And O yes, they need time for their own lives and family.

I sometimes think that only Super Humans need apply for a Product Owner role, or perhaps many Product Owners are set up to fail from day one. Yet the role is so important.

I plan to explore this topic some more in the next few posts.

The post Product Owners need 4 things appeared first on Allan Kelly Associates.

A Decent Borel Code – a.k.

A few posts ago we took a look at how we might implement various operations on sets represented as sorted arrays, such as the union, being the set of every element that is in either of two sets, and the intersection, being the set of every element that is in both of them, which we implemented with ak.setUnion and ak.setIntersection respectively.
Such arrays are necessarily both finite and discrete and so cannot represent continuous subsets of the real numbers such as intervals, which contain every real number within a given range. Of particular interest are unions of countable sets of intervals I_i, known as Borel sets, and so it's worth adding a type to the ak library to represent them.

Statement sequence length for error/non-error paths

One of the folk truisms of the compiler/source code analysis business is that error paths are short, i.e., when an error situation is detected (such as failing to open a file), few statements are executed before the function returns.

Having repeated this truism for many decades, figure 2 from the paper APEx: Automated Inference of Error Specifications for C APIs jumped off the page at me; thanks to Yuan Kang, I now have a copy of the data.

The plots below (code+data) show two representations of the non-error/error path lengths (measured in statements within individual functions of libc; counting starts at a library call that could return an error value). The upper plot shows statement sequence lengths for error/non-error paths, and the lower is a kernel density plot of the error/non-error sequence lengths.

Statements contained in error and non-error paths

Another truism is that people tend to write positive tests, i.e., tests that do not involve error handling (some evidence).

Code coverage measurements (e.g., number of statements or branches that are executed by a test suite) often show the pattern seen in the plot below (code+data; thanks to the authors of the paper Code Coverage for Suite Evaluation by Developers for making the data available). The data was obtained by measuring the coverage of 1,043 Java programs executing their associated test suite (circles denote program size). Lines are fitted regression models for different sized programs.

Statement coverage against decision coverage

If people are preferentially writing positive tests, test suites with low coverage would be expected to execute a greater percentage of statements than branches (an if-statement has two branches, taken/not-taken), i.e., the behavior seen in the plot above (grey line shows equal statement/branch coverage). Once the low hanging fruit is tested (i.e., the longer, non-error, cases), tests have to be written for the shorter, more likely to be error handling, cases.

The plot would also be explained by typical execution paths favoring longer basic blocks, but I don’t have any data that could show this one way or another.