Reinforcement learning

Feb 05, 2018
What can AlphaGo Zero’s reinforcement learning process teach teams about their actions?

Towards the end of 2017, we were introduced to AlphaGo Zero, arguably the strongest player (albeit a virtual one) in the history of the ancient Chinese game of Go. Previous versions of AlphaGo learned to play by absorbing details of thousands of human amateur and professional games, but AlphaGo Zero learned simply by playing games against itself, starting from completely random play. It mastered a game with more possible board positions than there are atoms in the known universe by learning solely from itself. After just three days of self-play, AlphaGo Zero defeated the version of AlphaGo that had beaten a human champion by 100 games to 0. By day 21, it had reached the level of AlphaGo Master, the version that had beaten the world’s top professionals, and by day 40 it had surpassed every previous version.

In artificial intelligence (AI) terms, this is called ‘reinforcement learning’ – a learning process in which “the learning agent has to decide how to act to perform its task. In the (intentional) absence of existing training data, the agent learns from experience. It collects the training examples (this action was good, that action was bad) through trial and error as it attempts its task, with the goal of maximising long-term reward”, according to Vishal Maini, a machine-learning expert.
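For readers who want to see the mechanics, the loop Maini describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AlphaGo Zero’s actual method (which combines self-play with a deep neural network and tree search). The five-cell corridor task, the function names and the parameter values are all hypothetical, chosen only to show an agent starting with no knowledge and improving through trial and error.

```python
import random


def train_corridor_agent(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                         epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor (a hypothetical task, not Go).

    The agent starts in cell 0 and earns a reward of 1 only on reaching the
    last cell. Actions: 0 = step left, 1 = step right.
    """
    rng = random.Random(seed)
    # One value estimate per (cell, action); all zeros means no prior knowledge.
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Trial and error: usually take the best-known action,
            # but explore at random some of the time (or when undecided).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Stepping left from cell 0 leaves the agent where it is.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Nudge the estimate towards "reward now plus discounted future value".
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state

    return q


def greedy_policy(q):
    """Best-known action for every non-goal cell."""
    return [0 if left > right else 1 for left, right in q[:-1]]


q_table = train_corridor_agent()
print(greedy_policy(q_table))
```

With these settings the script prints [1, 1, 1, 1]: the agent has worked out for itself that stepping right is the better choice in every cell. The discount factor gamma is what makes the reward ‘long-term’, since a step that pays nothing now is still valued for where it leads.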

Creating its own effectiveness

One of the big lessons we have learned from Simon Sinek, author of Start With Why and Leaders Eat Last, is that whenever we make a presentation, deliver a training intervention or engage in any activity where we need the buy-in of others, we need to start with why. The idea of AlphaGo Zero ‘creating its own effectiveness’ provides a compelling ‘why’ when discussing team coaching programmes and interventions.

Why do we coach teams? Because it puts learning from experience, with the goal of maximising long-term performance levels, right at the centre of the team’s agenda. It challenges the team to decide how to act in order to best achieve its commissioned purpose. It empowers the team to create its own effectiveness.

As Peter Senge points out in his excellent book, The Fifth Discipline, “we learn best from experience, but (sometimes) we never directly experience the consequences of our most important decisions”. Often, the primary consequences of our actions occur, as he puts it, “somewhere in the distant future or in a distant part of the larger system within which we operate”. In unpacking this thought, Senge introduces the concept of learning horizons, which he describes as the “breadth of vision in time and space within which we assess our effectiveness”. He claims that “when our actions have consequences beyond our learning horizon, it becomes impossible to learn from direct experience”.

The role of reflection

However, a growing body of evidence suggests that learning horizons can be significantly expanded through strong reflective practice, and it is in this area that team coaching programmes and interventions can be used to good effect. Kouzes and Posner report in The Leadership Challenge that teams who regularly explore the question “what can we learn from this?” can be up to three times as effective as teams that rarely or never reflect in this way.

Thus, not only does “what can we learn from what just happened?” become a stock question in a team coach’s toolkit, but it is frequently expanded into the wonderful question, “what can we learn from what just happened that will improve life for your customers’ customers?” This creates an impetus within the team towards thinking more systemically and towards locating itself within its wider organisational context, which in turn delivers much greater “breadth of vision in time and space”. In so doing, the coach empowers the team to develop a deeper capacity to take perspectives, view authority in new ways, and see shades of grey where they once saw only black and white.

Complexity, ambiguity and change

Today’s organisations want their workforce to handle complexity, ambiguity and all the accompanying stress involved in working in an environment of constant change. However, coping well with such issues is not simply a skill anyone can acquire, but more a way of living in the world.

Robert Kegan, author of The Evolving Self, calls this process one of “meaning making”. If you think about it, that is pretty much the challenge our artificially intelligent chum AlphaGo Zero faced when presented, sans instructions, with a 2,500-year-old Chinese game that offers more possible positions than there are atoms in the known universe!

Volatility, uncertainty, complexity and ambiguity have become ubiquitous working companions. They are ever-present members of every team in every organisation, presenting us all with challenges that even five years ago would have seemed implausible. It is within that working environment that our teams have to begin to, as the psychologists put it, “construe, understand, or make sense of events, relationships and the self.”

New terrain

This is new terrain for most, and while many teams may not be presented with quite as many options in each new situation as there are atoms in the known universe, there are a lot more options than there used to be. Teams therefore need to build this ability to constantly self-teach into their DNA by “retaining, reaffirming, revising, or replacing elements of their orienting system to develop more nuanced, complex and useful systems”, as noted by James Gillies in The Meaning of Loss.

To do that, they need to learn to reflect, as Peter Hawkins puts it in Leadership Team Coaching, “on their own performance and multiple processes, and consolidate their learning ready for the next cycle of development: for themselves, for their wider systems and to create positive consequences somewhere in the distant future”.

And perhaps the most compelling “why” of team coaching is to empower them to do just that.

Ian Mitchell and Sian Lumsden are co-founders of Eighty20 Focus, a real-time executive coaching organisation.