Superforecasting

Superforecasting: The Art and Science of Prediction

“Beliefs are hypotheses to be tested, not treasures to be protected.” – Philip E. Tetlock and Dan Gardner

Thinking, Fast and Slow is one of my favorite books. In it, Daniel Kahneman details how the human mind works in two modes: one fast and effortless, the other slow and laborious. You engage the slow system to split the check among three friends. The fast system works automatically, filling in blanks and recognizing patterns. It allows us to operate smoothly on partial information. However, that quick judgment system can also lead us into dangerous biases and overconfidence.

In Superforecasting: The Art and Science of Prediction, Tetlock and Gardner apply the principles of behavioral economics to the practice of forecasting. Tetlock is the researcher whose previous studies led him to conclude that most expert prognosticators predicted future events no more accurately than dart-throwing chimps.

Tetlock led a major study of prediction accuracy called the Good Judgment Project (GJP). Tetlock and his co-researchers enlisted several thousand volunteers as contestants in a prediction competition. For the results to be statistically meaningful, contestants had to make hundreds of predictions. The questions were of this sort: Will Scotland vote to secede from the UK? Will the Swiss examination of Yasser Arafat’s exhumed bones find traces of polonium?

The researchers scored competitors on a combination of correctness and confidence level. They identified the top 2% as superforecasters, people who consistently predicted events with much higher accuracy than everyone else.
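To make "a combination of correctness and confidence" concrete, here is a minimal Python sketch of a Brier score, the scoring rule Tetlock discusses in the book. It assumes yes/no questions and uses the original two-outcome form of the score, which runs from 0 (perfect) to 2 (as wrong as possible), with a 50/50 hedge always scoring 0.5. The function names `brier_score` and `mean_brier` are hypothetical, chosen for illustration; this is not the GJP's actual scoring code.

```python
from typing import Sequence


def brier_score(forecast: float, outcome: int) -> float:
    """Two-outcome Brier score for a single yes/no question (hypothetical helper).

    forecast: probability assigned to "yes", between 0.0 and 1.0
    outcome:  1 if the event happened, 0 if it did not
    """
    if not 0.0 <= forecast <= 1.0:
        raise ValueError("forecast must be a probability between 0 and 1")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    # Squared error on the "yes" outcome plus squared error on the "no" outcome.
    # Lower is better: 0 is perfect, 2 is a fully confident wrong call.
    return (forecast - outcome) ** 2 + ((1 - forecast) - (1 - outcome)) ** 2


def mean_brier(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Average Brier score across many questions (hypothetical helper)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need the same, non-zero number of forecasts and outcomes")
    total = sum(brier_score(f, o) for f, o in zip(forecasts, outcomes))
    return total / len(forecasts)
```

For example, saying 90% on an event that happened scores about 0.02, while the same 90% on an event that did not happen scores about 1.62. Confidence pays off only when it is warranted, and averaging over many questions keeps a single lucky or unlucky call from dominating.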

You might expect intelligence to be the primary factor that sets the superforecasters apart from the rest. They were of above-average intelligence (top 20% of the population), but it was their ability to stay objective, counteract their own biases, and question their own beliefs that made them different and so effective. In large part, they were better at avoiding the biases identified by Kahneman.

From the book, here is a summary of the traits of the superforecasters:

In philosophic outlook, they tend to be:

CAUTIOUS: Nothing is certain
HUMBLE: Reality is infinitely complex
NONDETERMINISTIC: What happens is not meant to be and does not have to happen

In their abilities and thinking styles, they tend to be:

ACTIVELY OPEN-MINDED: Beliefs are hypotheses to be tested, not treasures to be protected
INTELLIGENT AND KNOWLEDGEABLE, WITH A “NEED FOR COGNITION”: Intellectually curious, enjoy puzzles and mental challenges
REFLECTIVE: Introspective and self-critical
NUMERATE: Comfortable with numbers

In their methods of forecasting they tend to be:

PRAGMATIC: Not wedded to any idea or agenda
ANALYTICAL: Capable of stepping back from the tip-of-your-nose perspective and considering other views
DRAGONFLY-EYED: Value diverse views and synthesize them into their own
PROBABILISTIC: Judge using many grades of maybe
THOUGHTFUL UPDATERS: When facts change, they change their minds
GOOD INTUITIVE PSYCHOLOGISTS: Aware of the value of checking thinking for cognitive and emotional biases

In their work ethic, they tend to have:

A GROWTH MINDSET: Believe it’s possible to get better
GRIT: Determined to keep at it however long it takes

To learn what each of these traits means and how superforecasters use them to make significantly better predictions than their peers, check out the book. It’s like Thinking, Fast and Slow applied to prediction, with a heavy dose of The Black Swan, The Wisdom of Crowds, and Mindset. It was a great read/listen, and I highly recommend it!

What is a Data Scientist

Extracting meaning from data is nothing new, but the world has really woken up to the value of predictive analytics and machine learning: preference and recommendation engines, effective marketing, spam filters that actually work, better medicine, even self-driving cars. This new focus has created a scramble as companies try to find people with the skills needed to get them into the predictive game. The scramble has raised two questions: 1) What, exactly, are we looking for (not just programmers, and not just statisticians)? 2) Where are these people?

Emergence of the Data Scientist

The world has settled on the terms Data Science and Data Scientist. Harvard Business Review (HBR) famously called the data scientist the sexiest job of the 21st century.

I like the term because its practitioners apply the scientific method in the medium of data: creating and validating hypotheses, making discoveries, and improving life in myriad ways.

A data scientist is more than a statistician:

  • The data is not sitting in nice, neat SAS datasets. It’s in unstructured social media networks, streaming off sensors, or in various other messy forms.
  • The machine learning algorithms behind the breakthrough innovations are more computational than mathematical.
  • Implementing the insights that come from the data requires significant programming.

A data scientist is more than a programmer:

  • Programmers don’t normally think in terms of designing and executing experiments.
  • A data scientist must understand what data these experiments require and what can be inferred from that data.
  • The big data aspect requires specialized skills in distributed computation.

So, What is a Data Scientist?

This rare combination of skills, along with the hype surrounding the field, has led to some fun definitions of the data scientist:

[Image: Data Scientist Definition]

These snarky definitions have been pretty popular as well:

  • “A data scientist is a data analyst who lives in California.”
  • “A data scientist is a business analyst who lives in New York.”
  • “A data scientist is a statistician who lives in San Francisco.”
  • “Data Science is statistics on a Mac.”

Hype and cynicism aside, the world needs more technologists who can program, handle data, and apply inferential statistics with mastery. The need is enormous, and the work is intellectually stimulating. This has motivated many developers, myself included, to learn to be data scientists.

Next up… approaching the data science field as a developer.