Ian Pitchford
Sept. 9, 2018
3:06 a.m. PDT

Structured Analytic Methods

Now that we have plenty of time for reflection I think the weakest part of Carbon was the lack of exposition. Very few participants articulated strong reasons for their forecasts and we didn’t have many alternative hypotheses / models to consider. This made it difficult to weight whatever evidence we were able to track down.

One of the most interesting papers I’ve read recently is Stephen J. Coulthart’s ‘An Evidence-Based Evaluation of 12 Core Structured Analytic Techniques’. He derives three evidence-based principles from the published data:
1. Two heads are not always better than one (face-to-face collaboration reduces idea-generation and creativity).
2. Weight and update (an important determinant of analytical accuracy is the proper weighting of evidence).
3. Careful conflict (conflict-inducing techniques, such as devil’s advocacy, can be constructive in intelligence analysis but only if carefully implemented).

Coulthart found that devil’s advocacy was “the technique with the most credible evidence base and highest efficacy”. This was also the thing that was most lacking on Carbon. Hopefully the next iteration of the HFC will (a) do more to encourage participation and (b) do more to facilitate polite debate about alternative hypotheses and different ways of modelling the IFPs.

8 Replies

Alfred Bender
Sept. 9, 2018
7:03 a.m. PDT
To your point, @IanPitchford, I think there would have been more sharing of forecasting ideas if there had been a general forum for discussion that wasn't question-specific. Obviously, one anecdote does not make a trend, but there were a few points when I thought about posting something related to forecasting strategies but felt weird just posting it on a random question (I just discovered this forum the other day - thanks to you and DSarin for posting!).

In case you find it helpful/interesting: after a particularly poor forecast (whether Trump would withdraw from the Iran agreement), where I felt like I severely mis-weighted the available data, I have been experimenting with a basic model for reducing my internal bias and more accurately evaluating data points.

Basically, I am collecting 5 different estimates from myself, each intended to represent a point on a normal distribution. For a binary question, the five forecasts I make are:
1. What is the strongest probabilistic case I can make for Option A?
2. What is a mildly biased probabilistic case for Option A?
3. What do I think is the most likely probabilistic case for Options A & B? (I typically ask this question last.)
4. What is a mildly biased probabilistic case for Option B?
5. What is the strongest probabilistic case I can make for Option B?

I then weight each probability:
1. 2.5
2. 13.5
3. 68
4. 13.5
5. 2.5

Add them and divide by 100.
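A minimal sketch of the procedure above (function name and example probabilities are my own, purely illustrative). The five weights are the ones listed, loosely mirroring the mass of a normal distribution; each estimate is expressed as P(Option A):

```python
def blended_forecast(estimates):
    """Combine five self-elicited probabilities for Option A.

    estimates, in order: strongest case for A, mildly biased toward A,
    most likely estimate, mildly biased toward B, strongest case for B.
    Weights (2.5, 13.5, 68, 13.5, 2.5) sum to 100, so the result is a
    weighted average of the five probabilities.
    """
    weights = [2.5, 13.5, 68, 13.5, 2.5]
    if len(estimates) != 5:
        raise ValueError("expected exactly five estimates")
    return sum(w * p for w, p in zip(weights, estimates)) / 100

# Hypothetical example: the five estimates, all as P(Option A).
print(blended_forecast([0.90, 0.75, 0.60, 0.45, 0.30]))  # -> 0.6
```

Because the weights are symmetric, the result stays anchored near the central estimate while the extreme cases nudge it only slightly.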

It's hard to prove a counterfactual and rule out hindsight bias, so it's hard to say how effective this has been, but I do think that having a structured approach for thinking about all of the data I have, and what weights I place on that data, has been helpful for me. I would be curious to hear if you (or anyone else) have any thoughts. Thanks!

Ian Pitchford
Sept. 9, 2018
12:08 p.m. PDT
I agree completely about the general forum for discussion, Alfred. I hope they implement this in the next season of the HFC. Thanks also for your model for reducing bias. I hadn’t thought of that.

Dmitry Sarin
Sept. 11, 2018
5:29 a.m. PDT
Hi Alfred. Prescience had a General Forum in addition to question-related threads. Did it improve performance? I am not convinced. Well, we have to wait for their results to come in.

I would expect there to be a lot more sharing in the condition where forecasters were assigned to teams, but as I understand it the results were really mixed. I wouldn't have survived to the end assigned to a team with Alfredalin84 or Slobodan, especially if they averaged team scores.

Dmitry Sarin
Sept. 19, 2018
6:58 a.m. PDT
Looks like Prescience was the most friendly/co-operative platform.
From one user: "Members were frequently sharing sources and techniques, helping each other out, even exchanging R code. Held Google hangouts sessions. Good postmortems."

Ian Pitchford
Sept. 19, 2018
7:18 a.m. PDT
It will be interesting to see if there’s any correlation between collaborative behaviour and performance.

Alfred Bender
Sept. 20, 2018
11:30 a.m. PDT
Do either of you know if we will be on the same platform for season 2, or if there will be any performance based sorting?

Dmitry Sarin
Sept. 20, 2018
12:28 p.m. PDT
Still TBD I guess
"Competing research teams. Qualifying participants will be assigned to one of the research teams competing to find the best methods of forecasting."

Ian Pitchford
Sept. 20, 2018
12:36 p.m. PDT
As each phase is supposed to be a separate RCT, I would guess that the new HFC volunteers will be allocated randomly to one of the four platforms. The goal is 1000 forecasters per HFC platform, but we’ve obviously not achieved that in phase 1. The data Lars has posted so far hasn’t shown any of the performers beating the control group (Carbon) by the stipulated benchmark (Table 3, p. 12), so it will be really interesting to see the scores after the final IFP is processed on 24 September.