
Understanding how forecast users make decisions

Mark J. Rodwell, David S. Richardson (both ECMWF), John Hammond, Sara Thornton (both weathertrending)

 

A Royal Meteorological Society ‘Live Science’ event, hosted at ECMWF, allowed us to investigate how forecast users combine objective forecast probabilities with their own subjective feelings when making weather-dependent decisions. Such decisions are integral to the overall utility of forecasts.

Design of the study

For each of the 74 participants we identified the ‘threshold probabilities’ at which they decided to “go to the beach” in five days’ time with the possibility of warm dry weather, and at which they decided to “pack up and leave” a campsite in the face of potentially dangerous winds tomorrow. While making their decisions, participants were encouraged to elaborate mentally on each scenario as it might apply to them – who would they be with, what would they do, how far from home would they be, etc.? A key question in this study was whether users identify the threshold probability which optimises their expected feeling (or utility) about their decision. If they do this, then they are making their ‘Bayes Action’, and their feeling afterwards represents a ‘proper score’ of the forecast. Proper scores are fundamental in the development process of forecasting systems as they reward systems which issue ‘reliable’ (unbiased) probabilities, and which have better deterministic properties. If the users’ distribution of threshold probabilities is sufficiently consistent with their Bayes Actions, then there is the potential to develop scores which encourage user-oriented forecast system development.
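As a minimal sketch of what a Bayes Action means here, assume the classical cost/loss decision model (an assumption made for illustration; the study itself elicits thresholds directly from participants). Taking protective action, such as leaving the campsite, incurs a cost C whatever happens, while not acting incurs a loss L only if the event occurs. The expected expense is then minimised by acting whenever the forecast probability exceeds C/L, so the threshold probability equals the cost/loss ratio. The function names and numbers below are hypothetical.

```python
def expected_expense(act: bool, p: float, cost: float, loss: float) -> float:
    """Expected expense of a decision, given event probability p."""
    # Acting always costs `cost`; not acting costs `loss` only if the event occurs.
    return cost if act else p * loss


def bayes_action(p: float, cost: float, loss: float) -> bool:
    """Return True if acting has the lower expected expense (p > cost/loss)."""
    return expected_expense(True, p, cost, loss) < expected_expense(False, p, cost, loss)


def threshold_probability(cost: float, loss: float) -> float:
    """Probability above which acting becomes the Bayes Action."""
    return cost / loss


# Hypothetical user whose 'cost' of leaving is 3 units and whose 'loss' is 10 units
print(threshold_probability(3.0, 10.0))
print(bayes_action(0.4, 3.0, 10.0), bayes_action(0.2, 3.0, 10.0))
```

In this simplified picture, a user who reports a threshold of 0.3 is behaving as if their cost/loss ratio of feelings were 0.3.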

Participants’ threshold probabilities. Distribution of the participants’ threshold probabilities for their decision to go to the beach in five days’ time with the prospect of warm dry weather (temperatures greater than 20°C and less than 0.5 mm of rain in 24 hours), and for the decision to leave a campsite with the prospect of dangerous winds tomorrow (sustained wind speeds of more than 11 m s−1 with stronger gusts). The dashed lines indicate the climatological frequency of each event.

Results of the study

The distribution of participants’ threshold probabilities shows that most of them would plan to go to the beach if the probability of good weather exceeds about 0.7 (or 70%; see the first panel of the first figure). For the camping scenario (second panel), participants generally choose to leave at lower probabilities of dangerous wind. Although our participants may be more familiar with probability information than the general user, we might assume that they represent the same range of feelings about a day at the beach or the prospect of dangerous winds. A vox pop of the general public actually reveals similar distributions of decisions. Hence a forecast presenter could perhaps interpret the probabilistic forecast for their audience, suggesting that it would be worth making plans to visit the beach if the probability exceeded 70 or 80%. For the camping scenario, based on the threshold probability distribution, the presenter should certainly raise the alarm at a 30% probability.

More detailed questioning indicates that, for the beach scenario, participants appear to be balancing the potential ‘Thrill’ of a nice day at the beach – “I love being on beaches, whatever the weather” – with the ‘Pain’ of a bad day at the beach and feelings about travel costs – “I hate sitting on the beach in the rain ... and with three kids it’s quite an expedition”. For the camping scenario, participants appear to be balancing the ‘Pain’ of curtailing a family holiday with the potential ‘Regret’ of putting loved ones in harm’s way: “With a low probability, I’d feel responsible for taking away my family’s fun. However, as a parent, I wouldn’t want to put very young kids at risk of flying branches. If it had been a very high probability and I hadn’t done anything I’d feel responsible”. However, a participant who chose a high threshold probability stated that “I don’t really go camping. If I’m already there, I may as well stay as long as possible. A case of making it an adventure with the family pulling together to stop the tent being blown away”. The reason for this apparent ‘risk-seeking’ behaviour may be a lack of first-hand experience – something that will be inevitable for many users when faced with a climatologically-rare, yet dangerous weather event. In recognition of this, a forecast presenter might decide that a dangerous wind warning should be issued at a probability lower than 30%, and authorities might also take more coercive action.

The User Brier Score

We suggest that the obtained threshold probability distributions are more consistent with the participants’ Bayes Actions than, say, if we assumed a flat (i.e. uniform) or strongly peaked distribution. Certainly, the two distributions differentiate the scenarios in a reasonable way. We therefore assume that we can approximately equate the distribution of threshold probabilities with the distribution of the users’ ‘cost/loss ratios of feelings’. This allows us to calculate ‘expense per unit loss’. We propose a ‘User Brier Score’ (UBS: see box) which measures the relative expense incurred by the user community as a whole, when provided with the forecast information. The UBS is asymptotically proper as the sample size increases, lies in the range [0,1], and reduces to the well-known Brier Score (BS) for the case when the users’ distribution of cost/loss ratios is uniform. We use the UBS to score the ECMWF operational medium-range ensemble forecast for the period 1995–2018 using ‘SYNOP’ point observations for verification (locations between 50–60°N, June–August for the beach scenario and September–November for the camping scenario). This effectively scores the raw forecasts for our bivariate (temperature and precipitation) and extreme (wind) events at the location of the user. For both scenarios (the second figure shows the strong wind scenario), the UBS is higher than the BS, largely because the users’ threshold distributions put less weight on high cost/loss ratios than does a uniform distribution. The general downward trend indicates reduced expense and thus improvement.

User Brier Score (UBS)

where E_p, E_o and E_(1−o) are the ‘expenses’ incurred by a user if they took their Bayes Action when given: forecast probability p, a perfect forecast (knowledge of the outcome) o ∈ {0,1}, and the worst possible forecast 1 − o, respectively. An overbar indicates the mean over a representative sample of forecasts, and a tilde (~) indicates the mean over the set of users.
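The quantities defined above can be sketched numerically. The code below is a minimal illustration, not the published definition: it assumes unit loss, so that each user's threshold probability a is also their cost/loss ratio, and it assumes the score takes the form (mean E_p − mean E_o) / (mean E_(1−o) − mean E_o), with means taken over both users and forecasts. That form is chosen only because it has the stated properties (it lies in [0,1] and reduces to the Brier Score for uniformly distributed thresholds); see Rodwell et al. (2020) for the actual formulation. All function names and sample values are hypothetical.

```python
import numpy as np


def user_mean_expenses(p, o, thresholds):
    """Mean over users (the tilde) of E_p, E_o and E_(1-o) for one forecast."""
    a = np.asarray(thresholds, dtype=float)  # each user's cost/loss ratio (unit loss)
    e_p = np.where(p > a, a, o)   # act (pay a) if p exceeds the threshold, else risk loss o
    e_o = a * o                   # perfect forecast: act only when the event occurs
    e_w = a * (1 - o) + o         # worst forecast: act only when the event does not occur
    return e_p.mean(), e_o.mean(), e_w.mean()


def user_brier_score(probs, outcomes, thresholds):
    """Assumed form: relative expense between perfect (0) and worst (1) forecasts."""
    e = np.array([user_mean_expenses(p, o, thresholds)
                  for p, o in zip(probs, outcomes)])
    e_p, e_o, e_w = e.mean(axis=0)  # the overbar: mean over the forecast sample
    return (e_p - e_o) / (e_w - e_o)


def brier_score(probs, outcomes):
    """Standard Brier Score for comparison."""
    return float(np.mean((np.asarray(probs) - np.asarray(outcomes)) ** 2))


# Illustrative forecasts and outcomes (not data from the study)
probs, outcomes = [0.2, 0.9, 0.5], [0, 1, 1]
uniform = (np.arange(1000) + 0.5) / 1000  # evenly spread thresholds on (0, 1)
print(user_brier_score(probs, outcomes, uniform), brier_score(probs, outcomes))
```

With evenly spread thresholds the two printed values agree, which is the uniform-distribution limit described in the text. Replacing `uniform` with an elicited set of thresholds would weight the score towards the probabilities at which users actually change their decisions.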

Much of the improvement, particularly for the campers, is due to a big reduction in ‘complete misses’ (where zero ensemble members predict the event, but there is a non-negligible outcome frequency). Complete misses remain a key issue, however, and this study suggests users would benefit from continued research into the modelling of extreme weather, and its likelihood when on the edges of the forecast distribution.
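A ‘complete miss’ as described above can be diagnosed directly from ensemble output. The following is a minimal sketch, assuming that for each forecast we know how many ensemble members predicted the event and whether the event was observed; the function name and sample values are hypothetical and are not taken from the study.

```python
import numpy as np


def complete_miss_stats(members_predicting, outcomes):
    """Count zero-member forecasts and the outcome frequency among them."""
    m = np.asarray(members_predicting)
    o = np.asarray(outcomes, dtype=float)
    zero = m == 0                      # forecasts where no member predicts the event
    n_zero = int(zero.sum())
    # Observed event frequency when the ensemble gave zero probability
    freq = float(o[zero].mean()) if n_zero else float("nan")
    return n_zero, freq


# Illustrative values: members predicting the event, and observed outcomes
n_zero, freq = complete_miss_stats([0, 0, 3, 0, 10], [0, 1, 1, 0, 1])
print(n_zero, freq)
```

A non-negligible frequency among zero-probability forecasts indicates that the ensemble is failing to capture the event at the edge of its distribution.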

User Brier Score and Brier Score. The User Brier Score (blue) for the user community as a whole based on their indicated decisions in the face of potentially dangerous winds. The standard Brier Score is also shown (red). The curves have been smoothed with a five-year running-mean, the central year being indicated on the x-axis.

Further information can be found in Rodwell, M.J. et al., 2020, Q. J. R. Meteorol. Soc., doi:10.1002/qj.3845.