ICCO Cooperation gathers information about what people eat and whether they run the risk of suffering from hunger. We usually do so by drawing a sample, going out in the field with a questionnaire and analysing the data, as you can read in one of my previous blogs. However, gathering these data has two major drawbacks – it is costly and time-consuming. We therefore asked ourselves: can open data serve as a cost-effective alternative for predicting food security?
The answer to this question is not a simple yes or no, but in the last months of 2017 we developed a better understanding. How? By hiring and supervising – together with the 510 data team of the Netherlands Red Cross – Wesley van der Heijden, an MSc student in information management. He developed a first prototype of a model that uses open data to predict food security in Ethiopia at local level.
We took the following steps:
1. Identifying the main drivers for food security
First, we had to decide which drivers are crucial for estimating whether households suffer from hunger. Scientific literature is helpful in finding these drivers of food security. We identified 25 elements, such as rainfall, market prices of crops, vegetation, land cover, relative wealth and population density. Some of these drivers are of crucial importance in relation to acute food crises, while others describe chronic food insecurity. These 25 variables formed the input data for the model.
The outcomes of the model have to be tested against existing information – we call this output data. We chose the Famine Early Warning Systems Network (FEWS NET). At the moment, this seems to be the most accurate and complete hunger prediction system with information at sub-national level. It is based on data, surveys and interpretation of data by humans (and thus not by machines). In other words, we compare the output of our open-data model with the classification used by FEWS NET. Our hypothesis: if the output of our model comes close to the FEWS NET classification, then our model can be integrated into the Community Risk Assessment Toolbox of the Netherlands Red Cross.
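To make the comparison concrete, here is a minimal sketch in Python of how model output could be scored against the FEWS NET classification. The class names follow the three categories described later in this post; the six zones and their labels are made up, and the distance score is just one possible choice, not necessarily a metric used in the thesis.

```python
from collections import Counter

# The three classes used in this post, ordered from least to most severe.
CLASSES = ["minimal", "stressed", "urgent"]


def confusion_matrix(reference, predicted):
    """Count (reference, predicted) pairs; rows are reference classes."""
    counts = Counter(zip(reference, predicted))
    return [[counts[(ref, pred)] for pred in CLASSES] for ref in CLASSES]


def accuracy(reference, predicted):
    """Share of zones where the model matches the reference label."""
    hits = sum(r == p for r, p in zip(reference, predicted))
    return hits / len(reference)


def mean_class_distance(reference, predicted):
    """Average number of classes between prediction and reference.

    Unlike accuracy, this treats 'minimal' vs 'urgent' as a worse miss
    than 'minimal' vs 'stressed', which suits ordered classes.
    """
    rank = {name: i for i, name in enumerate(CLASSES)}
    gaps = [abs(rank[r] - rank[p]) for r, p in zip(reference, predicted)]
    return sum(gaps) / len(gaps)


# Made-up labels for six hypothetical zones, for illustration only.
fews_net = ["minimal", "minimal", "stressed", "stressed", "urgent", "urgent"]
model = ["minimal", "stressed", "stressed", "urgent", "urgent", "minimal"]

matrix = confusion_matrix(fews_net, model)
print(matrix)
print(accuracy(fews_net, model))
print(mean_class_distance(fews_net, model))
```

The confusion matrix shows where the model and FEWS NET disagree, which is often more informative than a single score: missing an 'urgent' zone is a more serious error than overstating a 'minimal' one.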
2. Machine learning
Open data exist for many of the selected drivers, e.g. satellite data for rainfall and vegetation, livelihood zones defined by USAID and (spikes in) market prices from the World Food Programme. After collating these data, we developed a machine learning methodology; the idea was to find the combination of drivers that best approaches the existing FEWS NET classification. We did not combine these drivers ourselves; we asked a computer to do so. This is what we call machine learning. It can be applied for knowledge discovery without explicitly programming what the computer should do. Basically, it means that the computer searches for the most relevant combination of input drivers – it ‘learns’ and improves its algorithms itself. As such, it selects relevant variables, removes redundant ones and calculates the best combination of data.
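Collating, in practice, roughly means bringing every driver onto the same key. The sketch below assumes, hypothetically, that each source has already been reduced to (zone, month, value) records; the zone IDs, driver names and numbers are invented for illustration and do not come from the actual data sets.

```python
from collections import defaultdict

# Hypothetical per-source records: (zone_id, "YYYY-MM", value).
rainfall_mm = [("Z001", "2017-05", 82.0), ("Z001", "2017-06", 40.5),
               ("Z002", "2017-06", 12.0)]
vegetation_index = [("Z001", "2017-06", 0.41), ("Z002", "2017-06", 0.18)]
price_spike = [("Z001", "2017-06", 0)]  # some sources will have gaps


def collate(sources):
    """Merge several sources into one row per (zone, month).

    `sources` maps a driver name to its list of records. A driver with no
    record for a given zone and month is stored as None, so gaps stay
    visible instead of being silently filled.
    """
    table = defaultdict(dict)
    for driver, records in sources.items():
        for zone, month, value in records:
            table[(zone, month)][driver] = value
    drivers = list(sources)
    return {key: {d: row.get(d) for d in drivers}
            for key, row in sorted(table.items())}


features = collate({"rainfall_mm": rainfall_mm,
                    "vegetation_index": vegetation_index,
                    "price_spike": price_spike})

for key, row in features.items():
    print(key, row)
```

Keeping missing values explicit matters here, because a learning algorithm can only weigh drivers fairly if it is clear where a source had nothing to say.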
For our more technical readers: we based our model on ordinal classification with a random forest model as the underlying algorithm. The model uses three categories to indicate food security: minimal, stressed and urgent action required. An important condition is that the model allows us to create food security assessments at lower administrative levels – the more precise the better. As the data for the input variables need to be at the same geographical level and cover the same timeframe, Ethiopia was divided into 1,419 zones (see Figure 1). We used data from July 2009 up to June 2017. Figure 1 shows an example of a FEWS NET Current Situation Assessment translated to the 1,419 zones and the three classes.
Figure 1: FEWS NET Current Situation Assessment (06-2017) translated to the 1,419 zones and to the three levels of Minimal (green), Stressed (yellow) and Urgent Action (red).
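For readers who want a feel for what ordinal classification means, here is a rough sketch in Python. It rests on two assumptions that go beyond this post: it uses a common decomposition of the ordered classes into binary questions ("is this zone worse than class k?"), which may differ from the scheme used in the thesis, and it swaps the random forest for a one-split decision stump so that the example runs without extra libraries. The two features and all values are made up.

```python
CLASSES = ["minimal", "stressed", "urgent"]  # ordered from least to most severe


class Stump:
    """One-split decision tree; a tiny stand-in for a random forest."""

    def fit(self, rows, labels):
        mean = sum(labels) / len(labels)
        # Fallback: no split, always answer with the overall share of 1s.
        best = (sum((y - mean) ** 2 for y in labels), 0, float("inf"), mean, mean)
        for f in range(len(rows[0])):
            for t in sorted({r[f] for r in rows}):
                left = [y for r, y in zip(rows, labels) if r[f] <= t]
                right = [y for r, y in zip(rows, labels) if r[f] > t]
                if not right:
                    continue
                p_left, p_right = sum(left) / len(left), sum(right) / len(right)
                err = (sum((y - p_left) ** 2 for y in left)
                       + sum((y - p_right) ** 2 for y in right))
                if err < best[0]:
                    best = (err, f, t, p_left, p_right)
        self.split = best[1:]
        return self

    def prob(self, row):
        """Estimated probability that the binary label is 1."""
        f, t, p_left, p_right = self.split
        return p_left if row[f] <= t else p_right


def fit_ordinal(rows, ranks, make_learner=Stump):
    """Train one binary model per threshold: 'is the class above k?'"""
    return [make_learner().fit(rows, [int(r > k) for r in ranks])
            for k in range(len(CLASSES) - 1)]


def predict_ordinal(models, row):
    """Turn the threshold probabilities into one class per row."""
    above = [m.prob(row) for m in models]  # P(class > k) for each threshold k
    probs = [1 - above[0]]
    probs += [above[k - 1] - above[k] for k in range(1, len(above))]
    probs.append(above[-1])
    return CLASSES[max(range(len(probs)), key=probs.__getitem__)]


# Made-up training data: [rainfall in mm, vegetation index] per zone.
rows = [[120, 0.60], [110, 0.55], [70, 0.35], [60, 0.30], [20, 0.12], [15, 0.10]]
ranks = [0, 0, 1, 1, 2, 2]  # index into CLASSES

models = fit_ordinal(rows, ranks)
for new_zone in ([130, 0.70], [65, 0.33], [10, 0.08]):
    print(new_zone, predict_ordinal(models, new_zone))
```

The point of the ordinal setup is that the order of the classes is used during training: each binary model only has to learn where one boundary lies, rather than treating minimal, stressed and urgent as three unrelated labels.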
After the development of the methodology, we compared the results of different model variants. Before jumping to conclusions, it is important to consider the limitations of the research and the model. First, crucial data might be missing. Factors such as migration, war and conflict, diseases and remoteness can affect food security but are not included in the model. Second, not all open data are reliable. In particular, the market price data show important gaps and are of doubtful quality. Third, although FEWS NET provides a guide to its approach, the exact methodology behind its data is unknown to us. A better understanding of that methodology will probably help us fine-tune our model.
4. Next steps
The evaluation metrics are not considered high enough to apply the model directly in practice. So, was this all useless? Not at all. We are confident that further research will help us fine-tune this model or alternative ones. We will take the following steps:
- Find more relevant data to enrich the model
- Test the model in other countries
- Bring in deeper technical expertise to validate our findings
- Increase the understanding between traditional aid workers and ‘techies.’
The outcome might be that – at this stage – developing a model to predict food security among households based on open data is a dead end. But it might also lead to a surprising conclusion: that such a model can replace human interpretation of available data and information. That could save us time and money and help development organizations make sure aid money reaches the people it should reach.
Using machine learning to indicate food insecurity might be an achievable method of assessment in the near future. We’ll keep you posted along the way.
ICCO would like to thank Wesley van der Heijden, Marc van den Homberg (NRK) and Roza Freriks (DCHI) for their support. Wesley’s thesis can be found here.