Welcome

The aim of the CHEERS challenge is to exploit the latest advances in Natural Language Processing (NLP) to assist responders and analysts in humanitarian crises in analyzing and harvesting valuable information from data. The challenge is hosted by Data Friendly Space (DFS), a non-profit organization, and is supported by academic researchers at the Johannes Kepler University Linz. We encourage anyone interested in advancing the applications of NLP and Deep/Machine Learning in the humanitarian sector to take part in this challenge, as the benefits would be immediately felt in improving the quality of the humanitarian community’s data analysis. As a result, humanitarian analysts would be able to spend their time doing what the human mind does best: subjective analysis of information.

Table of contents

Context and Background

Who is Data Friendly Space?


Data Friendly Space (DFS) is a non-profit organization based in the United States with a global presence. DFS’ guiding principle is to improve information management and analysis capacity, tools and processes in the humanitarian and development community to enable better informed and more targeted assistance. DFS staff is composed of experts from the humanitarian information management and analysis field who specialize in real-time secondary data review and who build humanitarian applications that support fast extraction of information from large volumes of unstructured data.

DFS also focuses on the creation of data-centric web applications, websites and mobile applications to support humanitarian organizations. When building software, DFS focuses on the intersection between data automation processes powered by Artificial Intelligence and human knowledge and skills, in particular where one can help the other in carrying out analysis. More information on Data Friendly Space and its projects can be found here.

Product: DEEP Platform


DFS is the technical host of the Data Entry and Exploration Platform (DEEP, thedeep.io), a tool used by humanitarians all over the world to monitor and assess crises. The DEEP project provides effective solutions for analyzing and harvesting data from secondary sources such as news articles, social media, and reports that are used by responders and analysts in humanitarian crises. During crises, rapidly identifying important information in the constantly growing volume of data is crucial for understanding the needs of affected populations and for improving evidence-based decision making. The DEEP has been used by many organizations in multiple crisis contexts, such as:

The aim of the DEEP platform is to provide insights from years of historical and in-crisis humanitarian text data. The platform allows users to upload documents and classify text snippets according to predefined humanitarian target labels, grouped into and referred to as analytical frameworks. Tagging this data leads to the structuring of large volumes of information that enables effective analysis of the humanitarian conditions of the populations of interest and empowers humanitarians to identify information gaps and to provide sound recommendations in terms of needs assessment strategies and response plans. DEEP supports global operations of a range of international humanitarian organizations and the United Nations.

More information on the DEEP and how it is being used can be found here:

How can NLP help in humanitarian crises?

The day-to-day workload of analysts and experts working with DEEP centers on the manual tagging of secondary data resources. These experts have extensive domain knowledge and understand how to use the analytical framework with its different taxonomies in order to assign the right labels to the right text snippets. Below, you can see the interface of the DFS/IMMAP analytical framework, where experts are asked to assign appropriate classes to a text snippet.

Figure: the DFS/IMMAP Analytical Framework tagging interface

This process of selecting informative text excerpts from documents and assigning the correct tags is highly laborious and time-consuming, while time is the decisive factor during humanitarian crises. The innovation of the DEEP lies in leveraging recent advances in NLP to automate this process.

CHEERS Challenge

Round 1: Extraction and Classification of Humanitarian Data

Round 1 of the challenge simulates the document processing procedure normally conducted by analysts. In particular, given a document consisting of a number of sentences, a system is asked to:

In the following, we explain the dataset and the task, followed by the run file format and the evaluation metrics.

Dataset

The dataset of this round is available here and consists of the following files:

data_round_1/
  documents_train_en.csv
  documents_val_en.csv
  documents_test_en.csv
  sentences_train_en.csv
  sentences_val_en.csv
  sentences_test_en.csv
  immap_sector_name_to_id.json
  Terms of Use.txt

The primary data for the challenge is available in the sentences_<split>_en.csv files. Each of these files contains the following columns:

In addition to the sentences, the documents_<split>_en.csv files provide the full text of the original documents from which the sentences are extracted. Participants are free to use these original documents as additional data in any way they like. The document files have the following columns:

The corresponding name of each sector is available in immap_sector_name_to_id.json. All provided documents and sentences in this round are in English.
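As a quick start, the files can be loaded with a few lines of Python. This is only a minimal sketch: apart from the file names listed above and the sector_ids column referenced below, the exact column names should be checked against the downloaded CSVs.

import json
import pandas as pd

# Load the sentence-level data and the full documents for the training split.
sentences = pd.read_csv("data_round_1/sentences_train_en.csv")
documents = pd.read_csv("data_round_1/documents_train_en.csv")

# Map each sector name to its numeric id (and build the reverse lookup).
with open("data_round_1/immap_sector_name_to_id.json") as f:
    sector_name_to_id = json.load(f)
id_to_sector_name = {v: k for k, v in sector_name_to_id.items()}

print(sentences.columns.tolist())
print(id_to_sector_name)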

Task

As mentioned before, this task consists of two consecutive steps:

Please note that in the provided data, the sector_ids column can contain more than one value. However, we explicitly limit the prediction of sectors to a single class: participants are free to exploit one or multiple sectors per sentence during training, but at inference time only one sector must be provided. The exact format of the run file is explained in the following section.
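For illustration only, the snippet below shows one trivial way to reduce the multi-valued sector_ids column to a single-label training target by keeping the first listed sector. It assumes the sector_ids cells are stored as strings such as "[2, 4]", which should be verified against the actual files.

import ast
import pandas as pd

sentences = pd.read_csv("data_round_1/sentences_train_en.csv")

def first_sector(raw):
    # Parse a sector_ids cell (e.g. "[2, 4]") and keep only its first sector,
    # falling back to -1 for sentences without any sectoral information.
    ids = ast.literal_eval(raw) if isinstance(raw, str) else []
    return ids[0] if ids else -1

sentences["single_sector"] = sentences["sector_ids"].apply(first_sector)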

Run File Format

The run (output) file is in CSV format and has the following columns:

For example, if the sector_id values of the sentences range between 1 and 7, a possible run file could look like:

doc_id, sentence_id, is_relevant, sector_id
0, 0, 0, -1
0, 1, 1,  1
0, 2, 0, -1
1, 0, 1,  4
2, 0, 1,  2
2, 1, 1,  7
2, 2, 0, -1
2, 3, 1,  3
2, 4, 0, -1
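Such a run file can be written with a few lines of pandas. The predictions below are placeholders for the output of an actual model, and the output file name is only an example.

import os
import pandas as pd

# Placeholder predictions; replace these with the output of an actual model.
predictions = [
    {"doc_id": 0, "sentence_id": 0, "is_relevant": 0, "sector_id": -1},
    {"doc_id": 0, "sentence_id": 1, "is_relevant": 1, "sector_id": 1},
    {"doc_id": 1, "sentence_id": 0, "is_relevant": 1, "sector_id": 4},
]

run = pd.DataFrame(predictions, columns=["doc_id", "sentence_id", "is_relevant", "sector_id"])
os.makedirs("runs", exist_ok=True)
run.to_csv("runs/en_val_myrun.csv", index=False)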

The code for creating a random baseline for the validation set is provided in create_random_baseline.py. Assuming that the data is available in the same path as this script, the following command generates the run file of this baseline:

python create_random_baseline.py

Evaluation Metrics

Run files are evaluated according to three scores. First, the is_relevant predictions are evaluated with the Macro F1 Score.

Next, for the sentences with predicted is_relevant=1, the corresponding predicted sectors are evaluated using a version of Accuracy based on the Hamming Score, in a similar way as in this paper. According to this formulation, Accuracy is defined as:

\[Accuracy = \frac{1}{N}\sum_{i=1}^{N}{\frac{\lvert Y_i\cap \{z_i\} \rvert}{\lvert Y_i\cup \{z_i\} \rvert}}\]

where \(Y_i\) is the set of labels in the ground truth data, and \(z_i\) is the predicted class. Please note that Accuracy in this formulation is only calculated over sentences with predicted is_relevant=1. Also, sentences in the ground truth with is_relevant=1 but without any sectoral information (sector_ids=[]) are ignored when calculating Accuracy. Finally, since only one sector can be provided in the run file, the overall Accuracy cannot reach one, as some sentences in the ground truth have multiple sectors. To elaborate with an example, if a sentence has sector_ids=[2, 4] in the ground truth and the predicted sector_id in the run file is 2, then the Accuracy of this sentence equals 1/2.

Finally, to measure the overall performance of a system, we combine the F1 Score and Accuracy and introduce HumImpact – the Humanitarian Impact evaluation metric. HumImpact is defined as:

\[HumImpact = 0.5 \cdot F1\_Score + 0.5 \cdot Accuracy\]
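For intuition, the three scores can be roughly re-implemented as in the sketch below. It assumes the ground-truth sector ids are already parsed into Python lists; the official numbers are always those produced by the provided evaluation.py script.

from sklearn.metrics import f1_score

def hum_impact(gt_relevant, pred_relevant, gt_sectors, pred_sectors):
    # gt_relevant / pred_relevant: 0/1 relevance flags per sentence.
    # gt_sectors: ground-truth sector id lists per sentence, e.g. [2, 4].
    # pred_sectors: single predicted sector id per sentence (-1 if not relevant).

    # 1) Macro-averaged F1 over the is_relevant predictions.
    relevance_f1 = f1_score(gt_relevant, pred_relevant, average="macro")

    # 2) Hamming-style Accuracy over sentences predicted as relevant,
    #    ignoring ground-truth sentences without any sectoral information.
    scores = []
    for pred_rel, gt_ids, pred_id in zip(pred_relevant, gt_sectors, pred_sectors):
        if pred_rel != 1 or not gt_ids:
            continue
        gt_set, pred_set = set(gt_ids), {pred_id}
        scores.append(len(gt_set & pred_set) / len(gt_set | pred_set))
    accuracy = sum(scores) / len(scores) if scores else 0.0

    # 3) HumImpact: the equally weighted combination of the two scores.
    return 0.5 * relevance_f1 + 0.5 * accuracy

For the example above (sector_ids=[2, 4] in the ground truth and predicted sector 2), the inner term evaluates to 1/2, matching the description.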

Please use the evaluation.py script to evaluate run files. The following command provides the evaluation results for the random baseline created above:

python evaluation.py --ground-truth-path data_round_1/sentences_val_en.csv --runfile-path runs/en_val_baseline_random.csv

which outputs:

Macro-averaged F1 score for is_relevant variable is: 49.92
Accuracy for sector_ids variable is: 44.57
HumImpact Score is: 47.24
{'relevance_f1_score_macro': 0.4992029079372476, 'sectorids_accuracy': 0.4456756756756766, 'HumImpact': 0.4724392918064621}

Results on Leaderboard

coming soon…!

Terms and Conditions

The provided datasets are intended for non-commercial research purposes to promote advancement in natural language processing, information retrieval and related areas in the humanitarian sector, and are made available free of charge without extending any license or other intellectual property rights. In particular:

Contact Us

If you have any questions regarding technical aspects of the dataset, or how to obtain it, please contact us: