The Open Cities AI Challenge

Post by Dave Luo, Grace Doherty, and Nicholas Jones, GFDRR Labs/World Bank

This article was originally published on Towards Data Science.

Takeaways

  • The Global Facility for Disaster Reduction and Recovery (GFDRR) is partnering with Azavea and DrivenData to introduce a new dataset and machine learning (ML) competition ($15,000 in total prizes) to improve mapping for resilient urban planning.
  • Better ML-supported mapping for disaster risk management means addressing barriers to applying ML in African urban environments and adopting best practices in geospatial data preparation to enable easier ML usage.
  • The competition dataset (over 400 square kilometers of high-resolution drone imagery and 790K building footprints) is sourced from locally validated, open source community mapping efforts from 10+ urban areas across Africa.
  • Prize-winning solutions will be published as open-source tools for continued ML development and benchmarking.

The Open Cities AI Challenge has two participation tracks:

  1. $12,000 in prizes for best open-source semantic segmentation of building footprints from drone imagery that can generalize across a diverse range of African urban environments, spatial resolutions, and imaging conditions.
  2. $3,000 in prizes for thoughtful explorations of Responsible AI development and application for disaster risk management. How might we improve the creation and use of ML systems to mitigate biases, promote fair and ethical use, inform decision-making with clarity, and make safeguards to protect users and end-beneficiaries?

The competition is ongoing and ends March 16th, 2020. Join today!

Open Data for Resilient Urban Planning

Cities around the world are growing rapidly, especially in Africa — by 2030, half of Sub-Saharan Africa’s population will live in urban areas. As urban populations grow, their exposure to flooding, erosion, earthquakes, coastal storms, and other hazards becomes a complex challenge for urban planning.

Understanding how assets and people are vulnerable to these risks requires detailed, up-to-date geographic data of the built environment. For example, a building’s particular location, shape, and construction style can tell us whether it will be more exposed to earthquake or wind damage than nearby buildings. Roads, buildings, and critical infrastructure need to be mapped frequently, accurately, and in detail if we are to understand and manage risk effectively. But in countries with less developed data infrastructure, traditional urban data collection methods can’t keep up with increasing density and sprawl.

A field mapper from Open Cities Accra observes standing water and refuse in a flood-prone neighborhood of Accra, Ghana. Photo courtesy of Gabriel Joe Amuzu, Amuzujoe Photography.

Thankfully, collaborative and open data collection practices are reshaping the way we map cities. Today, local mapping communities are improving maps for some of the world’s most vulnerable neighborhoods, bringing highly accurate and detailed geographic data up to date and to scale. GFDRR at the World Bank supports programs like Open Cities Africa and Dar Ramani Huria to map buildings, roads, drainage networks, and more in over a dozen African cities. The Zanzibar Mapping Initiative was the world’s largest aerial mapping exercise using consumer drones and local mappers to produce open spatial data for conservation and development in the archipelago.

To date, OpenStreetMap contributors have mapped more than 70 million ways and 600 million nodes across the African continent.

Data collected in these community mapping programs are used to design tools and products that support government decision-making. Digitized maps are published to OpenStreetMap and aerial imagery to OpenAerialMap, where they serve as data public goods that can be used and improved by all. The open source philosophy behind the movement and an emphasis on local skill-building have fostered local networks of talent in digital cartography, robotics, software development, and data science.

Potential of Machine Learning for Mapping

Advances in ML for visual tasks could further improve mapping quality, speed, and cost. Recent examples of ML applications for mapping include Facebook’s AI-assisted mapping tool for OpenStreetMap and Microsoft’s country-scale automated building footprint extraction (in the USA, Canada, Tanzania, and Uganda). Competitions like SpaceNet and xView2 advance ML practices for automated mapping of roads, buildings, and building damage assessment after disasters.

Obstacles, however, stand in the way of effectively applying current ML mapping solutions to the African disaster risk management context. Africa’s urban environments differ significantly in make-up and appearance from European, American, or Asian cities, for which data is more abundant and on which ML models are more often trained:

  • Buildings that are more densely situated and diverse in shape, construction style, and size may be less recognizable to ML models that saw few or no such examples in their training.
  • Imagery is collected by commercial drones at much higher resolution under diverse environmental conditions, requiring adaptation of models usually trained on lower-resolution, more consistently collected and preprocessed satellite imagery.
  • Crowdsourced and community-driven data labeling may differ greatly in what base imagery layers are used, workflow, data schema, and quality control, requiring models that are robust to more label noise.
  • Geospatial data comes in a diversity of file formats, sizes, and schemas, which create high adoption and knowledge barriers and hamper its use in machine learning.

There is now a growing abundance of locally validated open map data and high-resolution drone imagery in diverse built environments. How might we best address these obstacles and enhance the state of practice in machine learning to support mapping for urban development and risk reduction for Africa’s cities?

Introducing the Open Cities AI Challenge

Dataset

Working with partners Azavea and DrivenData, the Labs team at GFDRR combined the excellent work of many participatory mapping communities across Africa, applied best practices in cloud-native geospatial data processing (i.e. using Cloud-Optimized GeoTIFFs [COG] and SpatioTemporal Asset Catalogs [STAC]), and standardized wherever possible to make data more readily usable for machine learning. The result is a novel, extensive, open dataset of over 790K building footprints and 400 square kilometers of drone imagery representing 10 diverse African urban areas in ML-ready form.

Comparing hand-labeled building footprints overlaid on drone imagery for 10 African urban areas included in the Challenge training dataset.

Using COG and STAC for geospatial data provides us with bandwidth-efficient, rapid, and queryable access to our imagery and labels in a standardized format. Ease of access to files and indexing of data catalogs is particularly important for geospatial data, which can quickly grow to hundreds of gigabytes. It also enables us to tap into the growing ecosystem of COG and STAC tools, like STAC Browser, to rapidly visualize and access any training data asset in a web browser, despite individual image files being up to several GB and the entire dataset totaling over 70 GB in size:

Animated demo of using STAC Browser to visualize Challenge training data collections and assets.

PySTAC, a new Python library by Azavea, enables STAC users to load, traverse, access, and manipulate data within catalogs programmatically. For example, reading a STAC catalog:

from pystac import Catalog

# Load the root catalog of the tier 1 training data
train1_cat = Catalog.from_file('https://drivendata-competition-building-segmentation.s3-us-west-1.amazonaws.com/train_tier_1/catalog.json')

# Print the hierarchy of collections and items in the catalog
train1_cat.describe()

* <Catalog id=train_tier_1>
    * <Collection id=acc>
      * <Item id=665946>
      * <LabelItem id=665946-labels>
      * <Item id=a42435>
      * <LabelItem id=a42435-labels>
      * <Item id=ca041a>
      * <LabelItem id=ca041a-labels>
      * <Item id=d41d81>
      * <LabelItem id=d41d81-labels>
    * <Collection id=mon>
      * <Item id=401175>
...

Inspecting an item’s metadata:

# Look up one item in the 'acc' collection and print its metadata
one_item = train1_cat.get_child(id='acc').get_item(id='ca041a')
one_item.to_dict()

{
  "assets": {
    "image": {
      "href": "https://drivendata-competition-building-segmentation.s3-us-west-1.amazonaws.com/train_tier_1/acc/ca041a/ca041a.tif",
      "title": "GeoTIFF",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized"
    }
  },
  "bbox": [
    -0.22707525357332697,
    5.585527399115482,
    -0.20581415249279408,
    5.610742610987594
  ],
  "collection": "acc",
  "geometry": {
    "coordinates": [
      [
        [
          -0.2260939759101167,
          5.607821019807083
        ],
        ...
        [
          -0.2260939759101167,
          5.607821019807083
        ]
      ]
    ],
    "type": "Polygon"
  },
  "id": "ca041a",
  "links": [
    {
      "href": "../collection.json",
      "rel": "collection",
      "type": "application/json"
    },
    {
      "href": "https://drivendata-competition-building-segmentation.s3-us-west-1.amazonaws.com/train_tier_1/acc/ca041a/ca041a.json",
      "rel": "self",
      "type": "application/json"
    },
    {
      "href": "../../catalog.json",
      "rel": "root",
      "type": "application/json"
    },
    {
      "href": "../collection.json",
      "rel": "parent",
      "type": "application/json"
    }
  ],
  "properties": {
    "area": "acc",
    "datetime": "2018-11-12 00:00:00Z",
    "license": "CC BY 4.0"
  },
  "stac_version": "0.8.1",
  "type": "Feature"
}
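Because STAC items are plain JSON, their metadata can also be read without any STAC-specific tooling. The following is a minimal sketch, not part of the Challenge materials, that uses only the Python standard library on an abridged copy of the item above. The helper names `resolve_links` and `bbox_size_degrees` are hypothetical, and the sketch assumes the bounding box follows the usual [west, south, east, north] order.

```python
from urllib.parse import urljoin

# Abridged copy of the item metadata shown above (geometry and properties omitted).
BASE = "https://drivendata-competition-building-segmentation.s3-us-west-1.amazonaws.com/train_tier_1/"
item = {
    "id": "ca041a",
    "collection": "acc",
    "bbox": [-0.22707525357332697, 5.585527399115482,
             -0.20581415249279408, 5.610742610987594],
    "assets": {"image": {"href": BASE + "acc/ca041a/ca041a.tif"}},
    "links": [
        {"rel": "collection", "href": "../collection.json"},
        {"rel": "self", "href": BASE + "acc/ca041a/ca041a.json"},
        {"rel": "root", "href": "../../catalog.json"},
        {"rel": "parent", "href": "../collection.json"},
    ],
}


def resolve_links(item_dict):
    """Return {rel: absolute href}, resolving relative hrefs against the 'self' link."""
    self_href = next(
        link["href"] for link in item_dict["links"] if link["rel"] == "self"
    )
    return {
        link["rel"]: urljoin(self_href, link["href"])
        for link in item_dict["links"]
    }


def bbox_size_degrees(bbox):
    """Return (width, height) in degrees, assuming [west, south, east, north] order."""
    west, south, east, north = bbox
    return east - west, north - south


links = resolve_links(item)
width_deg, height_deg = bbox_size_degrees(item["bbox"])

print("image:", item["assets"]["image"]["href"])
print("root catalog:", links["root"])
print(f"extent: {width_deg:.4f} x {height_deg:.4f} degrees")
```

Resolving the relative `root` link against the item's own URL leads back to the same `catalog.json` that was loaded with PySTAC earlier, which is how a STAC catalog stays navigable from any entry point.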

Learn more about the dataset and STAC resources.

Competition

Accompanying the dataset is a competitive machine learning challenge with $15,000 in total prizes to encourage ML experts globally to develop more accurate, relevant, and readily usable open-source solutions to support mapping in African cities. There are 2 participation tracks:

Semantic Segmentation track: $12,000 in prizes for the best open-source semantic segmentation models to map building footprints from aerial imagery.

The machine learning objective is to segment (classify) every pixel in every image as building or no-building, with model performance evaluated using the Intersection-over-Union metric (aka Jaccard Index).
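For two sets of pixels A (predicted building) and B (labeled building), the Jaccard index is the size of their intersection divided by the size of their union: IoU = |A ∩ B| / |A ∪ B|. The sketch below illustrates that definition on a pair of tiny binary masks in plain Python. It is not the Challenge's official scoring code: how scores are aggregated across images is not described here, and returning 1.0 when both masks are empty is an assumption made for this example.

```python
def iou(pred_mask, true_mask):
    """Intersection-over-Union (Jaccard index) of two binary masks.

    Both masks are equal-shaped nested lists of 0/1 values, where 1 = building.
    """
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1  # pixel is building in both masks
            if p or t:
                union += 1  # pixel is building in at least one mask
    # Assumption: two empty masks agree perfectly, so avoid 0/0 and return 1.0.
    return intersection / union if union else 1.0


predicted = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
ground_truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

score = iou(predicted, ground_truth)
print(f"IoU = {score:.3f}")  # 2 shared pixels / 6 pixels in the union
```

Plain lists keep the example dependency-free; on real imagery the same counts would normally be computed on arrays.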

Semantic segmentation is useful for mapping because its pixel-level outputs are relatively easy to visually interpret, verify, and use as-is (e.g. in the calculation of built-up surface area) or as inputs to downstream steps (e.g. first segment buildings and then classify attributes about each segmented building like its construction status or roof material).
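As an illustration of using a segmentation output as-is, built-up surface area can be estimated by counting building pixels and multiplying by the ground area each pixel covers. This is a rough sketch under stated assumptions: the function names are hypothetical, the 5 cm pixel size is an invented value rather than a property of the Challenge imagery, and pixels are assumed to be square in a projected coordinate system measured in meters.

```python
def built_up_area_m2(mask, pixel_size_m):
    """Estimate built-up area in square meters from a binary building mask.

    mask: nested lists of 0/1 values, where 1 = building.
    pixel_size_m: side length of one (square) pixel on the ground, in meters.
    """
    building_pixels = sum(1 for row in mask for value in row if value)
    return building_pixels * pixel_size_m ** 2


def built_up_fraction(mask):
    """Share of pixels classified as building, between 0.0 and 1.0."""
    total_pixels = sum(len(row) for row in mask)
    building_pixels = sum(1 for row in mask for value in row if value)
    return building_pixels / total_pixels if total_pixels else 0.0


segmentation = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
PIXEL_SIZE_M = 0.05  # hypothetical ground resolution: 5 cm per pixel

area = built_up_area_m2(segmentation, PIXEL_SIZE_M)
fraction = built_up_fraction(segmentation)
print(f"built-up area: {area:.4f} m^2 ({fraction:.0%} of the tile)")
```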

Segmentation track participants must also submit at least once to the Responsible AI track to qualify for the $12,000 in segmentation track prizes.

Responsible AI track: $3,000 in prizes will be awarded for the best ideas applying an ethical lens to the design and use of ML systems for disaster risk management.

ML can improve data applications in disaster risk management, especially when coupled with computer vision and geospatial technologies, by providing more accurate, faster, or lower-cost approaches to assessing risk. At the same time, we urgently need to develop a better understanding of the potential for negative or unintended consequences of their use. With growing attention given to questions of appropriate and ethical ML use for facial recognition, criminal justice, healthcare, and other domains, we have an immediate responsibility to elevate these questions for disaster risk.

Examples of potential harm that ML technologies present in this space include, but are not limited to:

  • Perpetuating and aggravating societal inequalities through the presence of biases throughout the machine learning development pipeline.
  • Aggravating privacy and security concerns in Fragility, Conflict and Violence settings through combination of previously distinct datasets.
  • Limiting opportunities for public participation in disaster risk management due to increased complexity of data products.
  • Reducing the role of expert judgement in data and modeling tasks and in turn increasing probability of error or misuse.
  • Inadequately communicating methods, results, or degrees of uncertainty, which increases the chance of misuse.

ML practitioners and data scientists are uniquely positioned to examine and influence the ethical implications of our work. We ask challenge participants to consider the applied ethical issues that arise in designing and using ML systems for disaster risk management. How might we improve the creation and application of ML to mitigate biases, promote fair and ethical use, inform decision-making with clarity, and make safeguards to protect users and end-beneficiaries?

This track’s submission format is flexible: participants can submit Jupyter notebooks, slides, blogs, essays, demos, product mockups, speculative fiction, artwork, syntheses of research papers or original research, or whatever other format best suits them. Submissions will be evaluated by a panel of judges on thoughtfulness, relevance, innovation, and clarity.

What Comes Next

This challenge will produce new public goods that advance our state of practice in applying ML for understanding risk in urban Africa; this includes new ML performance benchmarks for building segmentation from aerial imagery in relevant geographies, top-performing solutions for mapping in African cities, and in-depth explorations of how we responsibly create and deploy AI systems for disaster risk management.

Prize-winning solutions will be published as open-source tools and knowledge, and the challenge dataset will remain an open data resource for continued ML development and benchmarking. GFDRR will use lessons learned to inform policies and procurement strategies for using ML for urban mapping and planning.

Join the Challenge!

The competition runs until March 16, 2020. With one month to go, there is plenty of time to explore the data and participate in either track, but don’t delay. Join today at:

drivendata.org/competitions/60/building-segmentation-disaster-resilience