Problem Definition


Map of Greenland with location of satellite images for training and test sets. Red frame highlights region of 2019 image data.
Every boreal summer, warm temperatures in Greenland melt the surface snow, forming lakes on top of the Greenland ice sheet. After more than 20 years of increased summer temperatures, lakes are forming earlier and higher on the ice, with many lakes forming (and draining) at higher elevations than ever before. These lakes are highly dynamic hydrologic features of the ice sheet, prone to disappearing rapidly when they drain through cracks in the ice, which in turn helps lubricate the bed and changes the dynamics of the underlying ice. How these lakes affect ice dynamics, and how their behaviors are changing in a warming Arctic climate, are both active areas of research. To do this research, we need to track the existence of supraglacial lakes and their behaviors throughout a melt season, and longitudinally between seasons.

Problem. We seek to automate the detection of surface lakes on the Greenland ice sheet from satellite images, identifying lakes as tagged polygons from a single image, and allowing scientists to easily track the behavior of lakes through repeated summer melt seasons.

Challenge. Supraglacial lakes can be optically complicated. They are often partially or fully ice-covered, can drain and refill (partially or fully) within days, and are often filled with dust and debris. They are spectrally similar to “slush fields”, areas of blue-ish wet snow that do not yet have free-standing water and therefore are not classified as “lakes” outright. Unlike land-based lakes, in which water is largely separated from its bed, here the solid bed and the water share an intermingled relationship: water freezes solid onto the ice beneath it, and melts again into water when temperatures rise. Clouds, being largely the same color as the underlying snow, complicate detection in their own right and make identifying underlying features difficult. Satellite artifacts (seams between tiles, spectral noise, etc.) are easy for humans to visually ignore, but complicate algorithmic detection of these features. However, using context clues and content knowledge in an image, a human eye can typically distinguish between a “lake” (dirty or not, ice-covered or not) and a “non-lake” (slush field, rivulet, empty ice, etc.). This makes lake detection a good candidate for machine learning, although this competition is algorithm-agnostic.

Input Data

We provide a set of four large multi-part satellite images, each composed of dozens of individual satellite tiles, covering several hundred thousand km², and encompassing hundreds of lakes. The four images are from four different dates in the summer 2019 melt season, which saw a disproportionately large amount of melt and runoff. All the images cover the same two regions, in southwest and northeast Greenland (see the map of Greenland), and contain many surface lakes and other hydrologic features. Each image will have two corresponding GeoJSON files (.json), as follows: (a) A file containing “regions” outlined over each satellite image. The regions are of two types: “training” regions, where all available lakes have been hand-outlined to assist your algorithms in lake detection, and “test” regions, where your algorithm is tasked with outlining the lakes within the region. (b) A file containing the vector polygon lake outlines as hand-tagged by reviewers. For each lake outline, the field named “region” identifies the training region that contains it, and the “image” tag identifies the corresponding image. Data will be made available through the contest website. Please refer to the Downloads section for data access.
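
As a starting point for working with the lake-outline file described in (b), the following is a minimal sketch in Python that groups lake polygons by image and training region. It assumes the file is a standard GeoJSON FeatureCollection and that “region” and “image” are stored under each feature's properties; the exact layout, the function names, and the sample values below are assumptions for illustration, so check them against the downloaded data.

```python
import json
from collections import defaultdict


def load_geojson(path):
    """Read a GeoJSON (.json) file into a Python dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def group_lakes_by_region(lakes):
    """Map (image, region) -> list of exterior rings of lake polygons.

    Assumes a FeatureCollection whose features carry "image" and "region"
    properties (an assumption about the file layout, not a confirmed schema).
    """
    grouped = defaultdict(list)
    for feature in lakes.get("features", []):
        props = feature.get("properties") or {}
        geom = feature.get("geometry") or {}
        if geom.get("type") != "Polygon" or not geom.get("coordinates"):
            continue  # lakes are expected to be simple, single-part polygons
        exterior = geom["coordinates"][0]  # first ring is the exterior ring
        grouped[(props.get("image"), props.get("region"))].append(exterior)
    return dict(grouped)


# Tiny in-memory stand-in for the lake-outline file (all values are made up).
example = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"image": "2019-06-03", "region": 1},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [400, 0], [400, 400], [0, 400], [0, 0]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"image": "2019-06-03", "region": 1},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[900, 0], [1500, 0], [1500, 300], [900, 300], [900, 0]]],
            },
        },
    ],
}

lakes_by_region = group_lakes_by_region(example)
```

With real data, `load_geojson` would be called on the downloaded lake file and its result passed to `group_lakes_by_region`. The property that distinguishes “training” from “test” regions in file (a) is not named here, so it is left out of the sketch.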

Satellite images of lakes in melt season of 2019, on June 3 (top), June 19, July 31, August 25 (bottom).

Solution Guidelines

Here are a few rules that will help guide lake detection (see example images below):

An example of satellite images without (top) and with annotations.
  • Lakes should be one contiguous region: two or more separate regions should be classified as two separate lakes.
  • Lakes will not have “holes” in them, but may contain any artifacts that are wholly outlined within them (such as floating icebergs). Lakes will be simple closed polygons. There should be no multi-part polygons.
  • Lakes may contain floating icebergs or frozen-over patches, but must be at least partially “open” with liquid water exposed to the surface. No completely-frozen-over lakes are to be identified (these are more prominent in early-season and late-season imagery).
  • “Slushy” areas of wet blue snow should not be classified as “lakes”. Lakes are defined by open water of some depth. This is one of the most difficult aspects of this project, since no clear definition exists between “slush field” and “lake”. Let the training polygons be a guide.
  • Polygons should measure more than 100,000 m² (0.1 km²) in area in the Pseudo-Mercator projection used in the images and JSON files. This threshold is equivalent to a rectangle measuring approximately 100 × 1000 m. The images contain many puddles and ponds of smaller areas that could feasibly be seen as “lakes”, but we are not classifying those.
  • We are not classifying “narrow streams”. For this exercise, a lake is considered a lake if its smallest center diameter is at least 10% of its longest diameter. In other words, a lake should not be more than 10 times longer than it is wide. Lakes may be irregularly shaped and contain “pinch points”, but not long, narrow streams.
  • Two lakes connected by a narrow stream should be classified as separate lakes.
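
The geometric rules above (no holes, a minimum area, and no long narrow streams) can be sketched as a simple post-processing filter on candidate polygons. The Python sketch below is one possible reading of those rules, not an official scoring routine. It assumes coordinates are in metres of the Pseudo-Mercator projection, computes planar area with the shoelace formula, and approximates the “smallest center diameter” by the polygon's extent perpendicular to its longest vertex-to-vertex axis; that width proxy and all function names are assumptions introduced for illustration.

```python
import math


def ring_area(ring):
    """Planar area of a polygon ring (shoelace formula), in squared coordinate units."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][0], ring[i][1]
        x2, y2 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def length_and_width(ring):
    """Longest vertex-to-vertex distance, and the extent perpendicular to that axis.

    The perpendicular extent is only a proxy for the "smallest center diameter".
    Expects a ring with at least three vertices.
    """
    longest = 0.0
    axis = (1.0, 0.0)
    for i in range(len(ring)):
        for j in range(i + 1, len(ring)):
            dx = ring[j][0] - ring[i][0]
            dy = ring[j][1] - ring[i][1]
            dist = math.hypot(dx, dy)
            if dist > longest:
                longest, axis = dist, (dx / dist, dy / dist)
    # Project every vertex onto the normal of the long axis to measure width.
    nx, ny = -axis[1], axis[0]
    offsets = [p[0] * nx + p[1] * ny for p in ring]
    return longest, max(offsets) - min(offsets)


def has_no_holes(geometry):
    """True for a single-part GeoJSON Polygon with an exterior ring only."""
    return geometry.get("type") == "Polygon" and len(geometry.get("coordinates", [])) == 1


def passes_lake_rules(ring, min_area=100_000.0, min_ratio=0.1):
    """Apply the area threshold and the 10x elongation rule to one exterior ring."""
    if ring_area(ring) <= min_area:
        return False
    longest, width = length_and_width(ring)
    return width >= min_ratio * longest


# Made-up test shapes, in metres.
lake = [(0, 0), (1000, 0), (1000, 200), (0, 200), (0, 0)]    # 1000 x 200 m
stream = [(0, 0), (5000, 0), (5000, 50), (0, 50), (0, 0)]    # 5000 x 50 m
pond = [(0, 0), (100, 0), (100, 100), (0, 100), (0, 0)]      # 100 x 100 m

results = [passes_lake_rules(r) for r in (lake, stream, pond)]
```

Here the 1000 × 200 m rectangle passes, the 5000 × 50 m shape fails the elongation rule, and the 100 × 100 m pond fails the area threshold. The pairwise distance search is quadratic in the number of vertices, which is acceptable for a sketch but may need a convex hull step for very detailed outlines.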

A tip: Lakes are located in depressions in the ice governed by underlying bedrock. Even though the ice flows downhill, lakes remain stationary. This means that from one image to the next, lake locations are likely to be highly spatially autocorrelated. Lakes may significantly change their size or shape, but do not change locations. (You are allowed to use other datasets, such as the ArcticDEM, if you think elevation data might help you solve this problem. If you use external datasets, you must provide a link to an online cloud drive where your processed datasets can be downloaded in full in order for the judges to reproduce your results.)
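
One way to exploit this stationarity is to match lakes between two image dates by centroid proximity, so that a detection in one image can serve as a prior for the next. The Python sketch below is only an illustration of that idea: the 500 m search radius, the function names, and the sample polygons are hypothetical, and coordinates are assumed to be in the projected metres used by the images.

```python
import math


def ring_centroid(ring):
    """Area-weighted centroid of a polygon ring; falls back to the vertex mean."""
    twice_area = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][0], ring[i][1]
        x2, y2 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if twice_area == 0.0:  # degenerate ring: average the vertices instead
        return (sum(p[0] for p in ring) / n, sum(p[1] for p in ring) / n)
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


def match_lakes(earlier, later, max_dist=500.0):
    """For each earlier lake, return the index of the nearest later lake.

    Returns a list of (earlier_index, later_index or None); None means no
    later centroid lies within max_dist (a hypothetical radius in metres).
    """
    later_centroids = [ring_centroid(r) for r in later]
    matches = []
    for i, ring in enumerate(earlier):
        ex, ey = ring_centroid(ring)
        best_j, best_d = None, max_dist
        for j, (lx, ly) in enumerate(later_centroids):
            d = math.hypot(lx - ex, ly - ey)
            if d <= best_d:
                best_j, best_d = j, d
        matches.append((i, best_j))
    return matches


# Made-up outlines: the first lake grows between dates, the second has drained.
june = [
    [(0, 0), (400, 0), (400, 400), (0, 400)],
    [(5000, 5000), (5600, 5000), (5600, 5300), (5000, 5300)],
]
july = [
    [(-100, -100), (500, -100), (500, 500), (-100, 500)],
]

matches = match_lakes(june, july)
```

In this example the first June lake is matched to the single July lake, while the second June lake has no counterpart and is reported as `None`. Overlap-based measures such as intersection over union would be a reasonable alternative to centroid distance for lakes that change shape substantially.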