Let's talk about Satellite Imagery

October 21, 2021
Written by Matthew Rozek

Satellite imagery is a rich data source that powers many applications, from advanced scientific research to consumer applications. The first one that comes to my mind is any mapping app or website. You can see colour satellite imagery over any country on Earth. You can zoom in and out as much as you want, and it can look beautiful.

You may not know it, but a considerable amount of work goes into processing the imagery to get to this point, whether the application is a map or an advanced AI product. As satellites orbit over the Earth and take photos, they capture raw data. From a machine learning pipeline point of view, there are still many processing steps that need to happen to get the data into a usable format, or 'product'.

There are distinct levels of processing (that I won't go into here) that categorise what has been done to imagery after it's captured onboard a satellite. For example, satellite providers typically release data that has already been processed to correct for distortions from the camera angle or to remove noise caused by reflections in the atmosphere. Even after this point, there is still a lot of work to do before the imagery is usable.

This false-colour image shows Western Australia in 2013. It depicts the rich sediment and vegetation patterns of a tropical estuary (source).

This processing results in imagery that:

  1. Still has clouds, i.e., you can't see what's on the ground
  2. Is available at irregular time intervals: new imagery can come in every 2-5 days (we can consider it random)
  3. Often has missing data: the drift of the satellite orbit means it won't always fully capture the same area twice

Furthermore, these issues vary from location to location and from day to day, much like the weather, so usable data for any given area arrives at irregular intervals.
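To make these issues concrete, here is a minimal sketch of how cloudy and missing pixels might be represented in code. It assumes a boolean cloud mask is already available (from the provider or a separate detection step) and that missing pixels carry a no-data value of 0; both are assumptions for illustration, and the function name `mask_unusable` is hypothetical.

```python
import numpy as np

def mask_unusable(band, cloud_mask, nodata=0):
    """Replace cloudy or missing pixels with NaN and report usable coverage.

    band:       2-D array of pixel values for one spectral band
    cloud_mask: 2-D boolean array, True where a pixel is cloudy (assumed given)
    nodata:     value marking pixels the satellite did not capture (assumed 0)
    """
    band = np.asarray(band, dtype=float)
    unusable = np.asarray(cloud_mask, dtype=bool) | (band == nodata)
    masked = np.where(unusable, np.nan, band)
    valid_fraction = 1.0 - unusable.mean()  # share of pixels we can actually use
    return masked, valid_fraction
```

Tracking the valid fraction per image is one simple way to decide whether a new acquisition is worth feeding into later steps at all.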

So how do we usually deal with irregular time-series data? The simplest way to solve this problem is to interpolate data points to form a regular time interval. However, what if we get new imagery one day, half of the area covered is missing, and we want to input this into our model? What is the best way of filling in the empty values so that we can use them in our models if we can't interpolate? Suppose we choose a time period for how often we run our models, e.g., every 16 days. How do we best utilise all of the available imagery (if we get new data every 2-5 days) to construct the data point that goes into our models?
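One common way to approach that last question is temporal compositing: group every image that falls inside a fixed window and reduce each pixel to a single value using only its valid observations. The sketch below uses a per-pixel median over 16-day windows. It is an illustration under stated assumptions, not a description of any particular production pipeline: unusable pixels are assumed to be stored as NaN, all dates are assumed to fall on or after `start`, and the function name `composite` is hypothetical.

```python
import numpy as np

def composite(dates, stack, start, period_days=16):
    """Median-composite irregular observations into fixed-length windows.

    dates: 1-D array of np.datetime64 acquisition dates (irregular spacing)
    stack: float array of shape (time, height, width); NaN = cloudy or missing
    start: np.datetime64 marking the start of the first window
    """
    offsets = (dates - start).astype("timedelta64[D]").astype(int)
    window = offsets // period_days          # window index of each image
    n_windows = int(window.max()) + 1
    out = np.full((n_windows,) + stack.shape[1:], np.nan)
    for w in range(n_windows):
        members = stack[window == w]         # all images in this window
        if members.shape[0] == 0:
            continue                         # no imagery at all: stays NaN
        valid = ~np.isnan(members).all(axis=0)  # pixels seen at least once
        out[w][valid] = np.nanmedian(members[:, valid], axis=0)
    return out

dates = np.array(["2021-01-02", "2021-01-05", "2021-01-09", "2021-01-20"],
                 dtype="datetime64[D]")
stack = np.array([[[0.2, np.nan]],
                  [[0.4, np.nan]],
                  [[0.6, 0.5]],
                  [[0.8, np.nan]]])
regular = composite(dates, stack, np.datetime64("2021-01-01"))
```

Here four irregular images collapse into two regular 16-day data points. A pixel that was never observed within a window remains NaN, so the decision about how to fill it (or whether to skip it) is left explicit rather than hidden.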

Converting this processed but irregular data into regular data that models can safely ingest is the crux of the problem. After that, the next step is to analyse the image data through equations and models to output a derived metric or variable (which we internally call a 'product'). This step is roughly equivalent to feature engineering in machine learning. It creates defined, reliable values from the processed and transformed input data.

One famous example in satellite imagery is NDVI (the Normalised Difference Vegetation Index), an indicator of vegetation greenness and density that is often used as a proxy for plant biomass. It is based on the normalised difference between the near-infrared and red light measured by the satellite. Once you've completed these steps and your satellite imagery pipeline and data are in an easily calculable form, you can use these 'products' as reliable inputs to other analyses.
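For reference, NDVI is computed per pixel as (NIR - Red) / (NIR + Red), where NIR and Red are the near-infrared and red reflectance values, so the result always falls between -1 and 1. The following is a minimal sketch of that formula; the function name `ndvi` and the choice to return NaN where both bands are zero are assumptions made for illustration.

```python
import numpy as np

def ndvi(nir, red):
    """Compute NDVI = (NIR - Red) / (NIR + Red) element-wise.

    nir, red: arrays of near-infrared and red reflectance with the same shape.
    Pixels where NIR + Red is zero are returned as NaN instead of dividing by zero.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

values = ndvi([0.5, 0.3, 0.0], [0.1, 0.3, 0.0])
```

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so it tends to score towards the high end of the range, while equal reflectance in both bands gives a value of 0.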

While this is quite a general overview, it gives some insight into satellite imagery processing and into how we use imagery at Agtuary to analyse agricultural land and how it changes over time.
