To Compute Satellite Image Statistics Using Pandas in Google Colab

TIJ Tech Private Limited
5 min readDec 16, 2020

--

For more details and information, you can watch the YouTube video on how to Compute satellite Images statistics using pandas in google colab.

Video URL : https://youtu.be/xPTAl_qf864

Overview

In this tutorial, we will implement on how to compute the satellite Images statistics using pandas in google colab and derive the zonal statistics from the Sentinel 2 images which we will merge with our pandas data frame through a walkthrough on how to calculate the NDWI water index for the flooded areas.

Satellite data are dense when used in machine learning artificial intelligence and the data are stored as values using the cells. But we want only a summary of the satellite image known as raster image which will be converted into a tabular format in CSV or Pandas data frame. Satellite images are pixel wised data called Rasters.

Raster images mainly consist of satellite images and Rasters which consist of a matrix of cells and rows. Each cell or row holds information about the location such as elevation, temperature and vegetation. This process of deriving table outputs called Summary statistics from raster images is called Zonal Statistics.

Data Exploration

Firstly, install and import the libraries.

!apt install gdal-bin python-gdal python3-gdal
!pip install rasterio
!apt install python3-rtree
!pip install git+git://github.com/geopandas/geopandas.git
!pip install descartes
!pip install mapclassify
!pip install sentinelsat
import pandas as pd
import
numpy as np
import
geopandas as gpd
import
rasterio as rio
from
rasterio.plot import show
from rasterio.mask import mask
import matplotlib.pyplot as plt
import
seaborn as sns

Gdal is a popular and powerful geospatial data processing library and is available as a set of command_line utilities (ready to use binaries files).

Mapclassify is an open-source python library for Choropleth map classification. It is part of a refactoring of PySAL.

Sentinelsat a python library which makes searching, retrieving and downloading Sentinel satellite images easy. So let us start installing sentinelsat through pip.

So next, we will be downloading the files from the web and unzipping it.

!wget https://www.dropbox.com/s/gf51dybqbujyjb2/Data.zip
!unzip Data.zip

Now with Rasterio library, we can read the different bands of Sentinel 2 Images. In this case, we read bands 8 as NIR, 4 as Red, 3 as Green and 2 as Blue.

b8 = rio.open("/content/Data/20191101/B08-20191101.tif")
b4 = rio.open("/content/Data/20191101/B04-20191101.tif")
b3 = rio.open("/content/Data/20191101/B03-20191101.tif")
b2 = rio.open("/content/Data/20191101/B02-20191101.tif")

Next Let’s look at the width and height of the images. Just now, we will be only using b4, but we can check if all bands have the same weight and length or not.

b4.count, b4.width, b4.height

We can plot the data to see and explore the satellite images that we have. Here, we will be plotting only for Band 3.

fig, ax = plt.subplots(1, figsize=(12, 10))
show(b3, ax=ax)
plt.show()

The Sentinel 2 image of the area for only Band 3 is shown below.

Let’s also read the buildings table which we will use to store the statistical summaries derived from the satellite image.

Note that, we can use other polygons such as districts, rectangular grids instead of the building polygons. We will use Geopandas to read the data.

buildings = gpd.read_file("/content/Data/shapefiles/osm_buildings.shp")
buildings = buildings[["osm_id", "building", "geometry"]]
buildings.head()

And here are the first five rows of the table shown below. We have the Geometry column of each building, osm_id and building columns.

We can also plot the buildings on top of the Sentinel 2 images.

fig, ax = plt.subplots(figsize=(12, 10))
show(b4, ax=ax)
buildings.plot(ax=ax, color="white", alpha=.50)
plt.show();

The buildings are marked as white and over layered in the image as shown. The image shows the extent of the city i.e., residential areas with meandering Shabelle River.

Calculation of NDWI (Calculate Normalized Difference Water Index)

To Calculate the NDWI values from Sentinel 2 images, we use the formula:

(Band3 — Band8)/(Band3 + Band8)

So, Let’s calculate using this formula in Rasterio.

green = b3.read()
nir = b8.read()
ndwi = (nir.astype(float)-green.astype(float))/(nir+green)

The NDWI arrays can be plotted with Rasterio as shown below.

fig, ax = plt.subplots(1, figsize=(12, 10))
show(ndwi, ax=ax, cmap="coolwarm_r")
plt.show()

The NDWI plot clearly shows inundated areas near the Shabelle river in Blue color and the extent of flooding goes to residential areas.

We can save the NDWI arrays as a raster image so that we can use it later.

meta = b4.meta
meta.update(driver='GTiff')
meta.update(dtype=rio.float32)
with rio.open('NDWI.tif', 'w', **meta) as dst:
dst.write(ndwi.astype(rio.float32))
ndwi_raster = rio.open('NDWI.tif')

Now, that we have calculated the NDWI values, it is time to derive statistics from the NDWI raster image and merge to our buildings table. We use Rasterio mask functionality to get the cell values from the NDWI raster image.

The following is a small function that masks the cell values to our data frame table.

def derive_stats(geom, data=ndwi_raster, **mask_kw):
masked, mask_transform = mask(dataset=data, shapes=(geom,),
crop=True, all_touched=True, filled=True)
return masked

We can now derive the values that we want. Let’s say we are interested in getting the mean values of NDWI for each building. We create a column for that “mean_ndwi” and pass our function to apply the building’s geometry and also apply to mean from using Numpy.

buildings['mean_ndwi'] = buildings.geometry.apply(derive_stats).apply(np.mean)

Or get the maximum NDWI values for each building.

buildings['max_ndwi'] = buildings.geometry.apply(derive_stats).apply(np.max)
buildings.head()

Our table has two new columns i.e., mean_ndwi and max_ndwi, where we can store the mean and max NDWI values for each building.

Let’s create a figure and a set of subplots for the axis object by showing mean NDWI.

fig, ax = plt.subplots(figsize=(12,10))
buildings.plot(column="mean_ndwi", ax=ax, cmap="Reds_r", scheme='quantiles', legend=True)
<matplotlib.axes._subplots.AxesSubplot at 0x103508d10>

Then, Let’s create the seaborn displot figure for maximum NDWI.

sns.distplot(buildings.max_ndwi)
<matplotlib.axes._subplots.AxesSubplot at 0x1a1f0f8c10>

Then, plot the figure of max_ndwi through ndwi raster of residential areas in dark-green color.

fig, ax = plt.subplots(figsize=(12, 10))
show(ndwi_raster, ax=ax)
buildings[buildings["max_ndwi"].between(0.4, 0.6)].plot(color= "darkgreen", ax=ax)
plt.show();

Lastly, we will plot the figure of max_ndwi through ndwi raster of residential areas in red color.

fig, ax = plt.subplots(figsize=(12, 10))
show(ndwi_raster, ax=ax)
buildings[buildings["max_ndwi"].between(0, 0.1)].plot(color="red", ax=ax)
plt.show();

--

--

No responses yet