valid_percent in utils/get_array_statistics not correctly calculated, when using feature coverage array #774

Open
MarcelCode opened this issue Jan 2, 2025 · 1 comment

Comments

@MarcelCode
Contributor

MarcelCode commented Jan 2, 2025

If I calculate the coverage array for a geometry and pass it to the get_array_statistics method, I would expect valid_percent to be calculated relative to the coverage array rather than the bounding box of the geometry.

[Image: example_geometry]

In this example valid_pixels is correctly calculated, but valid_percent is wrong:

  • 2 valid pixels, because one pixel is nodata.
  • Because the geometry covers three pixels, I would expect valid_percent to be 66.67 % (2 of 3), but it is 50 % (2 of the 4 pixels in the bounding box).

Here is a complete, runnable example:

from rio_tiler.io import Reader

def calculate_statistics(filename: str, shape: dict) -> dict:
    with Reader(filename) as src:
        data = src.feature(shape)

        coverage_array = data.get_coverage_array(shape)

        return data.statistics(coverage=coverage_array)

if __name__ == "__main__":
    file = "https://rastless-tests.s3.eu-central-1.amazonaws.com/TUR_us-newyork_013032_EOMAP_20190424_153304_LSAT8_m0030_32bit.tif"

    aoi = {"type": "Polygon", "coordinates": [
        [[-74.046461218997848, 40.63946290226923], [-74.046464353686744, 40.638997869937697],
         [-74.045818607775814, 40.638991923211712], [-74.045817040431388, 40.639200058305974],
         [-74.046160288864613, 40.639204815671974], [-74.046161856209054, 40.63946290226923],
         [-74.046461218997848, 40.63946290226923]]]}


    statistics = calculate_statistics(file, aoi)

    valid_pixels = statistics["b1"]["valid_pixels"]
    valid_percent = statistics["b1"]["valid_percent"]

    print(f"Valid pixels: {valid_pixels}. Expected value: 2.0")
    print(f"Valid percent: {valid_percent} %. Expected value: 66.67 %")

From what I see in the code, this line seems to be wrong, because it always calculates valid_percent from the full array size:
utils.py/get_array_statistics line 136

valid_percent = round((valid_pixels / data[b].size) * 100, 2)

My suggestion is to change it to:

valid_percent = round((valid_pixels / np.count_nonzero(coverage)) * 100, 2)

Even if the user passes no coverage, the calculation remains correct, because the coverage array is then created inside the method.
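To make the difference concrete without downloading the raster, here is a minimal NumPy-only sketch that compares the two formulas on a made-up 2x2 window. The pixel values, coverage fractions, and mask are illustrative, not taken from the file above, and the all-ones fallback for a missing coverage array is my assumption about what the method creates internally.

```python
import numpy as np

# Hypothetical 2x2 window: the bounding box of an L-shaped geometry.
values = np.array([[10.0, 20.0], [30.0, 40.0]])

# Fraction of each pixel covered by the geometry (illustrative values).
# The top-right pixel lies outside the geometry.
coverage = np.array([[0.9, 0.0], [1.0, 0.6]])

# True means masked: the pixel outside the geometry plus one nodata pixel inside it.
mask = np.array([[False, True], [False, True]])
band = np.ma.MaskedArray(values, mask=mask)

valid_pixels = int(np.ma.count(band))  # number of unmasked pixels

# Current formula: divides by every pixel in the bounding box.
current_percent = round((valid_pixels / band.size) * 100, 2)

# Proposed formula: divides only by pixels the geometry actually touches.
proposed_percent = round((valid_pixels / np.count_nonzero(coverage)) * 100, 2)

# Assumed fallback when no coverage is passed: an all-ones array,
# for which count_nonzero equals the array size, so nothing changes.
default_coverage = np.ones(band.shape)
default_percent = round((valid_pixels / np.count_nonzero(default_coverage)) * 100, 2)

print(valid_pixels, current_percent, proposed_percent, default_percent)
```

With these inputs the current formula gives 50.0 and the proposed one gives 66.67, which matches the numbers reported above, while the no-coverage case stays at 50.0.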

I would be happy to create a PR if you agree.

@vincentsarago
Member

thanks @MarcelCode I think the change proposed makes sense 🙏

happy to review a PR
