Improve the document about training data #1267

Technew94 · 2025-01-15T15:22:37Z

Dear developer,

After reading the documentation, if I understand correctly, the training datasets come from the "sitsdata" package, right?
If I want to run a classification in, e.g., Germany, what kind of training dataset should I use? Should I label the targets (e.g. 'forest', 'pasture' and so on) manually and then create a new training dataset myself?
It would be very helpful for beginners like me if you could add some paragraphs explaining this question, along with some examples of how to prepare a "self-prepared" training dataset.

This may be a silly suggestion, but thank you for your time and patience.

All the best

gilbertocamara · 2025-01-16T12:28:57Z

Dear @Technew94 Many thanks for your suggestion. It is an important point that needs to be better addressed in the documentation. In fact, users can enter their data in sits for any part of the globe. I will give you a simple example below:

# select a study area
cube_cloud <- sits_cube(
    source = "MPC",  # could be CDSE, AWS or many other cloud services supported by sits
    collection = "SENTINEL-2-L2A",  # use sits_list_collections() to see which collections are supported
    roi = c(lon_min = ..., lat_min = ..., lon_max = ..., lat_max = ...),  # region of interest - can also be a MGRS tile
    bands = c("...."),  # put the bands you want
    start_date = ....,  # initial date of your data series
    end_date = ....  # final date of your series
)

# regularize the data cube
cube_reg <- sits_regularize(
    cube = cube_cloud,
    res = 10,  # in meters
    period = "P1M",  # monthly data is one option, see documentation for more
    output_dir = <where the regular data cube will be stored>
)

# get the time series
# suppose you have a shapefile with points where labels are informed in column "LABEL"
time_series_data <- sits_get_data(
    cube = cube_reg,
    samples = <shapefile>,
    label_attr = "LABEL"
)

The above code will allow you to create data cubes for Germany and retrieve time series from a point shapefile. All of this is described in the documentation. However, I fully agree the docs can be improved. We are working hard on doing so.
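
For readers who do not yet have a shapefile, a self-prepared sample set can be sketched in base R as a small table of labelled points. This is only a minimal sketch, not an official sits example: the coordinates, labels and file name are made up for illustration, and it assumes that `sits_get_data()` also accepts a CSV file with the columns `longitude`, `latitude`, `start_date`, `end_date` and `label`, as the sits documentation appears to describe.

```r
# Hypothetical hand-labelled points; coordinates and labels are illustrative only
samples <- data.frame(
    longitude  = c(10.45, 10.52, 10.61),
    latitude   = c(51.16, 51.20, 51.09),
    start_date = "2019-01-01",   # recycled for every row, YYYY-MM-DD
    end_date   = "2021-01-01",
    label      = c("forest", "pasture", "forest")
)

# Write the samples to a CSV file (hypothetical file name in a temporary folder)
samples_file <- file.path(tempdir(), "my_samples.csv")
write.csv(samples, samples_file, row.names = FALSE)

# Read the file back to confirm the columns were written as intended
check <- read.csv(samples_file)
print(check)
```

Under the assumption stated above, `samples_file` could then be passed as the `samples` argument of `sits_get_data()` in place of the shapefile.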

Technew94 · 2025-01-16T12:43:11Z

Dear Gilberto,
Thanks for your detailed explanation. If I understand you correctly, when I want to run a classification from 20190101 to 20210101, I can use a shapefile with points collected on 20190102, or at any other time within the time series, am I right? If so, people can create a shapefile from any tile as long as it is inside the time series.
If possible, could you please add a simple shapefile to the "self-prepared" section of the documentation? It would be a useful example for beginners like me.

Thanks again!

gilbertocamara · 2025-01-16T13:48:59Z

Dear @Technew94 thanks for the nice words!! Usually, shapefiles have no temporal information. So you can use a shapefile from any date to get training data from the cube. Of course, the model will perform better if the data collection that produced the shapefile occurs between the start and end dates of the data cube.

Please note that sits uses the convention YYYY-MM-DD for dates, as in "2024-02-03".
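
As a small illustration of that convention, compact dates such as the "20190101" used earlier in this thread can be converted with base R before being passed as `start_date` or `end_date`. This is a minimal sketch; the variable names are hypothetical and nothing here is specific to sits.

```r
# Dates written without separators, as in the question above
raw_dates <- c("20190101", "20210101")

# Parse them as Date objects, then format them as YYYY-MM-DD strings
parsed <- as.Date(raw_dates, format = "%Y%m%d")
sits_dates <- format(parsed, "%Y-%m-%d")

print(sits_dates)
# [1] "2019-01-01" "2021-01-01"
```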

Technew94 added the improvement label Jan 15, 2025

gilbertocamara self-assigned this Jan 16, 2025

gilbertocamara added this to sits-management Jan 16, 2025

gilbertocamara added this to the version 1.5.2 milestone Jan 16, 2025
