Consider adding VariableType to metadata #127

Closed
bryanrcarlson opened this issue Aug 31, 2022 · 1 comment · Fixed by #132


@bryanrcarlson
Contributor

The Variable Catalog tends to fill up with descriptive-type variables (site location, treatment ID, block ID, latitude, longitude, etc.), whereas most researchers are likely looking for measurement-type variables (stand count, biomass, percent carbon, volumetric water content). We should consider providing a way to differentiate these variable types for better filtering.

This can easily get out of hand (see here and here).

The best option is to stay consistent with other such filters (e.g., zone, processing, quality control) and allow variable types to be defined in the app-config.json file.

We will also need to provide default values. Maybe follow statistics and use the top-level split: Quantitative/Numeric and Qualitative/Categorical? Or go one step further: Discrete, Continuous, Nominal, Ordinal.

Keeping things simple (for speed-to-metadata): Numeric, Categorical.
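A minimal sketch of the config-driven approach, in Python. It assumes a hypothetical `variableTypes` key in app-config.json and a hypothetical `variableType` field on each catalog entry; neither name comes from the project. When the key is absent or empty, it falls back to the simple Numeric/Categorical defaults.

```python
import json
from typing import Any

# Hypothetical defaults, following the "keep things simple" option above.
DEFAULT_VARIABLE_TYPES = ["Numeric", "Categorical"]


def load_variable_types(config_text: str) -> list[str]:
    """Return the variable types defined in app-config.json, or the defaults.

    The "variableTypes" key is an assumption for illustration; the real
    app-config.json schema may name or shape this differently.
    """
    config: dict[str, Any] = json.loads(config_text)
    types = config.get("variableTypes")
    if not types:
        # Nothing configured: fall back to a copy of the defaults.
        return list(DEFAULT_VARIABLE_TYPES)
    return [str(t) for t in types]


def filter_by_type(variables: list[dict[str, str]], wanted: str) -> list[dict[str, str]]:
    """Keep only catalog entries whose (hypothetical) "variableType" matches."""
    return [v for v in variables if v.get("variableType") == wanted]


if __name__ == "__main__":
    print(load_variable_types("{}"))  # no variableTypes key, so defaults are used
    print(load_variable_types('{"variableTypes": ["Discrete", "Continuous", "Nominal", "Ordinal"]}'))
    catalog = [
        {"name": "BlockID", "variableType": "Categorical"},
        {"name": "Biomass", "variableType": "Numeric"},
    ]
    print(filter_by_type(catalog, "Numeric"))
```

Keeping the defaults in code and the overrides in config mirrors how the other filters are described above: a deployment that wants the four-level split only has to list it in its config.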

@bryanrcarlson
Contributor Author

Take a cue from dimensional modeling. A "dimension" describes "who, what, where, when, why, and how". A "metric" is a quantitative measurement.

I think it's safe to say that the context of the dataset can determine what is a dimension vs. a metric, so we have some leeway here. For example, when measuring crop height, "height" will be a metric. But if we are describing the height of a sensor, then "height" will be a dimension.
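Because the role depends on the dataset, one way to model it is to key the classification on the (dataset, variable) pair rather than on the variable name alone. The Python sketch below assumes that design; the dataset and variable names in it are hypothetical examples, not entries from the catalog.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    """Dimensional-modeling role of a variable within a dataset."""
    DIMENSION = "dimension"  # describes who, what, where, when, why, how
    METRIC = "metric"        # a quantitative measurement


# Roles are assigned per (dataset, variable) pair, not per variable name,
# because the same name can play different roles in different datasets.
# All dataset and variable names here are hypothetical.
ROLES: dict[tuple[str, str], Role] = {
    ("CropGrowth", "Height"): Role.METRIC,             # measured crop height
    ("SensorInstallation", "Height"): Role.DIMENSION,  # describes the sensor
    ("CropGrowth", "PlotID"): Role.DIMENSION,
}


def role_of(dataset: str, variable: str) -> Optional[Role]:
    """Look up a variable's role in the context of a specific dataset."""
    return ROLES.get((dataset, variable))
```

With this keying, `role_of("CropGrowth", "Height")` and `role_of("SensorInstallation", "Height")` return different roles for the same variable name.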

Nominal and ordinal variables cause some confusion. There are arguments that ordinal variables can be treated as continuous variables (https://www.frontiersin.org/articles/10.3389/feduc.2020.589965/full), so maybe we can handwave that (although some best practices state that "metrics" should allow the calculation of means, min, max, etc.).

I'm not sure about nominal variables, though. Should we always treat those as dimensions? If so, something like a management dataset, which is mostly nominal values, will pose an issue. Would we want a "drill type" or "tractorId" grouped with "plotId", "NearestTown", and other descriptive variables? Maybe this is fine?
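One possible compromise, sketched in Python under stated assumptions: derive a default role from the statistical level of measurement (nominal maps to dimension; ordinal, discrete, and continuous map to metric, i.e., the "handwave" option for ordinal), and let a curator override it per variable. The `Level`/`Role` names and the override mechanism are hypothetical. Note that overriding a nominal variable such as "drill type" to a metric would conflict with the "metrics allow means, min, max" best practice, so the override is a judgment call rather than a rule.

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    """Statistical level of measurement."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


class Role(Enum):
    """Dimensional-modeling role."""
    DIMENSION = "dimension"
    METRIC = "metric"


# Default role per level. Treating ORDINAL as a metric is the "handwave"
# option discussed above; it is an assumption, not an established rule.
DEFAULT_ROLE: dict[Level, Role] = {
    Level.NOMINAL: Role.DIMENSION,
    Level.ORDINAL: Role.METRIC,
    Level.DISCRETE: Role.METRIC,
    Level.CONTINUOUS: Role.METRIC,
}


def classify(level: Level, override: Optional[Role] = None) -> Role:
    """Return the role for a variable; a curator-supplied override wins."""
    return override if override is not None else DEFAULT_ROLE[level]
```

Under this scheme a management dataset's "tractorId" would default to a dimension alongside "plotId", which is exactly the grouping questioned above; the override is where that decision would be recorded.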

@bryanrcarlson bryanrcarlson added this to the 0.4 milestone Aug 24, 2023
@bryanrcarlson bryanrcarlson modified the milestones: 0.4, 0.3 Sep 1, 2023
@bryanrcarlson bryanrcarlson linked a pull request Sep 1, 2023 that will close this issue