Outfilling Dialog #9363

Open
jkmusyoka opened this issue Jan 15, 2025 · 13 comments · May be fixed by #9381

@jkmusyoka
Contributor

@N-thony this is mainly for @Vitalis95.

As part of our ePICSA work, we would like to construct a dialog to simplify the outfilling of station data using satellite estimates, mainly for our current work in Zambia.

Below is the R script which we recently used in Zambia to perform some outfilling. The outfilling uses @lilyclements' outfillingR package.
library(outfillingR)

zambia_data <- data_book$get_data_frame(data_name = "zambia_data")

# perform outfilling
infilled_data <- do_infilling(
  data = zambia_data,
  station = "station",
  date = "date",
  rainfall = "rainfall",
  rfe = "tamsat",
  lon = "longitude",
  lat = "latitude",
  station_to_exclude = "STATION A",
  rainfall_estimate_column = "tamsat",
  custom_bins = c(1, 5, 10, 15, 20),
  count_filter = 5,
  min_rainy_days_threshold = 30,
  target_months = c(5, 6, 7, 8, 9),
  distribution_flag = "gamma",
  markovflag = TRUE
)

# display resulting dataframe
data_book$import_data(data_tables = list(infilled_data = infilled_data))

See an initial design of the dialog below - largely based on the arguments of the do_infilling function in the script. I think this should be enough to get you @Vitalis95 started. Let me know any questions you may have.

[Image: initial design of the Outfilling dialog]

@lilyclements the script currently results in a new data frame with the outfilled column. Wouldn't it be simpler and neater if the outfilled column were just added to the data frame we are trying to infill? Is this possible?

@rdstern
Collaborator

rdstern commented Jan 15, 2025

@Vitalis95 in the design above:

First you may wish to install the outfillingR package. Once installed you can use Help > R Package Documentation to get lots of details of the function.

Then in the diagram above:
a) The Data Frame top left is the usual data selector, so we can choose variables needed on the right.
b) The capital letters are for clarity for you. In the dialog they should be our usual Station: etc
c) The controls on the right are just the normal single receivers.
d) On the left, below the data selector Bins: will be a drop down control - same control as in Climatic > PICSA > Crops for Planting Date etc - with the default values above.
e) Count: and Days could be up-down controls with those defaults and allowing integer values, maybe from 1 upwards, but waiting for Emily to reply.
f) For the Omit Months have a checkbox, maybe labelled Dry Month(s). If checked there is a button Omit Months. Then I hope it will be easy to copy the Omit Months sub-dialog from the Climatic > Check Data > QC Rainfall dialog.
g) The Distribution and also Markov TRUE/FALSE are presumably simple drop-downs. We have an example in the Define New Filter drop down.
h) Then (not given above) a Checkbox, default unchecked, labelled More Options. Then an Options button, disabled initially.
i) Then (omitted above) Store Result label and a default name Estimated, that goes up by 1 each new variable. We assume @lilyclements will be changing her function to write a variable.
j) Then the usual bottom buttons.
k) Then the R-code will call her function and add the column she produces to the data frame.
l) Then we need to decide where it goes. Maybe in Climatic > Compare, but I think Climatic > Model, maybe first one there, and just called Outfilling.
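
The Store Result default in item i) could behave roughly as sketched below. This is only an illustration in base R: the helper name `next_default_name` and the exact numbering scheme (Estimated, Estimated1, Estimated2, ...) are assumptions, not the existing R-Instat control.

```r
# Hypothetical helper: propose a default column name that is not yet
# used in the data frame, counting up by 1 each time.
next_default_name <- function(existing, prefix = "Estimated") {
  if (!prefix %in% existing) return(prefix)
  i <- 1
  while (paste0(prefix, i) %in% existing) i <- i + 1
  paste0(prefix, i)
}

cols <- c("station", "date", "rainfall", "tamsat")
first <- next_default_name(cols)              # no clash yet
second <- next_default_name(c(cols, first))   # clash, so a number is added
print(c(first, second))
```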

@Vitalis95
Contributor

@jkmusyoka @lilyclements, do we need to add station_to_exclude as a default argument? The function won't run without it.

@Vitalis95
Contributor

@lilyclements, concerning the target_months argument in the function, it expects a numeric vector specifying the months. Roger suggested copying the Omit Months sub-dialog from the Climatic > Check Data > QC Rainfall dialog, but this produces an object as shown below:



filter_months <- instat_calculation$new(type="filter", function_exp="!month_abbr %in% c('Jan','Feb','Mar')", calculated_from=list("zambia_data"="month_abbr"))
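
One possible way to bridge the two forms, assuming the sub-dialog can hand over the selected month abbreviations as a character vector (the variable `selected_months` is hypothetical), is to look them up in base R's `month.abb` constant:

```r
# Hypothetical: months chosen in the Omit Months sub-dialog, as abbreviations
selected_months <- c("Jan", "Feb", "Mar")

# month.abb is a base R constant: "Jan", "Feb", ..., "Dec".
# match() returns the position of each abbreviation, i.e. the month number.
target_months <- match(selected_months, month.abb)
print(target_months)
```

The resulting integer vector has the shape that target_months expects. An abbreviation that is not in `month.abb` would come back as NA, so the dialog may want to check for that.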

@Vitalis95 linked a pull request on Jan 22, 2025 that will close this issue
@lilyclements
Contributor

Summarising @rdstern's requests for changes in the R code:

  1. We want to use "our" version of the Omit Months (see the Rainfall QC dialog)
  2. What did Emily say about the Station to Exclude control? Can we omit it? (We have filters in R-Instat.)
  3. Need a Store Result control at the bottom, the same as in the Calculator and other dialogs. It currently makes a new data frame, despite just producing a single column of the same length. It will be so much easier to use when it just adds to the existing data frame.
  4. Do we need a More button towards a sub-dialog that facilitates changing the other arguments, or can that wait?
  5. Have an option of a specified random number seed. The whole thing is comparisons, and this includes comparisons of the different methods - which should use the same sequences.

Just wanted to have this written together somewhere. I can make the changes in the R package.

@rdstern
Collaborator

rdstern commented Jan 23, 2025

Just to respond to the items above:

  1. For now I suggest we stick with Emily's Omit Months, which Vitalis has already included. I think, seeing James' sample data, that having it as now is fine, and it also permits the drop down to give different options.
  2. I wasn't clear on the response from Emily. If her Station to exclude is different from a filter, then we may need to change from the current dialog, which only allows a maximum of 1 station to be excluded. (Note if her exclusion just leaves the station(s) out totally, then it is the same as a filter. It may be that she excludes from part of the analysis, e.g. the adjustment phase, while still including it in the outfilling. That would be different and then we should allow both.)
  3. Yes please.
  4. I'd like to check with Emily, and (at the same time) press her again on other reasonable values for the bins, count and days.
  5. Yes please. I hope a single number, in an up/down control perhaps from 1 upwards, with enough space for (say) 1 million. Typing in is allowed and the default is perhaps 999. I wonder what happens now? So do we always want to specify the initial seed, or do we leave the default as now, where it is unknown to us? Then there is an Initial seed checkbox, with the up-down only visible when checked. Default maybe unchecked?

@lilyclements
Contributor

@rdstern

  1. Great, thank you.
  2. In the R code it looks like it is different to a filter. The station to exclude is left out in the calibration part, when setting the monthly parameters and bits, but the values are then generated for the full data.
  3. I'm on this now. I've given this as an option to return as a data frame or as a numerical variable. By default it will return as a numerical variable. @Vitalis95 can you change the R code to say that the resulting saved item is now a new column, not a new data frame? You will need to update the R package by running devtools::install_github("IDEMSInternational/OutfillingR")
  4. Great, let's set up a meeting with her to discuss.
  5. I was thinking we have it with an "Initial seed" checkbox, and that gives the up-down. Otherwise, no seed runs. What do you think?
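
To make the distinction in point 2 concrete, here is a toy base R sketch. It is not the outfillingR algorithm: the monthly station-to-satellite ratio is a simple stand-in for the real monthly parameters, and the data and column names are made up for illustration. It also shows the result being added as a column to the existing data frame, as in point 3.

```r
# Toy data: three stations, with station rainfall and a satellite estimate (rfe)
toy <- data.frame(
  station  = rep(c("A", "B", "C"), each = 4),
  month    = rep(c(1, 1, 2, 2), times = 3),
  rainfall = c(0, 40, 5, 0, 10, 20, 0, 6, 12, 18, 2, 4),
  rfe      = c(2, 30, 4, 1, 8, 16, 1, 5, 10, 20, 3, 3)
)
exclude <- "A"

# Calibration: monthly parameters are set WITHOUT the excluded station
calib <- toy[!toy$station %in% exclude, ]
totals <- aggregate(cbind(rainfall, rfe) ~ month, data = calib, FUN = sum)
totals$ratio <- totals$rainfall / totals$rfe

# Generation: the parameters are applied to EVERY row, station A included,
# and the result is stored as a new column of the same data frame
toy$estimated <- toy$rfe * totals$ratio[match(toy$month, totals$month)]

# A filter, by contrast, would drop station A from the result altogether
filtered <- toy[!toy$station %in% exclude, ]
print(c(with_exclusion = nrow(toy), with_filter = nrow(filtered)))
```

With the exclusion, station A still receives estimated values; with a filter it would not appear in the output at all.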

@rdstern
Collaborator

rdstern commented Jan 23, 2025

@lilyclements

On 2, in the function is it for a maximum of 1 station, or can we have more in your Exclude? The current dialog is just for 0 or 1 station.
On 5 I completely agree. We have a checkbox, default unchecked, and no seed then runs.
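
The checkbox behaviour agreed in point 5 might map onto the function call roughly as follows. The helper `seed_argument` is hypothetical; the argument name `set_seed` and its NULL default are taken from later in this thread, and the call itself is left as a comment because it needs the package and data.

```r
# Hypothetical helper: turn the dialog state into the value passed as the seed.
# Unchecked checkbox -> NULL, so no seed is set and the run is not reproducible.
seed_argument <- function(checked, value = 999L) {
  if (!checked) return(NULL)
  stopifnot(value >= 1, value == round(value))  # positive whole numbers only
  as.integer(value)
}

unchecked <- seed_argument(FALSE)
checked <- seed_argument(TRUE, 999)
print(checked)

# Assumed usage, not run here:
# do_infilling(..., set_seed = seed_argument(checked = TRUE, value = 999))
```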

@lilyclements
Contributor

lilyclements commented Jan 23, 2025

@rdstern It can now have multiple stations excluded. Before you pointed that out, it could only have one excluded (so you need to redownload the package).

@rdstern
Collaborator

rdstern commented Jan 23, 2025

@lilyclements I see the new version can also output to a variable.
I assume you will also soon include the new random seed, unless that is already there?

My concern now that we can exclude many stations is on our routine use of this function, if it is the only one we have.

Let's take Eastern Province, where there are 5 main stations, then over 100 volunteer stations, and quite some automatic stations. So let's consider the work in 2 stages:
a) First we just use the 5 main stations. We "pretend" sections are missing, e.g. to 2010, and then estimate them. Our aim is to find which satellite algorithm is to be preferred, and also how the tamsat (say) data are to be adjusted in the best way.
b) I presume the optimal adjustment will come from a run of the code with all the data from the 5 stations?
c) Then we want to apply the optimal solution to (say) 30 stations, with short records in Eastern Province? So we now have 35 stations, with 5 used to get the adjustments again, and then they are applied to the other stations. I would like to assume the adjustment is the same, if the random number seed is the same? I wonder if that is true.
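
Stage a) could be set up along these lines. This is a sketch on simulated data only: the two satellite products `sat_a` and `sat_b` are hypothetical, the raw satellite values stand in for the outfilled estimates, and RMSE is just one possible comparison measure.

```r
set.seed(1)  # only so that the toy data are reproducible
dates <- seq(as.Date("2008-01-01"), as.Date("2011-12-31"), by = "day")
toy <- data.frame(
  date = dates,
  rainfall = rgamma(length(dates), shape = 0.5, rate = 0.1)
)
toy$sat_a <- toy$rainfall + rnorm(nrow(toy), sd = 2)  # hypothetical product A
toy$sat_b <- toy$rainfall + rnorm(nrow(toy), sd = 5)  # hypothetical product B

# "Pretend" the station record is missing up to 2010
held_out <- toy$date < as.Date("2010-01-01")
masked <- toy
masked$rainfall[held_out] <- NA  # this is what an outfilling run would be given

# Compare estimates with the values that were held back
rmse <- function(obs, est) sqrt(mean((obs - est)^2))
scores <- c(
  sat_a = rmse(toy$rainfall[held_out], toy$sat_a[held_out]),
  sat_b = rmse(toy$rainfall[held_out], toy$sat_b[held_out])
)
print(round(scores, 2))
```

In practice the estimates being scored would come from running the outfilling on `masked` with each satellite product in turn, rather than from the raw satellite columns.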

@lilyclements
Contributor

@rdstern setting a seed is there under the parameters “set_seed” (default NULL)

I am unsure if I 100% follow. The monthly parameter (adjustment?) values are the same for all rows (so across the stations) because of how the function is set up, I believe. In the code, the “monthly parameter” values are all set, and then there is a loop through each row of the data using these values to generate new values.

I did check on the generation of random values (e.g. a random binomial distributed value). Setting a single seed does mean that when we randomly generate a variable it does alter for each iteration of the loop (so each row does get a randomly distributed value).
(And between two function calls, it gives the same results, i.e. if I ran do_outfilling twice with the same seed in them then the resulting generated rainfall values are the same.)
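
The seed behaviour described above can be illustrated with base R alone. The `generate` function below is a stand-in, not the outfillingR code: it sets the seed once and then draws a binomial rain/no-rain value and a gamma amount for each row inside a loop, with made-up parameter values.

```r
# Stand-in generator: one pair of random draws per row, seed set once up front
generate <- function(n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # NULL means no seed is set
  out <- numeric(n)
  for (i in seq_len(n)) {
    wet <- rbinom(1, size = 1, prob = 0.4)              # rain or no rain
    out[i] <- wet * rgamma(1, shape = 0.8, rate = 0.1)  # amount if wet
  }
  out
}

run1 <- generate(10, seed = 999)
run2 <- generate(10, seed = 999)
print(identical(run1, run2))  # same seed, so the two runs match
print(run1)                   # the draws still vary from row to row
```

Because the seed is set once before the loop, each iteration advances the random number stream, so rows get different draws while two calls with the same seed reproduce each other.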

@Vitalis95
Contributor

> @rdstern It can now have multiple stations excluded. Before you pointed that out, it could only have one excluded (so you need to redownload the package)

@rdstern, how can we implement this in the dialog?

@rdstern
Collaborator

rdstern commented Jan 24, 2025

@Vitalis95 here are my suggestions for minor revisions to the current outfilling dialog. Here it is:

[Image: the current Outfilling dialog]

a) Move the Station to Exclude to the left hand side and add a checkbox, default unchecked. Call it Station(s) to Exclude to prepare for a later change, but keep the control as it is for now. (The other changes should be quick, and it will be good to get it working with them. Initially we don't need the stations to exclude option, so leave those changes to next week.)
b) Move the Dry Months control to become the second on the left (with a checkbox) - also as now, with the default unchecked.
c) Add a new Random Seed checkbox under that - so as the third. Default also unchecked. If checked it reveals quite a wide up-down which can give positive integers, and default 999. Typing in is allowed, and 1000000 could be visible. (I understand this is a new argument in the revised @lilyclements code we need to download.)
d) Then have Bins and Count on the left as well. Left-justify the labels and also keep the second parts of the control left justified - as you have now.
e) Move Days, Distribution and Markov to the right-hand side. The 2 drop downs for Distribution and Markov can be narrower as there are no more entries to come there.
f) Add the Store Result Checkbox - with the Position button - same as for the calculator - at the bottom left. Default is checked, i.e. to add a column to the existing data frame.

Looking forward!

@rdstern
Collaborator

rdstern commented Jan 24, 2025

Once this is all working, the item that remains is to make the Stations to Exclude a control that can exclude multiple stations that you select. It needs a checkbox for each station and we have that elsewhere. It is the control in the Filter from Factors sub-dialog, and also in the filter sub-dialog when you choose a factor to filter on.
So I suggest it changes into a button and calls that same control.
