Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

removed dataset removal code, and upgraded sampler #10

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Jumperkables
Copy link

@Jumperkables Jumperkables commented Jun 26, 2020

Pull request as dicussed before:
Further note:

sampler = ParameterSampler(parameter_space, min(1000, args.number_of_simulations))

(should help with memory)

The sampler is also properly random in this state

Vanishing balls

I took a look at the ball-disappearing bug , it was literally just too high gravity as suspected, cranked it waaaaay down and tested it and it works fine. Generating a large dataset now

@ghomasHudson
Copy link
Contributor

Looks good.

Why do we need a min of 1000?

Also did you mean to change num_timesteps_per_simulation to 60?

@Jumperkables
Copy link
Author

Why min 1000:

Arbitrary choice, could be removed. Memory for storing huge parameter space could be big so i just put a min value in there

Dropping to 60

We process these video clips 6 at a time currently. There are 2 thoughts
-We want to reduce the number of training samples that are the same right? Well having more training samples from the same video would give more examples that are exactly the same. Hence reduce the number
-However, later on if we want to do longer term dependencies we'll need more frames per example, i.e. 10-1. 20-1, 20-5 etc.. For now it makes sense for it to be divisible by 6, so i thought 60 was a good compromise

@ghomasHudson
Copy link
Contributor

ghomasHudson commented Jun 26, 2020

Min 1000

Is there any difference between how things are sampled within the 1000 and in the next 1000? Is there some internal state that resets?

Also should probs make it a constant variable (See Magic Number antipattern). Or an argument?

Also you've written the same line list(ParameterSampler(parameter_space, min(1000, args.number_of_simulations))) twice (in 2 different ways!). This can be easily simplified to:

SAMPLER_CHUNK = 1000
sampler = None
while simulation_num < args.number_of_simulations:
    if(sampler == None or sampler_idx == min(SAMPLER_CHUNK, args.number_of_simulations)):
        sampler_idx = 0
        sampler = list(ParameterSampler(parameter_space, min(SAMPLER_CHUNK, args.number_of_simulations)))
    params = sampler[sampler_idx]
    sampler_idx += 1

Dropping to 60

We're not concerned about duplicated frames, only duplicated initial settings.
I guess the advantage of 100 is that it's easy to justify (nice round number). If you wanted to only use the first 60 frames of this then so be it (I do realise this will make the generation run loaaaaads slower)

I think the solution is to add it as an argument:

parser.add_argument('--num_timesteps_per_simulation', metavar='TIMESTEPS_PER_SIMULATION', type=int,
                    help='Number of frames to generate for each simulation', )

@Jumperkables
Copy link
Author

"Is there any difference between how things are sampled within the 1000 and in the next 1000? Is there some internal state that resets?"

It appears so yeah, ive tested it and the docs imply it, tho their wording is a bit convoluted

"We're not concerned about duplicated frames, only duplicated initial settings."

Pretty much the same thing right? Simulations that are similar in any way. Im fine with your changes. That looks tidier, as you will!

@ghomasHudson
Copy link
Contributor

ghomasHudson commented Jun 27, 2020

Pretty much the same thing right? Simulations that are similar in any way

Well if we were concerned about this we wouldn't allow any simulations which went through the same point (e.g. any simulations where the ball passed through the center point somewhere in the simulation would be duplicates?). This is definitely not the same thing.

@Jumperkables
Copy link
Author

Apologies delayed response was dealing with rebuttal

No cos its all about batches of 6 frames being as different from eachother as possible right?

@ghomasHudson
Copy link
Contributor

ghomasHudson commented Jun 30, 2020

ball_problem

Well see this example. Clearly both simulations are vastly different so should both be included in our dataset. Yet it just happens that the second frame is identical.

The aim of the task is to require knowledge of multiple frames to predict future frames. So what makes a simulation unique is clearly the pattern of multiple frames not just a single frame.

@Jumperkables
Copy link
Author

Yes okay, saying they were pretty much the same thing is careless, but asides from the point

Having more video clips that are shorter will give more variety in initial conditions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants