
Visual-modelling

Repo to store high-level issues

See the Project Board.

Meeting Notes: 12th March 2020

Long Term Dependencies

  • Tom mentions 4D Convolutions

Psychological Motivation

  • (See the Wikipedia page on Smooth Pursuit for psychological motivation)

Initial Convolutional Model Idea

  • 3D convolution across a small number of frames
    • e.g. 5 frames, giving a 3x3x5 kernel size for the 1st layer
  • Stride 2 to downsample the image progressively
  • Upsample back to 64x64 to produce each output pixel
  • May have issues extracting features
    • not able to produce a frame / pixel embedding
  • Noura suggests that simple next-frame prediction is a heavily studied task, so this alone does not add anything new; the work must show that useful features can be extracted in order to make a novel contribution
  • 4D Convolutions: incorporate long-term dependencies via a 4th convolutional dimension
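The shape arithmetic behind the bullets above can be sketched in NumPy. This is a minimal illustration, not the actual model: the kernel values, single-channel setup, and nearest-neighbour upsampling with padding are all assumptions made for the example.

```python
import numpy as np

def conv3d(video, kernel, stride=(1, 2, 2)):
    """Naive valid 3D convolution: video (T, H, W), kernel (kt, kh, kw)."""
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    st, sh, sw = stride
    out_t = (T - kt) // st + 1
    out_h = (H - kh) // sh + 1
    out_w = (W - kw) // sw + 1
    out = np.zeros((out_t, out_h, out_w))
    for t in range(out_t):
        for i in range(out_h):
            for j in range(out_w):
                patch = video[t * st:t * st + kt,
                              i * sh:i * sh + kh,
                              j * sw:j * sw + kw]
                out[t, i, j] = np.sum(patch * kernel)
    return out

video = np.random.rand(5, 64, 64)          # 5 frames of 64x64
kernel = np.ones((5, 3, 3)) / 45.0         # the 3x3x5 first-layer kernel
feat = conv3d(video, kernel)               # stride 2 downsamples: (1, 31, 31)

# crude upsample back toward 64x64: repeat pixels, then pad the border
up = feat[0].repeat(2, axis=0).repeat(2, axis=1)  # (62, 62)
up = np.pad(up, 1)                                # (64, 64)
```

Note how the five-frame temporal kernel collapses the time axis to 1 while stride 2 halves the spatial resolution, which is why an upsampling path is needed to recover per-pixel outputs.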

Unidirectional or Masking Task

  • The task of frame prediction is itself more useful than next-word prediction, so a left-to-right transformer could be more useful than a generative model using masking
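The two objectives being compared differ only in which frames the model is allowed to see. A hedged sketch of the contrast at the level of masks (frame count and mask rate are arbitrary choices for illustration):

```python
import numpy as np

T = 5  # frames in a sequence

# Left-to-right (causal) objective: frame t may attend only to frames <= t
causal = np.tril(np.ones((T, T), dtype=bool))

# BERT-style masking objective: hide a random subset of frames, predict them
rng = np.random.default_rng(0)
hidden = rng.random(T) < 0.3  # True = frame is masked out of the input
```

With the causal mask the model always predicts the next frame from the past, which matches the forward-prediction task directly; the masked objective instead reconstructs frames from both past and future context.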

Action Points

Meeting Notes: 5th March 2020

Dataset

  • Hudson has implemented a basic 2D bouncing circle.

  • Testing whether the model is capable of learning conservation of momentum and the law of reflection on its own, by varying the following (as an initial PoC):

    • Varying circle size
    • Varying initial velocity and position
      • Constrain the initial velocity to be no more than half the distance between the initial position and the boundary
  • This will be tested on CNN- and Transformer-based models to begin with.
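A dataset along these lines could be generated as follows. This is a sketch, not Hudson's implementation: the rendering, the specific reflection handling, and the reading of the velocity constraint (cap the speed at half the distance to the nearest boundary) are assumptions.

```python
import numpy as np

def make_bounce(size=64, radius=5, n_frames=20, rng=None):
    """Render n_frames of a circle bouncing elastically off the walls."""
    rng = np.random.default_rng(rng)
    # random initial position, kept fully inside the image
    pos = rng.uniform(radius, size - radius, 2)
    # assumed constraint: speed at most half the distance to the boundary
    margin = min(pos.min() - radius, (size - radius) - pos.max())
    vel = rng.uniform(-1, 1, 2)
    vel *= 0.5 * margin / (np.abs(vel).max() + 1e-8)
    frames = np.zeros((n_frames, size, size), dtype=np.float32)
    ys, xs = np.mgrid[:size, :size]
    for t in range(n_frames):
        frames[t] = ((xs - pos[0]) ** 2 + (ys - pos[1]) ** 2 <= radius ** 2)
        pos += vel
        for ax in (0, 1):  # law of reflection: flip velocity at the walls
            if pos[ax] < radius or pos[ax] > size - radius:
                vel[ax] = -vel[ax]
                pos[ax] = np.clip(pos[ax], radius, size - radius)
    return frames

frames = make_bounce(rng=0)  # (20, 64, 64) binary frames
```

Varying `size`, `radius`, and the RNG seed covers the parameter-space search described above.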

Transformer Model

  • Look into potential Pixel Embedding Space.
  • Will need for each pixel a positional encoding, and a temporal encoding.
  • Positional and temporal can be combined into the same 3D encoding for 2D videos and 4D encoding for 3D videos
  • Need to read up on how BERT treats embedding encodings (relative or absolute)
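One way the combined 3D encoding could look: standard sinusoidal encodings computed per coordinate and concatenated across x, y, and t. The even three-way split and the dimensions are assumptions for illustration, not a settled design.

```python
import numpy as np

def sincos(pos, dim):
    """Standard transformer sinusoidal encoding for one scalar coordinate."""
    i = np.arange(dim // 2)
    angles = pos / (10000 ** (2 * i / dim))
    return np.concatenate([np.sin(angles), np.cos(angles)])

def pixel_encoding(x, y, t, dim=48):
    """Combined positional + temporal encoding for a pixel in a 2D video.

    Splits the embedding evenly across the three axes (an assumption);
    a 3D video would add a fourth z component the same way.
    """
    d = dim // 3
    return np.concatenate([sincos(x, d), sincos(y, d), sincos(t, d)])

enc = pixel_encoding(3, 7, 1)  # one 48-dim vector per (x, y, t) pixel
```

This gives every pixel a unique absolute (x, y, t) code; whether relative encodings (as in some BERT variants) work better is exactly the open question noted above.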

Long-term dependencies will be a problem in the future, but not within the scope of this initial work, as we are only modelling physical laws which do not require long-term observational memory.

Action Points

  • Hudson will incorporate the parameter space search (using the variables described above) to generate the dataset
  • Dean will look at what's required for the transformer to do the video prediction task for a sequence of 2D images (positional/temporal encodings + pixel embedding space)
  • Zheming will look at the latest techniques used in Vision for video prediction and add to literature review.

Meeting Notes: 9th April 2020

  • Tom W - Fixed a bug in the model (it was treating images as binary)
  • Tom H - Added masking to dataset

Todo

  • Continue modelling (Some results by next week)
