VIA format that supports groundtruth #50

Villux · 2021-05-05T07:04:53Z

For ILAS we have:

Lines 20 to 30 in da99984

    
            "regions": [{
                "region_attributes": {
                    "region_key": "region_value"
                },
                "shape_attributes": {
                    "coords": [x1, y1, x2, y2, x3, y3, x4, y4],
                    "name": "shape_type"
                }
            }],
        }]
    }

I think the reason is that the result format has regions.

In Task processor we include class_attributes for GT tasks. Logic is here

Could we separate results and groundtruth into different formats? What ideas do you have?


Villux · 2021-05-05T07:05:36Z

My idea of GT is:

{
    "datapoints": [
        {
            "task_uri": "http://azureblob:10000/testbucket/-C7idm5tNUw/frame_0074_74.00.jpg",
            "class_attributes": {
                "0": {
                    "class_name": "0", // will be mapped to entity name in hcaptcha gt format, required field
                    "confidence": null, // legacy field from ilb, but might be useful later to adjust how much to inject
                    "entity_type": "car", // required field in hcaptcha gt format
                    "entity_coords": [1,2,3,4], // area object, required field in hcaptcha gt format
                    "n_answers": 2 // how many GT answers to inject => higher value the more client is punished for not answering correctly
                }
            },
            "metadata": {
                "original": {
                    "public_url": "http://azureblob:10000/testbucket/-C7idm5tNUw/frame_0074_74.00.jpg" // model training url
                }
            }
        }
    ]
}
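A minimal sketch of how the fields marked as required above could be checked before a GT task is accepted. `validate_gt_datapoint` and `REQUIRED_FIELDS` are hypothetical names used only for illustration, not existing Task processor code; only the field names come from the proposal.

```python
from typing import Any, Dict, List

# Fields the proposal marks as required inside each class_attributes entry.
# Hypothetical constant, not part of any existing code.
REQUIRED_FIELDS = ("class_name", "entity_type", "entity_coords")


def validate_gt_datapoint(datapoint: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one proposed GT datapoint."""
    problems = []
    if "task_uri" not in datapoint:
        problems.append("missing task_uri")
    class_attributes = datapoint.get("class_attributes") or {}
    if not class_attributes:
        problems.append("missing class_attributes")
    for key, attrs in class_attributes.items():
        for field in REQUIRED_FIELDS:
            # A field set to null counts as missing here (an assumption).
            if attrs.get(field) is None:
                problems.append(f"class_attributes[{key!r}] is missing {field}")
    return problems
```

An empty list means the datapoint carries everything the proposal calls required; `confidence` and `n_answers` are treated as optional in this sketch.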

Villux · 2021-05-05T07:06:45Z

hCaptcha expects to see this format when a job is added:

{
    "https://temple-gates.hcaptcha.com/v_HN-3LaZVuCs.mp4.12000.jpg": [
        [
            {
                "entity_type": "car",
                "entity_coords": [
                    47,
                    52,
                    269,
                    52,
                    269,
                    159,
                    47,
                    159
                ],
                "entity_name": 0
            }
        ]
    ],
    "https://temple-gates.hcaptcha.com/v_m7T6SrshoLk.mp4.68000.jpg": [
        [
            {
                "entity_type": "car",
                "entity_coords": [
                    145,
                    91,
                    514,
                    91,
                    514,
                    266,
                    145,
                    266
                ],
                "entity_name": 0
            }
        ]
    ]
}
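For illustration, a rough sketch of how a GT document with `datapoints` and `class_attributes` could be mapped onto this URL-keyed layout. The function name `to_hcaptcha_groundtruth` is hypothetical. It makes three assumptions: `task_uri` is used as the key, `class_name` is a numeric string that becomes the integer `entity_name`, and `confidence` and `n_answers` are ignored.

```python
from typing import Any, Dict, List


def to_hcaptcha_groundtruth(gt: Dict[str, Any]) -> Dict[str, List[List[Dict[str, Any]]]]:
    """Map GT datapoints to the URL-keyed hCaptcha layout (hypothetical helper)."""
    result: Dict[str, List[List[Dict[str, Any]]]] = {}
    for datapoint in gt.get("datapoints", []):
        entities = []
        for attrs in datapoint.get("class_attributes", {}).values():
            entities.append({
                "entity_type": attrs["entity_type"],
                "entity_coords": attrs["entity_coords"],
                # Assumption: class_name such as "0" maps to entity_name 0.
                "entity_name": int(attrs["class_name"]),
            })
        # Assumption: a single inner list per task, as in the example above.
        result[datapoint["task_uri"]] = [entities]
    return result
```

If the outer list is actually meant to hold several alternative answers, the last assignment would need to change; the thread does not settle that.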

gaieges · 2021-05-05T16:32:40Z

Not sure what the issue is here?

Villux · 2021-05-06T05:29:54Z

In short:

What is the purpose of the VIA format we have in basemodels? To me it looks like a mixture of GT and results, which doesn't make sense.
Task processor expects all GT information to be inside class_attributes, which is not the case with ILAS data in this format. Regions are a separate key, and there is no structure that would express that they are part of the GT data.

I suggest that we do it like this: #50 (comment)

e271828- · 2021-05-06T05:48:14Z

> In short:
>
> What is the purpose of the VIA format we have in basemodels? To me it looks like a mixture of GT and results, which doesn't make sense.
>
> Task processor expects all GT information to be inside class_attributes, which is not the case with ILAS data in this format. Regions are a separate key, and there is no structure that would express that they are part of the GT data.
>
> I suggest that we do it like this: #50 (comment)

The idea is requesters can have partial ground truth, and it is useful to associate that with a task. In principle it may be easiest to pack that in alongside the other info for the task.

However, none of these formats are optimal based on what we know from running them for a few years, and could be more carefully designed in the next iteration.

Villux · 2021-05-06T05:55:19Z

I'm not sure if I understand partial GT in this context. Also packing alongside the other info is a bit unclear.

I'm 100% with you on "not designed carefully". Maybe we could put format planning on the next sprint's schedule? I would like to avoid hacking something together now and then changing it in all places later.

For me having GT under a single key like class_attributes would make the most sense.

gaieges · 2021-07-09T21:07:00Z

I'm not sure what the purpose of splitting groundtruth and results is. They're the same in concept; we can use a format (likely one we spec/create ourselves) that supports various answer types and lets us add more down the road.

Villux added this to the HMT-NEXT-SPRINT milestone May 5, 2021

Villux assigned todicus and gaieges May 5, 2021

gaieges removed this from the HMT-NEXT-SPRINT milestone Jul 10, 2021
