VIA format that supports groundtruth #50

Open
Villux opened this issue May 5, 2021 · 7 comments

Contributor

Villux commented May 5, 2021

For ILAS we have:

            "regions": [{
                "region_attributes": {
                    "region_key": "region_value"
                },
                "shape_attributes": {
                    "coords": [x1, y1, x2, y2, x3, y3, x4, y4],
                    "name": "shape_type"
                }
            }],
        }]
    }

I think the reason is that this is a result format, which is why it has regions.

In the Task processor we include class_attributes for GT tasks. The logic is here

Could we separate results and groundtruth into different formats? What ideas do you have?

@Villux Villux added this to the HMT-NEXT-SPRINT milestone May 5, 2021
Contributor Author

Villux commented May 5, 2021

My idea of GT is:

{
    "datapoints": [
        {
            "task_uri": "http://azureblob:10000/testbucket/-C7idm5tNUw/frame_0074_74.00.jpg",
            "class_attributes": {
                "0": {
                    "class_name": "0", // will be mapped to entity name in hcaptcha gt format, required field
                    "confidence": null, // legacy field from ilb, but might be useful later to adjust how much to inject
                    "entity_type": "car", // required field in hcaptcha gt format
                    "entity_coords": [1,2,3,4], // area object, required field in hcaptcha gt format
                    "n_answers": 2 // how many GT answers to inject => the higher the value, the more the client is punished for not answering correctly
                }
            },
            "metadata": {
                "original": {
                    "public_url": "http://azureblob:10000/testbucket/-C7idm5tNUw/frame_0074_74.00.jpg" // model training url
                }
            }
        }
    ]
}
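
A minimal sketch of how the required fields above could be checked, assuming the payload has already been parsed into a Python dict (the `//` comments in the example are annotations and are not valid JSON). The names `GTClassAttribute` and `parse_class_attributes`, and the default of 1 for `n_answers`, are assumptions for illustration and not part of basemodels.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class GTClassAttribute:
    """One entry of class_attributes in the proposed GT format (hypothetical model)."""
    class_name: str                      # required, mapped to the entity name
    entity_type: str                     # required
    entity_coords: List[int]             # required
    confidence: Optional[float] = None   # legacy field, may be null
    n_answers: int = 1                   # assumed default, not stated in the proposal


def parse_class_attributes(raw: Dict[str, Dict[str, Any]]) -> Dict[str, GTClassAttribute]:
    """Validate the required fields and build typed entries."""
    required = ("class_name", "entity_type", "entity_coords")
    parsed = {}
    for key, attrs in raw.items():
        missing = [name for name in required if name not in attrs]
        if missing:
            raise ValueError(f"class_attributes[{key!r}] is missing {missing}")
        parsed[key] = GTClassAttribute(
            class_name=attrs["class_name"],
            entity_type=attrs["entity_type"],
            entity_coords=list(attrs["entity_coords"]),
            confidence=attrs.get("confidence"),
            n_answers=attrs.get("n_answers", 1),
        )
    return parsed
```

Keeping everything GT-related under `class_attributes` means a check like this only has to look in one place.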

Contributor Author

Villux commented May 5, 2021

hCaptcha expects to see this format when a job is added:

{
    "https://temple-gates.hcaptcha.com/v_HN-3LaZVuCs.mp4.12000.jpg": [
        [
            {
                "entity_type": "car",
                "entity_coords": [
                    47,
                    52,
                    269,
                    52,
                    269,
                    159,
                    47,
                    159
                ],
                "entity_name": 0
            }
        ]
    ],
    "https://temple-gates.hcaptcha.com/v_m7T6SrshoLk.mp4.68000.jpg": [
        [
            {
                "entity_type": "car",
                "entity_coords": [
                    145,
                    91,
                    514,
                    91,
                    514,
                    266,
                    145,
                    266
                ],
                "entity_name": 0
            }
        ]
    ]
}
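
For illustration, a rough sketch of how the proposed `datapoints` structure could be turned into this shape. The function name `datapoints_to_hcaptcha_gt` is hypothetical. Two things are assumptions: that each datapoint contributes one inner list holding all of its entities, and that `class_name` is passed through unchanged as `entity_name` (the example above shows an integer `0` while the proposal uses the string `"0"`, so a conversion may be needed in practice).

```python
from typing import Any, Dict, List


def datapoints_to_hcaptcha_gt(payload: Dict[str, Any]) -> Dict[str, List[List[Dict[str, Any]]]]:
    """Map the proposed GT datapoints onto the per-URL hCaptcha structure."""
    result: Dict[str, List[List[Dict[str, Any]]]] = {}
    for datapoint in payload["datapoints"]:
        entities = [
            {
                "entity_type": attrs["entity_type"],
                "entity_coords": attrs["entity_coords"],
                # Assumption: class_name is used as entity_name without conversion.
                "entity_name": attrs["class_name"],
            }
            for attrs in datapoint["class_attributes"].values()
        ]
        # Assumption: one inner list per datapoint, keyed by its task_uri.
        result.setdefault(datapoint["task_uri"], []).append(entities)
    return result
```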

Contributor

gaieges commented May 5, 2021

Not sure what the issue is here?

Contributor Author

Villux commented May 6, 2021

In short:

  1. What purpose was the VIA format we have in basemodels created for? To me it looks like a mixture of GT and results, which doesn't make sense.
  2. The Task processor expects all GT information to be inside class_attributes, which is not the case with ILAS data in this format. Regions are a separate key, and there is no structure that would express that they are part of the GT data.

I suggest we do it like this: #50 (comment)
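
To make point 2 concrete, here is a hedged sketch of folding VIA-style regions into `class_attributes` so that all GT sits under a single key. Everything about the mapping is an assumption: the function name `regions_to_class_attributes` is hypothetical, the region index is used as `class_name`, and the caller has to say which `region_attributes` key holds the entity type (`type_key`), since the VIA format does not define one.

```python
from typing import Any, Dict, List


def regions_to_class_attributes(regions: List[Dict[str, Any]], type_key: str) -> Dict[str, Dict[str, Any]]:
    """Fold VIA-style regions into a class_attributes dict (illustrative mapping only)."""
    class_attributes: Dict[str, Dict[str, Any]] = {}
    for index, region in enumerate(regions):
        shape = region["shape_attributes"]
        class_attributes[str(index)] = {
            # Assumption: the region index doubles as the class name.
            "class_name": str(index),
            # Assumption: the entity type lives under a caller-chosen region attribute.
            "entity_type": region["region_attributes"][type_key],
            "entity_coords": list(shape["coords"]),
        }
    return class_attributes
```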

Contributor

e271828- commented May 6, 2021

> In short:
>
>   1. What purpose was the VIA format we have in basemodels created for? To me it looks like a mixture of GT and results, which doesn't make sense.
>   2. The Task processor expects all GT information to be inside class_attributes, which is not the case with ILAS data in this format. Regions are a separate key, and there is no structure that would express that they are part of the GT data.
>
> I suggest we do it like this: #50 (comment)

The idea is requesters can have partial ground truth, and it is useful to associate that with a task. In principle it may be easiest to pack that in alongside the other info for the task.

However, none of these formats are optimal based on what we know from running them for a few years, and could be more carefully designed in the next iteration.

Contributor Author

Villux commented May 6, 2021

I'm not sure if I understand partial GT in this context. Also packing alongside the other info is a bit unclear.

I'm 100% with you on "not designed carefully". Maybe we could put format planning on the next sprint's schedule? I would like to avoid hacking something together now and then changing it in all places later.

For me having GT under a single key like class_attributes would make the most sense.

Contributor

gaieges commented Jul 9, 2021

I'm not sure what the purpose of splitting groundtruth and results is. They're the same in concept; we can use (likely spec/create our own) a format that supports various answer types and lets us add more down the road.
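
As a rough sketch of what one shared format could look like, assuming a single record type that carries a `source` tag instead of two separate formats. All names here (`Source`, `Annotation`, `groundtruth_only`) are hypothetical and not taken from basemodels; `answer_type` is left as a free-form string so new answer types can be added later.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List


class Source(Enum):
    GROUNDTRUTH = "groundtruth"
    RESULT = "result"


@dataclass
class Annotation:
    """One answer for a task, whether it came from GT or from a worker result."""
    source: Source
    answer_type: str          # e.g. "polygon"; free-form so more types can be added
    payload: Dict[str, Any]   # type-specific content, e.g. entity_type and entity_coords


def groundtruth_only(annotations: List[Annotation]) -> List[Annotation]:
    """Select the GT entries, e.g. for a task processor that injects GT."""
    return [a for a in annotations if a.source is Source.GROUNDTRUTH]
```

With a tag like this, the task processor can still find GT in one place without needing a second format.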

@gaieges gaieges removed this from the HMT-NEXT-SPRINT milestone Jul 10, 2021