Summary of potential issues (20220608) #253

pfliu-nlp · 2022-06-09T01:10:30Z

1. cmrc2019

dataset = load_dataset("cmrc2019")
task_type: cloze-multiple-choice
example: load_dataset("gaokao2018_np1", "cloze-multiple-choice")

2. dureader_yesno

Should the answer column be named answers?
We should introduce context as a column.
This does not look like qa_extractive; should it be qa_multiple_choice or qa_bool?

@register_task(TaskType.qa_bool_dureader)
@dataclass
class QuestionAnsweringBoolDureader(QuestionAnswering):
    task: TaskType = TaskType.qa_bool_dureader
    question_column: str = "question"
    context_column: str = "documents"
    answers_column: str = "answers"
    
    # Example value of the answers column: {"text": "xxx", "yesno_answer": "Yes"}
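
As a rough illustration of what a qa_bool reading of this column could look like, the sketch below maps the yesno_answer string to a boolean. The helper name yesno_to_bool and the handling of labels other than "Yes"/"No" are assumptions for illustration, not part of the DataLab code.

```python
from typing import Optional


def yesno_to_bool(answers: dict) -> Optional[bool]:
    """Map {"text": ..., "yesno_answer": "Yes"} to True or False.

    Hypothetical helper: any label other than yes/no (or a missing
    label) is returned as None rather than guessed.
    """
    mapping = {"yes": True, "no": False}
    label = str(answers.get("yesno_answer", "")).strip().lower()
    return mapping.get(label)


example = {"text": "xxx", "yesno_answer": "Yes"}
print(yesno_to_bool(example))
```

Returning None for unknown labels keeps the decision about a third class out of the conversion step.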

3. dureader_search

The task is qa_extractive, but context_column = "documents" is not a string.
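
One possible workaround, sketched under the assumption that documents is a plain list of strings (the real column may be nested differently), is to join the documents into a single context string and keep each document's start offset so answer positions can still be located. The function name join_documents is hypothetical.

```python
from typing import List, Tuple


def join_documents(documents: List[str], sep: str = "\n") -> Tuple[str, List[int]]:
    """Concatenate documents into one context string.

    Returns the joined string and the character offset at which each
    document starts inside it. Assumes a flat list of strings.
    """
    offsets = []
    position = 0
    for doc in documents:
        offsets.append(position)
        position += len(doc) + len(sep)
    return sep.join(documents), offsets


context, offsets = join_documents(["first passage", "second passage"])
print(context[offsets[1]:])
```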

4. ckbqa

This dataset could be broken down into two tasks:
- qa_open_domain: question_column, answers_column
- text_to_sql: question_column, sql_column
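
A minimal sketch of the split, assuming each record carries "question", "answers", and "sql" keys (these key names are guesses based on the column names above, not taken from the dataset script):

```python
from typing import Dict, Tuple


def split_views(record: Dict[str, object]) -> Tuple[dict, dict]:
    """Project one record onto the two proposed task schemas."""
    # View for qa_open_domain: question_column, answers_column
    qa_view = {"question": record["question"], "answers": record["answers"]}
    # View for text_to_sql: question_column, sql_column
    sql_view = {"question": record["question"], "sql": record["sql"]}
    return qa_view, sql_view


record = {"question": "q", "answers": ["a"], "sql": "select ..."}
qa_view, sql_view = split_views(record)
print(qa_view)
print(sql_view)
```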

5. coqa

Similar to the previous item (ckbqa).

6. dureader_robust

The answers field (text, start) should be a sequence; see DataLab/datasets/dureader_robust/dureader_robust.py, line 79 in 76548db ("answers": {). Here is one example:

answers = {"text": answer_text, "answer_start": answer_start}
(1) answers = [{"text": answer_text, "answer_start": answer_start}]
(2) answers = {"text": [answer_text], "answer_start": [answer_start]}
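
For illustration, a small sketch of converting the scalar form into option (2). The helper name to_sequence_form is hypothetical, and it assumes every field of answers is either a scalar or already a list.

```python
def to_sequence_form(answers: dict) -> dict:
    """Wrap each scalar field in a list, leaving existing lists untouched."""
    return {
        key: value if isinstance(value, list) else [value]
        for key, value in answers.items()
    }


answers = {"text": "answer_text", "answer_start": 3}
print(to_sequence_form(answers))
```

Because lists are left untouched, running the conversion twice gives the same result as running it once.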

7. ccpm

It seems that we can reuse the task QuestionAnsweringMultipleChoiceWithoutContext for this dataset.

8. cail2019

Similar to the previous item (ccpm).

ccks2019_fin

Should the event type also be regarded as an input?

ccks2020_fin_ee

Rethink whether 'event_column' is the best name.

ccks2021_fin_ea

It seems that the schema of arguments differs from the one above (ccks2020_fin_ee), so we probably need to modify the task name a little.
Compared with 2020: should we define a new task schema for ccks2021_fin_ea?

ccks2021_fin_re

It seems that the relation schema is fairly complicated. Should we modify the task name event_relation_extraction?


pfliu-nlp · 2022-06-09T02:48:42Z

STSB

Should the label type of text_similarity be float?

cail2018

If any config name contains a space, we can replace it with "-" or "_".
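
A minimal sketch of such a normalization, assuming config names are plain strings; the function name normalize_config_name and the example names are made up for illustration.

```python
import re


def normalize_config_name(name: str, replacement: str = "_") -> str:
    """Collapse each run of whitespace into a single replacement character."""
    return re.sub(r"\s+", replacement, name.strip())


print(normalize_config_name("first stage"))
print(normalize_config_name("first stage", replacement="-"))
```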

ubuntu_dialogs_corpus

It seems we didn't define a task_template for this task?
