Regarding this part of Chapter 9:
"To simplify data processing, instead of setting the labels of the special tokens [CLS], [SEP], [PAD], etc. to -100, we keep their original value of 0 and rely on the attention mask to exclude the padded positions when computing the loss."

The attention mask is 1 at the [CLS] position, so `active_loss = attention_mask.view(-1) == 1` will include [CLS]. Should it be masked out as well?
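For context, here is a minimal runnable sketch of the pattern being asked about (the tensor shapes, names, and toy data are assumptions for illustration, not the book's exact code):

```python
import torch
import torch.nn as nn

# Toy tensors standing in for the model output and a tokenized batch.
batch_size, seq_len, num_labels = 2, 8, 5
logits = torch.randn(batch_size, seq_len, num_labels)
labels = torch.zeros(batch_size, seq_len, dtype=torch.long)        # all "O" (0)
attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)
attention_mask[:, -2:] = 0  # pretend the last two positions are [PAD]

loss_fct = nn.CrossEntropyLoss()

# The quoted line: True at every attended position, which covers the real
# word tokens but also [CLS] (position 0) and [SEP].
active_loss = attention_mask.view(-1) == 1
loss = loss_fct(
    logits.view(-1, num_labels)[active_loss],
    labels.view(-1)[active_loss],
)
```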
Hi, `active_loss = attention_mask.view(-1) == 1` actually includes not only [CLS] but also [SEP], so the predictions for these two tokens do take part in the loss computation.
However, since every [CLS] and [SEP] in the training set is labeled "O" (non-entity), the model easily picks up this association, so letting them participate in the computation has little practical effect.
That said, it would be better to exclude them when computing the loss, for example as sketched below.
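One way to do that, as a minimal sketch (the hand-built special_tokens_mask and toy tensors are assumptions for illustration; Hugging Face tokenizers can return a real mask via `return_special_tokens_mask=True`), is to AND the attention mask with the complement of a special-tokens mask:

```python
import torch
import torch.nn as nn

batch_size, seq_len, num_labels = 2, 8, 5
logits = torch.randn(batch_size, seq_len, num_labels)
labels = torch.zeros(batch_size, seq_len, dtype=torch.long)
attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)
attention_mask[:, -2:] = 0  # padding

# Hand-built mask marking [CLS]/[SEP] with 1 (positions are illustrative).
special_tokens_mask = torch.zeros(batch_size, seq_len, dtype=torch.long)
special_tokens_mask[:, 0] = 1   # [CLS]
special_tokens_mask[:, 5] = 1   # [SEP] (position depends on the sample)

# Keep positions that are attended to AND are not special tokens.
active_loss = (attention_mask.view(-1) == 1) & (special_tokens_mask.view(-1) == 0)

loss_fct = nn.CrossEntropyLoss()
loss = loss_fct(
    logits.view(-1, num_labels)[active_loss],
    labels.view(-1)[active_loss],
)
```

Alternatively, setting the labels at those positions to -100 achieves the same effect, since `nn.CrossEntropyLoss` ignores index -100 by default (`ignore_index=-100`).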