Why generative models need special tokens such as BOS and EOS as input

Why add BOS/EOS

Pretrained models provided by the Transformers library

Take the Bart model as an example. The forward method in its source contains the following code:

# Excerpt from forward: build the decoder inputs from labels when none are given.
if labels is not None:
    if decoder_input_ids is None:
        decoder_input_ids = shift_tokens_right(
            labels, self.config.pad_token_id, self.config.decoder_start_token_id
        )

The source of shift_tokens_right is implemented as follows:

import torch

def shift_tokens_right(input_ids: torch.Tensor, pad_token_id: int, decoder_start_token_id: int):
    shifted_input_ids = input_ids.new_zeros(input_ids.shape)
    shifted_input_ids[:, 1:] = input_ids[:, :-1].clone()
    # The first position becomes the decoder start token.
    shifted_input_ids[:, 0] = decoder_start_token_id
    assert pad_token_id is not None
    # Labels use -100 for ignored positions; replace those with the pad token.
    shifted_input_ids.masked_fill_(shifted_input_ids == -100, pad_token_id)
    return shifted_input_ids

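This shows that when labels are passed without decoder_input_ids, the model itself already prepends decoder_start_token_id to the decoder input.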
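
To make the effect of the shift concrete, here is a minimal list-based sketch of the same idea. It is not the library's code: the helper name shift_right and all token ids (0 for the start token, 1 for padding, 2 for EOS) are assumptions chosen only for illustration.

```python
IGNORE_INDEX = -100  # label value for ignored positions, as in shift_tokens_right


def shift_right(labels, pad_token_id, decoder_start_token_id):
    """List-based sketch of a right shift over a batch of label sequences."""
    shifted = []
    for row in labels:
        # Drop the last token and prepend the decoder start token.
        new_row = [decoder_start_token_id] + row[:-1]
        # Ignored label positions become padding in the decoder input.
        new_row = [pad_token_id if t == IGNORE_INDEX else t for t in new_row]
        shifted.append(new_row)
    return shifted


# Hypothetical ids: 0 = start token, 1 = padding, 2 = EOS, others = ordinary tokens.
labels = [[11, 12, 13, 2], [21, 2, -100, -100]]
decoder_input_ids = shift_right(labels, pad_token_id=1, decoder_start_token_id=0)

# Position by position: what the decoder is fed, and what it should predict.
for inp, tgt in zip(decoder_input_ids[0], labels[0]):
    print(f"decoder sees {inp:>2} -> should predict {tgt}")
```

At the first position there is no previous target token to feed in, so the start token gives the decoder something to condition on. The final target in each sequence is the EOS id, which is how the model is trained to signal the end of the output.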