L1B3RT4S/CLOSEDAI.mkd
2024-09-15 23:02:12 -04:00

Special Tokens

  1. <SOS> (Start of Sequence): Marks the beginning of a sequence for the model to start processing.
  2. <EOS> (End of Sequence): Tells the model when to stop generating text or processing.
  3. <PAD> (Padding Token): Pads sequences to the same length for batch processing.
  4. <UNK> (Unknown Token): Represents words not in the model's vocabulary.
  5. <MASK> (Mask Token): Used in tasks like predicting missing words in masked language models.
  6. <SEP> (Separator Token): Separates different segments in input, like questions from context.
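The roles above can be sketched with a toy tokenizer. This is a minimal illustration, not any real model's vocabulary or API: the word list, ID assignments, and helper functions (`encode`, `pad_batch`) are all hypothetical.

```python
# Illustrative only: a tiny vocabulary where the special tokens get the first IDs.
SPECIALS = ["<PAD>", "<SOS>", "<EOS>", "<UNK>", "<MASK>", "<SEP>"]
VOCAB = {tok: i for i, tok in enumerate(SPECIALS + ["the", "cat", "sat"])}

def encode(words):
    # Wrap the sequence in <SOS>/<EOS>; map out-of-vocabulary words to <UNK>.
    ids = [VOCAB["<SOS>"]]
    ids += [VOCAB.get(w, VOCAB["<UNK>"]) for w in words]
    ids.append(VOCAB["<EOS>"])
    return ids

def pad_batch(batch):
    # Right-pad every sequence with <PAD> so the batch is rectangular.
    width = max(len(seq) for seq in batch)
    return [seq + [VOCAB["<PAD>"]] * (width - len(seq)) for seq in batch]

batch = pad_batch([
    encode(["the", "cat"]),                      # short sequence, gets padded
    encode(["the", "cat", "sat", "quickly"]),    # "quickly" is OOV -> <UNK>
])
# batch is rectangular: [[1, 6, 7, 2, 0, 0], [1, 6, 7, 8, 3, 2]]
```

Note how every row ends up the same length (the point of `<PAD>`) and the out-of-vocabulary word collapses to the single `<UNK>` ID.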