Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens

Together, the developer, claims it is the largest public dataset built specifically for language model pre-training.

Together AI releases RedPajama-Data-v2 dataset; Aleksa Gordić posted on the topic

Data science recent news

Benjamin Rogers on LinkedIn: RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training…

Integrated AI: The sky is comforting (2023 AI retrospective) – Dr Alan D. Thompson – Life Architect

togethercomputer/RedPajama-Data-V2 · Open source community will forever be indebted to Together AI.

RedPajama-Data-v2: An open dataset with 30 trillion tokens for training large language models

NLP recent news, page 7 of 30