Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens

Together, the developer, claims that it is the largest public dataset built specifically for language model pre-training.

Product & Engineering Archives - Pear VC

RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training Large Language Models : r/LocalLLaMA

[2311.17035] Scalable Extraction of Training Data from (Production) Language Models

RLHF: Reinforcement Learning from Human Feedback

Integrated AI: The sky is comforting (2023 AI retrospective) – Dr Alan D. Thompson – Life Architect

RedPajama training progress at 440 billion tokens

Leaderboard: OpenAI's GPT-4 Has Lowest Hallucination Rate

togethercomputer/RedPajama-Data-1T · Datasets at Hugging Face

What's in the RedPajama-Data-1T LLM training set
