Stanford and University of Washington researchers have developed an artificial intelligence (AI) reasoning model — S1 — that performs comparably to OpenAI's o1 and DeepSeek's R1 on math and coding benchmarks.
The team trained S1 via distillation, extracting reasoning capabilities from Google's Gemini 2.0 Flash Thinking Experimental model using just 1,000 carefully curated question-and-answer pairs as training data.
The entire training run took only 26 minutes on 16 Nvidia H100 GPUs. The researchers also introduced a simple "wait" trick that improved the model's accuracy: when the model tried to end its reasoning, they appended the word "Wait," prompting it to continue thinking before producing a final answer.
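The "wait" trick can be sketched in a few lines. This is an illustrative mock-up, not the researchers' code: `toy_generate` is a hypothetical stand-in for a real language-model decoding call, and `<end_think>` is an assumed end-of-reasoning marker. The idea is simply that whenever the model tries to stop reasoning, the stop marker is replaced with "Wait" and generation continues.

```python
def toy_generate(prompt: str) -> str:
    """Stand-in for one model call: appends a reasoning step and a stop marker."""
    return prompt + " ...step... <end_think>"

def budget_forced_generate(prompt: str, extra_rounds: int = 2) -> str:
    """Extend reasoning by swapping the stop marker for 'Wait' a few times."""
    text = toy_generate(prompt)
    for _ in range(extra_rounds):
        # Suppress the end-of-reasoning marker and append "Wait" so the
        # model keeps reasoning instead of answering immediately.
        text = text.replace("<end_think>", "Wait")
        text = toy_generate(text)
    return text

out = budget_forced_generate("Problem:", extra_rounds=2)
print(out.count("Wait"))  # each forced continuation leaves one "Wait" → 2
```

With a real model, each extra round gives the model more tokens to reconsider its answer, which is where the accuracy gain comes from.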
The model's success is somewhat misleading, however: it depends on distilling knowledge from expensive, preexisting AI systems like Gemini. That approach may violate those systems' terms of service, and it risks compromising safety features and the incentives for long-term, original innovation.