gm8xx8
☺︎
Seems like we’re all channeling our inner Karpathy, and also the DeepSeek stans’ standup!
Damn right I made the nice list.
DeepSeek-V3
Reasoning and Coding benchmarks:
BigCodeBench-Hard:
- 🥇 1st Overall
- Complete: 40.5%
- Instruct: 28.4%
- Average: 34.5%
> Comparisons: Gemini-Exp-1206 (34.1%), o1-2024-12-17 (32.8%)
Aider Polyglot Leaderboard:
- 🥈 2nd Place
- o1: 62%
- DeepSeek-V3 Preview: 48%
- Sonnet: 45%, Gemini-Exp-1206: 38%, o1-Mini: 33%
SWE-Bench Lite:
- Performance: 23.00%
LiveBench Results:
- All Groups Average: 60.4
- Reasoning: 50.0
- Coding: 63.4
- Mathematics: 60.0
- Data Analysis: 57.7
- Language: 50.2
- Instruction Following: 80.9
Key Architecture Upgrades (V3 vs. V2), with a config sketch below the link:
> 60 tokens/second (3x faster than V2!)
> Vocab Size: 129,280 (↑ from 102,400)
> Hidden Size: 7,168 (↑ from 4,096)
> Layers: 61 (↑ from 30)
> Attention Heads: 128 (↑ from 32)
> Max Position Embeddings: 4,096 (↑ from 2,048)
…
https://huggingface.co/collections/deepseek-ai/deepseek-v3-676bc4546fb4876383c4208b
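For reference, a quick sketch of reading those upgraded hyperparameters straight from the published config with 🤗 transformers. The repo id and attribute names are assumptions based on standard Hugging Face conventions, not an official snippet:

```python
# Sketch: inspect DeepSeek-V3's architecture hyperparameters from its Hugging Face config.
# Assumes the repo id "deepseek-ai/DeepSeek-V3" and standard HF config attribute names.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)

for field in (
    "vocab_size",               # 129,280 per the notes above
    "hidden_size",              # 7,168
    "num_hidden_layers",        # 61
    "num_attention_heads",      # 128
    "max_position_embeddings",
):
    print(field, getattr(cfg, field, "n/a"))  # fall back gracefully if a name differs
```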
NAH, I'D WIN.
it’s all coming together.
For those who aren’t familiar, Alec Radford is the first author of key papers on GPT, GPT-2, CLIP, and Whisper, all of which have played a pivotal role in advancing modern AI.
New simulation engine, quick-start sketch below:
- Cross-platform: natively supports Nvidia/AMD/Apple/Intel GPU/CPU on Windows, macOS, and Linux.
- Open-source: built entirely in Python (ish).
- Speed: outperforms GPU-accelerated platforms like Isaac Gym and MJX by 10-80x, reaching ~430,000x real-time simulation speed.
- Efficient: trains robotic locomotion policies for real-world application in 26 seconds on a single RTX 4090, processing 380k steps per second, including policy updates.
🔗: https://genesis-world.readthedocs.io/en/latest/
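For a feel of the API, here is a minimal hello-world sketch adapted from the Genesis quick-start docs; the backend flag, MJCF path, and viewer option are illustrative and may differ by version:

```python
# Sketch of a minimal Genesis scene, based on the quick-start in the docs linked above.
# The MJCF asset path is the bundled example; options may vary across releases.
import genesis as gs

gs.init(backend=gs.cpu)              # swap in gs.gpu on a CUDA/Metal machine

scene = gs.Scene(show_viewer=False)  # headless; set True to open the interactive viewer
scene.add_entity(gs.morphs.Plane())  # ground plane
scene.add_entity(gs.morphs.MJCF(file="xml/franka_emika_panda/panda.xml"))  # example robot

scene.build()                        # compile the simulation
for _ in range(1_000):
    scene.step()                     # advance physics
```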
I’ll say it again:
Ultimately, every path converges on robotics.
Scalable. Adaptable. Collaborative. This is the future.
The stack could be improved, but you get the gist.
busy week 🛬
Sora is here (System Card): https://openai.com/index/sora-is-here/
Random Amazon Nova Pro sighting, courtesy of Meta.
domain specific o1.
“today we are announcing reinforcement finetuning, which makes it really easy to create expert models in specific domains with very little training data.” – Sam Altman
I told you, a mixture of domain experts was the way forward. Were you paying attention, Anon?
(sips ☕️)
Definitely some real ones 😏
This should be the standard for open-source language model releases.
Few things are more frustrating than announcements where weights or essential components are labeled as “coming soon.”
I might drop some alpha on this 🔜