Hylke
@hylkedonker #737615
Dutch machine learning enthusiast 🤖 with a love for programming.
10 Followers 63 Following
Looks like coding agents are already doing a pretty good job implementing scientific papers:
https://arxiv.org/abs/2504.01848
For those interested in the intersection of AI and statistics, I have written a blog post on how to build Bayesian attention:
https://medium.com/data-science-collective/exploiting-the-structured-state-space-duality-to-build-bayesian-attention-3883ab8bacd4
Looking at the nightly changelogs, the release of Mojo 24.6, which is supposed to ship with GPU support, is coming any day now.
Recurrent neural networks are transformers, are state space models, are convolutions?
Looks like we've come full circle, back to 2012, when deep learning made its first splash.
https://arxiv.org/abs/2405.21060
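The core of that duality can be seen in a few lines: causal linear attention, computed as one masked matrix multiply, gives exactly the same outputs as a recurrence over a key-value state. A minimal NumPy sketch of the idea (toy sizes, not the paper's actual algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4  # sequence length, feature dimension (toy values)
Q = rng.normal(size=(T, d))
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))

# "Transformer" view: causal linear attention as a masked matmul, O(T^2).
scores = Q @ K.T
mask = np.tril(np.ones((T, T)))       # causal mask: position t sees s <= t
out_attn = (scores * mask) @ V

# "RNN / state space" view: the same map as a recurrence over a d x d state.
state = np.zeros((d, d))
out_rnn = np.zeros((T, d))
for t in range(T):
    state += np.outer(K[t], V[t])     # accumulate key-value outer products
    out_rnn[t] = Q[t] @ state         # read out with the current query

assert np.allclose(out_attn, out_rnn)
```

Same function, two computation orders: the quadratic form parallelizes well on GPUs, the recurrent form streams in linear time.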
State space models can be used as drop-in replacements for attention, but with more favourable sequence length scaling. This video may well be the most lucid intro to state space models I've come across:
https://youtu.be/QJHA-PY8zDc?si=J5kGW87Yg0SAFdpR
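The favourable scaling comes from the recurrence itself: a diagonal SSM processes a length-T sequence in O(T·N) for N states, versus the O(T²) pairwise scores of full attention. A minimal single-channel sketch (toy parameters of my own choosing, not any particular published model):

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Discrete diagonal SSM: h_t = A * h_{t-1} + B * u_t, y_t = C . h_t.

    A, B, C are length-N vectors (diagonal transition), u is a length-T
    input sequence. Cost is linear in T, unlike attention's quadratic cost.
    """
    N = len(A)
    h = np.zeros(N)
    y = np.empty(len(u))
    for t in range(len(u)):
        h = A * h + B * u[t]   # elementwise update: diagonal transition
        y[t] = C @ h           # linear readout of the hidden state
    return y

# Hypothetical toy parameters, just to exercise the scan.
A = np.array([0.9, 0.5])       # stable decay rates (|A| < 1)
B = np.array([1.0, 1.0])
C = np.array([0.3, 0.7])
y = ssm_scan(A, B, C, np.ones(5))
```

In practice models like Mamba make A, B, C input-dependent and run the scan in parallel on GPU, but the linear-in-T structure is the same.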
Debugging neural nets is always a pain, but maybe penzai can bring some relief?
https://github.com/google-deepmind/penzai