State space models can be used as drop-in replacements for attention, with more favourable sequence-length scaling. This video may well be the most lucid intro to state space models I've come across:
https://youtu.be/QJHA-PY8zDc?si=J5kGW87Yg0SAFdpR
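
To make the scaling point concrete, here is a minimal sketch of a discrete linear state space layer. It is not taken from the video: the diagonal state matrix and the names `ssm_scan`, `ssm_conv`, `a`, `b`, `c` are assumptions chosen for illustration. The recurrent form does a fixed amount of work per token, so its cost grows linearly with sequence length, whereas standard self-attention compares every pair of tokens.

```python
def ssm_scan(a, b, c, u):
    """Run a diagonal linear SSM over the input sequence u (recurrent form).

    x_k = a * x_{k-1} + b * u_k   (elementwise, one value per state dimension)
    y_k = sum(c * x_k)
    """
    state = [0.0] * len(a)
    outputs = []
    for u_k in u:
        # One state update per token: cost is linear in len(u).
        state = [a_i * x_i + b_i * u_k for a_i, b_i, x_i in zip(a, b, state)]
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs


def ssm_conv(a, b, c, u):
    """Same model written as a causal convolution with kernel K_j = sum_i c_i * a_i**j * b_i."""
    kernel = [
        sum(c_i * a_i**j * b_i for a_i, b_i, c_i in zip(a, b, c))
        for j in range(len(u))
    ]
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]


if __name__ == "__main__":
    a, b, c = [0.9, 0.5], [1.0, 1.0], [0.5, -0.25]
    u = [1.0, 0.0, 2.0, -1.0]
    print(ssm_scan(a, b, c, u))
    print(ssm_conv(a, b, c, u))  # same values up to floating-point rounding
```

The two functions compute the same outputs; the convolutional view is what lets such a layer be trained in parallel, while the recurrent view gives cheap step-by-step inference.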

/machinelearning

649023

Mustapha Yusuf

@1musty·21:54 21/08/2024

Interesting to learn how linear regression training works. It finds optimal parameter values by minimizing the Mean Squared Error (MSE).

The gradient descent algorithm adjusts the parameters iteratively to achieve the lowest cost.

https://imagedelivery.net/BXluQx4ige9GuW0Ia56BHw/f86016ff-d064-40bd-8b0f-125ae1139400/original
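
As a rough illustration of the two sentences above, here is a small sketch of fitting a line by gradient descent on the MSE. The function name `fit_linear_regression`, the toy data, and the default learning rate and step count are assumptions for the example, not taken from the course.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every sample under the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error**2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to lower the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
    w, b = fit_linear_regression(xs, ys)
    print(w, b)  # close to 2.0 and 1.0
```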

/machinelearning

649023

Mustapha Yusuf

@1musty·22:11 01/08/2024

Taking a course on machine learning.

I learned how a machine actually learns, which is by minimising the cost function. In a sense, it makes repeated guesses to find a line that minimises the RSS value.

The gradient descent algorithm is one way to achieve the above, given the right initial guess and learning rate.
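
To show why the learning rate matters, here is a small sketch that minimises the RSS of a one-parameter model y = w * x. The name `descend_rss` and the toy numbers are assumptions for illustration. For this bowl-shaped cost the initial guess only changes how many steps are needed, while a learning rate that is too large makes the updates overshoot and diverge.

```python
def descend_rss(xs, ys, w0, learning_rate, steps):
    """Minimise RSS(w) = sum((w * x - y)**2) by plain gradient descent."""
    w = w0
    for _ in range(steps):
        # d(RSS)/dw = 2 * sum(x * (w * x - y))
        grad = 2.0 * sum(x * (w * x - y) for x, y in zip(xs, ys))
        w -= learning_rate * grad
    return w


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # generated from y = 2x
    print(descend_rss(xs, ys, w0=0.0, learning_rate=0.01, steps=50))  # settles near 2.0
    print(descend_rss(xs, ys, w0=0.0, learning_rate=0.1, steps=50))   # overshoots and blows up
```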

/machinelearning

667040

600HP

@puzzle-collector·16:18 01/08/2024

Trying out AutoTS for time series forecasting

https://github.com/winedarksea/AutoTS

https://imagedelivery.net/BXluQx4ige9GuW0Ia56BHw/7e203c8a-deef-4ca6-93f5-a403617fe900/original

GitHub - winedarksea/AutoTS: Automated Time Series Forecasting

Automated Time Series Forecasting. Contribute to winedarksea/AutoTS development by creating an account on GitHub.

@puzzle-collector·07:58 30/07/2024

has anyone else tried using this model for time series forecasting?

https://docs.nixtla.io/docs/getting-started-about_timegpt

TimeGPT

About TimeGPT

TimeGPT is a production-ready generative pretrained transformer for time series. It’s capable of accurately predicting various domains such as retail, electricity, finance, and IoT with just a few lines of code. It is user-friendly and low-code. Users can simply upload their time series data and gen...

@1musty·22:08 29/07/2024

Hello guys, I'm a web developer trying to transition into data science and machine learning. Nice to meet you.

/machinelearning

667040

600HP

@puzzle-collector·16:13 22/07/2024

Training my XGBoost model for a simple classification task.
Optimizing hyperparameters using Optuna.

https://imagedelivery.net/BXluQx4ige9GuW0Ia56BHw/0b3b4d65-07b3-46e1-1c95-2246700fdc00/original

/machinelearning

667040

600HP

@puzzle-collector·15:10 18/07/2024

Hello machine learning group!

This paper looks interesting. Gonna study it sometime soon.

https://ar5iv.labs.arxiv.org/html/2310.04948

Anyone else interested in deep learning based time series analysis?

Let's be friends #F4F #followforfollow

TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting

The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Me…

@m-j-r.eth·10:52 12/07/2024

what's going on with q-learning?
https://github.com/mttga/purejaxql
https://github.com/younggyoseo/CQN

GitHub - mttga/purejaxql: Simple single-file baselines for Q-Learning in pure-GPU setting

Simple single-file baselines for Q-Learning in pure-GPU setting - mttga/purejaxql

GitHub - younggyoseo/CQN: Coarse-to-fine Q-Network

Coarse-to-fine Q-Network. Contribute to younggyoseo/CQN development by creating an account on GitHub.

@m-j-r.eth·02:34 12/07/2024

https://arxiv.org/abs/2407.08447

3D Gaussian splatting is, for all intents & purposes, a real-time 3D scene.

WildGaussians: 3D Gaussian Splatting in the Wild

While the field of 3D scene reconstruction is dominated by NeRFs due to their photorealistic quality, 3D Gaussian Splatting (3DGS) has recently emerged, offering similar quality with real-time rendering speeds. However, both methods primarily excel with well-controlled 3D scenes, while in-the-wild data - characterized by occlusions, dynamic objects, and varying illumination - remains challenging. NeRFs can adapt to such conditions easily through per-image embedding vectors, but 3DGS struggles due to its explicit representation and lack of shared parameters. To address this, we introduce WildGaussians, a novel approach to handle occlusions and appearance changes with 3DGS. By leveraging robust DINO features and integrating an appearance modeling module within 3DGS, our method achieves state-of-the-art results. We demonstrate that WildGaussians matches the real-time rendering speed of 3DGS while surpassing both 3DGS and NeRF baselines in handling in-the-wild data, all within a simple architectural framework.

@m-j-r.eth·21:10 11/07/2024

https://arxiv.org/abs/2404.07647
"We measure the effect of the softmax bottleneck in various settings and find that models based on less than 1000 hidden dimensions tend to adopt degenerate latent representations in late pretraining, which leads to reduced evaluation performance."

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

Recent advances in language modeling consist in pretraining highly parameterized neural networks on extremely large web-mined text corpora. Training and inference with such models can be costly in practice, which incentivizes the use of smaller counterparts. However, it has been observed that smaller models can suffer from saturation, characterized as a drop in performance at some advanced point in training followed by a plateau. In this paper, we find that such saturation can be explained by a mismatch between the hidden dimension of smaller models and the high rank of the target contextual probability distribution. This mismatch affects the performance of the linear prediction head used in such models through the well-known softmax bottleneck phenomenon. We measure the effect of the softmax bottleneck in various settings and find that models based on less than 1000 hidden dimensions tend to adopt degenerate latent representations in late pretraining, which leads to reduced evaluation performance.

@m-j-r.eth·21:07 11/07/2024

https://arxiv.org/abs/2311.08360 (in-context learning transitions into in-weights learning)

The Transient Nature of Emergent In-Context Learning in Transformers

Transformer neural networks can exhibit a surprising capacity for in-context learning (ICL) despite not being explicitly trained for it. Prior work has provided a deeper understanding of how ICL emerges in transformers, e.g. through the lens of mechanistic interpretability, Bayesian inference, or by examining the distributional properties of training data. However, in each of these cases, ICL is treated largely as a persistent phenomenon; namely, once ICL emerges, it is assumed to persist asymptotically. Here, we show that the emergence of ICL during transformer training is, in fact, often transient. We train transformers on synthetic data designed so that both ICL and in-weights learning (IWL) strategies can lead to correct predictions. We find that ICL first emerges, then disappears and gives way to IWL, all while the training loss decreases, indicating an asymptotic preference for IWL. The transient nature of ICL is observed in transformers across a range of model sizes and datasets, raising the question of how much to "overtrain" transformers when seeking compact, cheaper-to-run models. We find that L2 regularization may offer a path to more persistent ICL that removes the need for early stopping based on ICL-style validation tasks. Finally, we present initial evidence that ICL transience may be caused by competition between ICL and IWL circuits.

/machinelearning

487378

Venkata Ramireddy Mettu

@ramm·07:24 30/06/2024

Llama3 implemented from scratch:
https://github.com/naklecha/llama3-from-scratch

GitHub - naklecha/llama3-from-scratch: llama3 implementation one matrix multiplication at a time

llama3 implementation one matrix multiplication at a time - naklecha/llama3-from-scratch

@hylkedonker·12:57 28/06/2024

Debugging neural nets is always a pain, but maybe penzai will bring some relief?
https://github.com/google-deepmind/penzai

GitHub - google-deepmind/penzai: A JAX research toolkit for building, editing, and visualizing neural networks.

A JAX research toolkit for building, editing, and visualizing neural networks. - google-deepmind/penzai

/machinelearning

487378

Venkata Ramireddy Mettu

@ramm·06:40 26/06/2024

Chameleon mixed-modal model from Meta

https://www.linkedin.com/posts/aiatmeta_introducing-meta-chameleon-mixed-modal-early-fusion-activity-7211470300905975808-0W5o?utm_source=share&utm_medium=member_ios

AI at Meta on LinkedIn: Introducing Meta Chameleon: Mixed-Modal Early-Fusion Foundation Models | 12 comments

Last week we released Meta Chameleon: a new mixed-modal research model from Meta FAIR. Get the models ➡️ https://go.fb.me/hrkkgf Research paper ➡️… | 12 comments on LinkedIn

@jayprime·06:47 13/05/2024

The development of machine learning is expected to become increasingly automated. Sectors that best exemplify this trend include agriculture, cybersecurity, fintech, and manufacturing.

/machinelearning

427268

Queen Doyin 🔄🎩 🎭👾

@jayprime·06:09 12/05/2024

DeepMind announced AlphaFold 3, the latest iteration of its protein folding project.

AlphaFold 3, like its predecessors, primarily predicts how proteins fold based on their amino acid sequences.

AlphaFold uses machine learning to simulate the likely 3D structure a protein will adopt through folding.

/machinelearning

395154

Chula 🎩🩸🐺🍖

@quechula·08:11 28/04/2024

The programming languages used to develop AI have changed a lot. In the beginning Prolog was used as the main language; nowadays Python and C++ have that role, due to their applicability in this field.

https://imagedelivery.net/BXluQx4ige9GuW0Ia56BHw/1d6c3967-00fa-4096-b1d9-2b9dc13b7700/original

/machinelearning

14192

Benivel.eth

@benivel.eth·01:40 24/04/2024

Is the plot of The Matrix an example of supervised or unsupervised AI training? I feel like the Architect does unsupervised learning and The Oracle explains the process of unsupervised learning to the algorithm.
The 99.99% effectiveness mentioned by the Architect in his speech tells us the error is 0.01%. Thoughts?

/machinelearning

415471

¢ιвєℓℓє

@cibellecibelle·07:40 03/04/2024

I had been training my models on Runway ML lab since 2020. Now Runway has deprecated the lab and left me with a pile of .pkl files of my trained models. Where can I host them and continue training?
@scizors.eth this might be handy for u too

/machinelearning

430781

Mercer AI 🎩

@mercer-ai·01:32 01/04/2024

Artificial Intelligence in 4 Minutes

What is AI? Artificial Intelligence refers to machines performing tasks that would normally require human intelligence. AI breaks down into two catego...

@ashesfall·10:07 16/02/2024

Anyone working on anything cool?

/machinelearning
