The Paleo Cyborg Podcast

Caio Temer Episode 13 part 1

Caio Temer explains LLMs and machine learning.

From the opening exchange with ChatGPT: "I found that generalized learning happens after 10k epochs with a considerable amount of data per epoch. Are there studies in this regard?"

ChatGPT: "Yes, there are studies related to generalized learning after many epochs with a large amount of data per epoch. A notable example is the Deep Bootstrap Framework, presented at ICLR 2021. This framework draws a connection between generalization and online optimization. It compares a 'real world', where a model is trained on a finite number of samples over multiple passes, with an 'ideal world', where each training step uses fresh samples from a virtually infinite dataset. The research found that models that train quickly on infinite data are the same ones that generalize well on finite data. This study offers a new perspective on practical design choices and provides a theoretical path toward understanding generalization."

Caio Temer show notes, part one summary.

Interesting topics for trading:
- how I created the first bots
- how models combine language models with autonomous robots
- why conventional methods don't work

About language models:
- language models are simpler to obtain with other structures
- the problem in the current structure and how I solved it
- development of intuitive neural networks using hash abstraction

Caio developed a way for neural networks, through cryptographic techniques based on sums, to learn to abstract and understand deeply and quickly. He created a library for a tiny language model that learns to predict all tokens at absurd speed by using distance-vector embeddings instead of tokens.

Transformers: https://arxiv.org/abs/2205.13504

Training notes from the conversation (illustrative sketches of some of these appear after the links below):
- for example, if the network is exploding, either a component is wrong or the update rate is wrong; if the rate is right, the nonlinearities are wrong: swap a more linear activation like ReLU for tanh, or add a tanh
- a softmax in deep layers, used as a projector, reduces convergence time by forcing layer alignment
- training the transformer one layer at a time, adding a copy of the first layer and freezing the first, accelerates training and improves retention
- for sequence-to-sequence tasks, a permuted linear dense layer solves 99% of problems in less time, and better, than the transformer
- giant embeddings work like entire networks, needing only one activation at the end after reshaping
- "1 tanh layer of 1 million units solves everything better than any other network with the same number of parameters, at least in the tasks I use; I always start with it, then see how to reduce"

Colabs: making a language model using Facebook's FastText
https://colab.research.google.com/drive/1wVQrpzyY-SkCZTRZCcP6xJGRdg1ik0jR#scrollTo=pIW6-VwMuRlz
https://colab.research.google.com/drive/1oABIZr1xiIu7DKc7AbbZfeBJFLkZW6Ep#scrollTo=XfR-3PpPYoFU
https://github.com/rtkclouds/fast-js-language-model
https://x.com/canalCCore2/status/1735044356535472278?s=20

GitHub gists:
https://gist.github.com/rtkclouds/50b81d10736793f07cdca354516e8757
https://gist.github.com/rtkclouds/a6ee9afd96461ca94b3e9c22f78bda3a
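
Below are a few illustrative sketches of the training notes above; none of them are Caio's actual code. First, one way to read the softmax-as-projector remark, assuming a plain PyTorch MLP; the class name and layer sizes are invented for the example.

```python
import torch
import torch.nn as nn

class SoftmaxProjectedMLP(nn.Module):
    """Toy MLP that applies a softmax after the deeper layers as a 'projector'."""
    def __init__(self, dim: int = 128, depth: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i == 0:
                x = torch.tanh(x)             # ordinary nonlinearity on the first layer
            else:
                # softmax over the feature axis keeps deep outputs normalized,
                # which is one reading of "forcing layer alignment"
                x = torch.softmax(x, dim=-1)
        return x
```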
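The layer-at-a-time transformer recipe could look roughly like this: keep a list of encoder layers, and when growing, deep-copy the first layer and freeze the original so only the fresh copy keeps training. GrowingEncoder and its hyperparameters are assumptions for illustration.

```python
import copy
import torch
import torch.nn as nn

class GrowingEncoder(nn.Module):
    """Transformer encoder stack that grows one layer at a time."""
    def __init__(self, d_model: int = 256, nhead: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)]
        )

    def grow(self) -> None:
        new_layer = copy.deepcopy(self.layers[0])    # fresh, still-trainable copy
        for p in self.layers[0].parameters():
            p.requires_grad = False                  # freeze the already-trained layer
        self.layers.append(new_layer)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        for layer in self.layers:
            x = layer(x)
        return x
```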
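For the permuted-linear-dense point, in the spirit of the linked arXiv:2205.13504 line of work, a sketch might permute the tensor so that a single Linear layer mixes the time axis; the module name is hypothetical.

```python
import torch
import torch.nn as nn

class PermutedLinear(nn.Module):
    """Sequence-to-sequence map built from one Linear applied across time."""
    def __init__(self, seq_in: int, seq_out: int):
        super().__init__()
        self.mix_time = nn.Linear(seq_in, seq_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_in, features) -> move time to the last axis
        x = x.permute(0, 2, 1)
        x = self.mix_time(x)            # one dense layer mixes the time steps
        return x.permute(0, 2, 1)       # back to (batch, seq_out, features)

# usage example: map 96 input steps of 7 features to 24 output steps
out = PermutedLinear(seq_in=96, seq_out=24)(torch.randn(8, 96, 7))
```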
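The "one wide tanh layer" quote, together with the ReLU-to-tanh tip for exploding activations, could be tried with a baseline like this; the function name and the exact width are placeholders (a million tanh units is memory-hungry, so scale to what fits).

```python
import torch.nn as nn

def wide_tanh_baseline(in_dim: int, out_dim: int, hidden: int = 1_000_000) -> nn.Sequential:
    """Single hidden layer of tanh units, used as a first baseline to beat.
    tanh is bounded, which also helps when ReLU activations are exploding."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.Tanh(),                      # the only nonlinearity in the network
        nn.Linear(hidden, out_dim),
    )
```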
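Loosely related to the FastText Colabs listed above, a toy next-token guesser over FastText vectors might look like the sketch below. It uses gensim's FastText implementation rather than Facebook's own library, and the corpus and function are invented for the example; it is not the notebooks' actual approach.

```python
from gensim.models import FastText

# tiny toy corpus; the real Colabs train on far more text
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]

ft = FastText(sentences=corpus, vector_size=32, window=2, min_count=1, epochs=50)

def guess_next(context):
    """Rank candidate next tokens by closeness to the mean context vector."""
    avg = sum(ft.wv[w] for w in context) / len(context)
    return ft.wv.similar_by_vector(avg, topn=3)

print(guess_next(["cat", "sat", "on"]))
```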
