Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists and prompt spellcasters on Twitter Spaces, as we discuss everything major and important that happened in the world of AI over the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion, and much more.

sub.thursdai.news

More episodes from "ThursdAI - The top AI news from the past week"

📆 ThursdAI – Jul 31, 2025 – Qwen’s Small Models Go Big, StepFun’s Multimodal Leap, GLM-4.5’s Chart Crimes, and Runway’s Mind‑Bending Video Edits + GPT-5 soon?
7 hours ago
1:38:28
This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Woohoo, we're almost done with July (my favorite month) and Open Source AI decided to go out with some fireworks 🎉

Hey everyone, Alex here, writing this without my own personal superintelligence (more on that later), and this week has been VERY BUSY with many new open source releases.

Just 1 hour before the show we already had 4 breaking news releases: a tiny Qwen3-Coder, Cohere and StepFun both dropped multimodal SOTAs, and our friends from Krea dropped a combined model with BFL called Flux[Krea] 👏 This is on top of a very, very busy week, with Runway adding conversation to their video model Aleph, Zuck's superintelligence vision and a new SOTA open video model, Wan 2.2. So let's dive straight into this (as always, all show notes and links are at the end).

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Open Source LLMs & VLMs

Tons of new stuff here. I'll try to be brief, but each one of these releases deserves a deeper dive for sure.

Alibaba is on 🔥 with 3 new Qwen models this week

Yes, this is very similar to last week, when they also dropped 3 new SOTA models in a week, but these are additional ones. It seems that someone at Alibaba figured out that after splitting away from the hybrid models, they can now release each model separately and get a lot of attention per model!
Here's the timeline:
* Friday (just after our show): Qwen3-235B-Thinking-2507 drops (235B total, 22B active, HF)
* Tuesday: Qwen3-30B-Thinking-2507 (30B total, 3B active, HF)
* Today: Qwen3-Coder-Flash-2507 lands (30B total, 3B active for coding, HF)

Let's start with the SOTA reasoner: the 235B(A22B)-2507 is absolutely the best reasoner among the open source models.

We've put the model on our inference service (at crazy prices, $.10/$.10) and it's performing absolutely incredibly on reasoning tasks. It also jumped to the top OSS model on Artificial Analysis scores, EQBench, Long Context and more evals. It's a really, really good reasoning model!

Smaller Qwens for local use

Just a week ago, we asked Junyang on our show about smaller models that folks can run on their devices, and he dodged by saying "we're focusing on the larger models". This week, they delivered not 1 but 2 smaller versions of the bigger models (perfect for speculative decoding, if you can host the larger ones, that is).

The most interesting one is Qwen3-Coder-Flash, which came out today with very, very impressive stats and the ability to run locally at almost 80 tok/s on a MacBook!

So over the last two weeks, we now have 3 Qwens (Instruct, Thinking, Coder) and 2 sizes for each (all three have a 30B/A3B version now for local use) 👏

Z.ai GLM and StepFun Step3

As we've said previously, Chinese companies completely dominate the open source AI field right now, and this week we saw yet another crazy testament to how stark the difference is!

We've seen a rebranded Zhipu (Z.ai, previously THUDM) release their new GLM 4.5, which gives Qwen3-thinking a run for its money. Not quite at that level, but definitely very close. I personally didn't love the release aesthetics: showing a blended eval score, which nobody can replicate, feels a bit off.

We also talked about how StepFun has stepped in (sorry for the pun) with a new SOTA in multimodality, called Step3.
It's a 321B MoE (with a huge 38B active param count) that achieves very significant multimodal scores (the benchmarks look incredible: 74% on MMMU, 64% on MathVision).

Big Companies APIs & LLMs

Well, we were definitely thinking we'd get GPT-5 or the open source AI model from OpenAI this week, but alas, the tea-leaf readers were misled (or were being misleading). We 100% know that GPT-5 is coming, as multiple screenshots showing companies already testing it were blurred and then deleted. But it looks like August is going to be even hotter than July, with multiple sightings of anonymous testing models on Web Dev Arena, like Zenith, Summit, Lobster, and a new mystery model on OpenRouter called Zenith, that some claim are the different thinking modes of GPT-5 and the open source model?

Zuck shares vision for personalized superintelligence (Meta)

In a very "Nat Friedman"-like post, Mark Zuckerberg finally shared the vision behind his latest push to assemble the most cracked AI engineers.

In his vision, Meta is the right place to provide each person with personalized superintelligence, enhancing individual abilities with user agency according to their own values (as opposed to a centralized model, which feels like his shot across the bow at the other frontier labs).

A few highlights: Zuck leans heavily into the rise of personal devices through which humans will interact with this superintelligence, including AR glasses, and a departure from the complete "let's open source everything" dogma of the past; now there will be more deliberate consideration of what to open source.

This Week's Buzz: Putting Open Source to Work with W&B

With all these incredible new models, the biggest question is: how can you actually use them? I'm incredibly proud to say that the team at Weights & Biases had all three of the big new Qwen models (Thinking, Instruct, and Coder) live on W&B Inference on day one (link).

And our pricing is just unbeatable.
Wolfram did a benchmark run that would have cost him $150 using Claude Opus. On W&B Inference with the Qwen3-Thinking model, it cost him 22 cents. That's not a typo. It's a game-changer for developers and researchers.

To make it even easier, a listener of the show, Olaf Geibig, posted a fantastic tutorial on how you can use our free credits and W&B Inference to power tools like Claude Code and VS Code using LiteLLM. It takes less than five minutes to set up and gives you access to state-of-the-art models for pennies. All you need to do is add this config to LiteLLM and run claude (or vscode) through it!

Give our inference service a try here and give our main account @weights_biases a follow, as we often drop ways to get additional free credits when new models release.

Vision & Video models

Wan2.2: Open-Source MoE Video Generation Model Launches (X, HF)

This is likely the best open source video model, and definitely the first MoE video model! It came out with text2video, image2video and a combined version. With 5-second 720p videos that can even be generated at home on a single 4090, this is definitely a step up in the quality of video models that are fully open source.

Runway changes the game again - Gen-3 Aleph model for AI video editing / transformation (X, X)

Look, there's simply no denying this: AI video has had an incredible year, from open source like Wan to proprietary models with sound like VEO3. It's not surprising that we're seeing this trend, but it's definitely very exciting when we see an approach to editing like Runway's. This adds a chat to the model, and the ability to edit... anything in the scene. Remove or add people and environmental effects, see the same scene from a different angle and a lot more! Expect personalized entertainment very soon!

AI Art & Diffusion & 3D

FLUX.1 Krea [dev] launches as a state-of-the-art open-weights text-to-image model (X, HuggingFace)

Black Forest Labs teamed with Krea AI for Flux.1 Krea [dev], an open-weights text-to-image model that ditches the "AI gloss" for natural, distinctive vibes (think DALL-E 2's quirky grain without the saturation). It outperforms open peers and rivals pro models in preference tests, and is fully Flux-compatible for LoRAs/tools. Yam and I geeked out over the aesthetics frontier; it's a flexible base for fine-tunes, available on Hugging Face with commercial options via FAL/Replicate. If you're tired of cookie-cutter outputs, this breathes fresh life into generations.

Ideogram Character launches: one-shot character consistency for everyone (X)

Ideogram's Character feature lets you upload one pic for instant, consistent variants, free for all, with inpainting to swap into memes/art. My tests nailed expressions/scenes (me in cyberpunk? Spot-on), though not always photoreal. Wolfram praised the accuracy; it's a meme-maker's dream! And they give like 10 free ones, so give it a go.

Tencent Hunyuan3D World Model 1.0 launches as the first open-source, explorable 3D world generator (X, HF)

Tencent's Hunyuan3D World Model 1.0 is the first open-source generator of explorable 3D worlds from text/image: 360° immersive, with exportable meshes for games/modeling. It needs ~33GB VRAM on complex scenes, but Wolfram called it a metaverse step; I wandered a demo scene, loving the potential despite rough edges. Integrate into CG pipelines? Game-changer for VR/creators.

Voice & Audio

Look, I wasn't even mentioning this on the show, but it came across my feed just as I was about to wrap up ThursdAI, and it's really something. Riffusion joined forces producer and, using FUZZ-2, they now have a fully chattable studio producer; you can ask for... anything you would ask in a studio!

Here's my first reaction, and it's really fun. I think they are still open with the invite code 'STUDIO'... I'm not affiliated with them at all!

Tools

Ok, I promised some folks we'd add this in: Nisten went super viral last week using a new open source tool called Crush from CharmBracelet, which is an open source coding agent for the terminal, and it looks awesome! He gave a demo live on the show, including how to set it up to work with subagents etc. If you're into vibe coding and using the open source models, def. give Crush a try, it's really flying and looks cool!

Phew, ok, we somehow were able to cover ALLL these releases this week, and we didn’t even have an interview!

Here’s the TL;DR and links for the folks who subscribed (I’m trying a new thing to promote subs on this newsletter), and see you in two weeks (next week is Wolfram's turn again, as I’m somewhere in Europe!)

ThursdAI - July 31st, 2025 - TL;DR
* Hosts and Guests
  * Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
  * Co Hosts - @WolframRvnwlf @yampeleg @nisten @ldj
* Open Source LLMs
  * Zhipu drops GLM-4.5 355B (A32B) AI model (X, HF)
  * ARCEE AFM‑4.5B and AFM‑4.5B‑Base weights released (X, HF)
  * Qwen is on 🔥 - 3 new models:
📆 ThursdAI - July 24, 2025 - Qwen-mas in July, The White House's AI Action Plan & Math Olympiad Gold for AIs + coding a 3d tetris on stream
7/24/2025
1:43:23
What a WEEK! Qwen-mas in July. Folks, AI doesn't seem to want to slow down, especially Open Source! This week we see yet another jump on SWE-bench Verified (3rd week in a row?), this time from our friends at Alibaba Qwen. It was my pleasure to host Junyang Lin from the team at Alibaba, who came to chat with us about their incredible release of not 1 but three new models!

Then, we had a great chat with Joseph Nelson from Roboflow, who not only dropped additional SOTA models, but was also in Washington at the announcement of the new AI Action Plan from the White House.

Great conversations this week. As always, the TL;DR is at the end, tune in!

Open Source AI - Qwen-mas in July

This week, the open-source world belonged to our friends at Alibaba Qwen. They didn't just release one model; they went on an absolute tear, dropping bomb after bomb on the community and resetting the state-of-the-art multiple times.

A "Small" Update with Massive Impact: Qwen3-235B-A22B-Instruct-2507

Alibaba called this a minor refresh of their 235B parameter mixture-of-experts.

Sure, if you consider +13 points on GPQA and a 256K context window minor. The 2507 drops hybrid thinking. Instead, Qwen now ships separate instruct and chain-of-thought models, avoiding token bloat when you just want a quick answer. Benchmarks? 81% MMLU-Redux, 70% LiveCodeBench, new SOTA on BFCL function-calling. All with 22B active params.

Our friend of the pod, and head of development at Alibaba Qwen, Junyang Lin, joined the pod and talked to us about their decision to uncouple this model from the hybrid reasoner Qwen3.

"After talking with the community and thinking it through," he said, "we decided to stop using hybrid thinking mode. Instead, we'll train instruct and thinking models separately so we can get the best quality possible."

The community felt the hybrid model sometimes had conflicts and didn't always perform at its best. So, Qwen delivered a pure non-reasoning instruct model, and the results are staggering.
Even without explicit reasoning, it's crushing benchmarks. Wolfram tested it on his MMLU-Pro benchmark and it got the top score of all open-weights models he's ever tested. Nisten saw the same thing on medical benchmarks, where it scored the highest on MedMCQA. This thing is a beast, getting a massive 77.5 on GPQA (up from 62.9) and 51.8 on LiveCodeBench (up from 32). This is a huge leap forward, and it proves that a powerful, well-trained instruct model can still push the boundaries of reasoning.

The New (open) King of Code: Qwen3-Coder-480B (X, Try It, HF)

Just as we were catching our breath, they dropped the main event: Qwen3-Coder. This is a 480-billion-parameter coding-specific behemoth (35B active) trained on a staggering 7.5 trillion tokens with a 70% code ratio, which sets a new SOTA on SWE-bench Verified at 69.6% (just a week after Kimi got SOTA with 65% and 2 weeks after Devstral's SOTA of 53% 😮).

To get this model to SOTA, Junyang explained, they used reinforcement learning with over 20,000 parallel sandbox environments. This allows the model to interact with the environment, write code, see the output, get the reward, and learn from it in a continuous loop. The results speak for themselves.

With a 256K context window, extendable up to 1M with YaRN, this coding beast tops the charts and achieves Sonnet-level performance for significantly less cost!

Both models supported day-1 on W&B Inference (X, Get Started)

I'm very, very proud to announce that both these incredible models get Day-1 support on our W&B Inference (and that yours truly is now part of the decision of which models we host!)

With unbeatable prices ($0.10/$0.10 input/output per 1M tokens for A22B, $1/$1.5 for Qwen3 Coder) and speed, we are hosting these models at full precision to give you the maximum possible intelligence and the best bang for your buck!
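
To make the per-million-token prices above concrete, here is a minimal Python sketch that estimates the bill for a given workload. The prices are the ones quoted in the text; the dictionary keys, the function name `estimate_cost` and the example token counts are hypothetical and only there for illustration.

```python
# Prices in USD per 1M tokens as quoted in the text: (input, output).
# The labels are shorthand from the text, not official model IDs.
PRICES_PER_MILLION = {
    "A22B": (0.10, 0.10),
    "Qwen3 Coder": (1.00, 1.50),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one workload."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for name in PRICES_PER_MILLION:
        cost = estimate_cost(name, 2_000_000, 500_000)
        print(f"{name}: ${cost:.2f}")
```

For that made-up workload the sketch prints $0.25 for A22B and $2.75 for Qwen3 Coder. Real bills depend on the provider's current price list and on how tokens are counted.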
Nisten set up our (OpenAI-compatible) endpoint with his Cline coding assistant and built a 3D Tetris game live on the show, and it absolutely went flying. This demo perfectly captures the convergence of everything we're excited about: a state-of-the-art open-source model, running on a blazing-fast inference service, integrated into a powerful open-source tool, creating something complex and interactive in seconds.

If you want to try this yourself, we're giving away credits for W&B Inference. Just find our announcement tweet for the Qwen models on the @weights_biases X account and reply with "coding capybara" (a nod to Qwen's old mascot!). Add "ThursdAI" and I'll personally make sure you get bumped up the list!

Big Companies & APIs

America’s AI Action Plan: A New Space Race for AI Dominance (ai.gov)

Switching gears to policy, I was excited to cover the White House’s newly unveiled “America’s AI Action Plan.” This 25-page strategy, dropped this week, frames AI as a national priority on par with the space race or Cold War, aiming to secure U.S. dominance with 90 policy proposals. I was thrilled to have Joseph Nelson from Roboflow join us fresh from the announcement event in Washington, sharing the room’s energy and insights. The plan pushes for deregulation, massive data center buildouts, workforce training, and (most exciting for us) explicit support for open-source and open-weight models. It’s a bold move to counter global competition, especially from China, while fast-tracking infrastructure like chip fabrication and energy grids.

Joseph broke down the vibe at the event, including a surreal moment where the President riffed on Nvidia’s market dominance right in front of Jensen Huang.
But beyond the anecdotes, what strikes me is the plan’s call for startups and innovation: think grants and investments via the Department of Defense and Small Business Administration. It’s like a request for new AI companies to step up. As someone who’s railed against past moratorium fears on this show, seeing this pro-innovation stance is a huge relief.

🔊 Voice & Audio – Higgs Audio v2 Levels Up (X)

Boson AI fused a 3B-param Llama 3.2 with a 2.2B audio Dual-FFN and trained on ten million hours of speech + music. Result: Higgs Audio v2 beats GPT-4o-mini and ElevenLabs v2 on prosody, does zero-shot multi-speaker dialog, and even hums melodies. The demo runs on a single A100 and sounds pretty good. The first demo I played was not super impressive, but the laugh track made up for it!

🤖 A Week with ChatGPT Agent

Last week, OpenAI dropped the ChatGPT Agent on us during our stream, and now we've had a full week to play with it. It's a combination of their browser-operating agent and their Deep Research agent, and the experience is pretty wild.

Yam had it watching YouTube videos and scouring Reddit comments to create a comparison of different CLI tools. He was blown away, seeing the cursor move around and navigate complex sites right on his phone.

I put it through its paces as well. I tried to get it to order flowers for my girlfriend (it got all the way to checkout!), and it successfully found and filled out the forms for a travel insurance policy I needed. My ultimate test (live stream here), however, was asking it to prepare the show notes for ThursdAI, a complex task involving summarizing dozens of my X bookmarks. It did a decent job (a solid C/B), but still needed my intervention. It's not quite a "fire-and-forget" tool for complex, multi-step tasks yet, but it's a huge leap forward. As Yam put it, "This is the worst that agents are going to be." And that's an exciting thought.

What a week.
From open-source models that rival the best closed-source giants to governments getting serious about AI innovation, the pace is just relentless. It's moments like Nisten's live demo that remind me why we do this show: to witness and share these incredible leaps forward as they happen. We're living in an amazing time.

Thank you for being a ThursdAI subscriber. As always, here's the TL;DR and show notes for everything that happened in AI this week.

TL;DR and Show Notes
* Hosts and Guests
  * Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
  * Co-Hosts - @WolframRvnwlf, @yampeleg, @nisten, @ldjconfirmed
  * Junyang Lin - Qwen Team, Alibaba (@JustinLin610)
  * Joseph Nelson - Co-founder & CEO, Roboflow (@josephnelson)
* Open Source LLMs
  * Sapient Intelligence releases Hierarchical Reasoning Model (HRM), a tiny 27M param model with impressive reasoning on specific tasks (X, arXiv).
  * Qwen drops a "little" update: Qwen3-235B-A22B-Instruct-2507, a powerful non-reasoning model (X, HF Model).
  * Qwen releases the new SOTA coding agent model: Qwen3-Coder-480B-A35B-Instruct (X, HF Model).
  * Hermes-Reasoning Tool-Use dataset with 51k tool-calling examples is released (X, HF Dataset).
  * NVIDIA releases updates to their Nemotron reasoning models.
* Big CO LLMs + APIs
  * The White House unveils "America’s AI Action Plan" to "win the AI race" (X, White House PDF).
  * Both OpenAI (X) and Google DeepMind win Gold at the International Math Olympiad (IMO), with ByteDance's Seed-Prover taking Silver (GitHub).
  * The AI math breakthrough has a "gut punch" effect on the math community (Dave White on X).
  * Google now processes over 980 trillion tokens per month across its services.
  * A week with ChatGPT Agent: testing its capabilities on real-world tasks.
* This Week's Buzz
  * Day 0 support for both new Qwen models on W&B Inference (Try it, Colab). Reply to our tweet with "coding capybara ThursdAI" for credits!
  * Live on-stream demo of Qwen3-Coder building a 3D Tetris game using Cline.
* Interesting Research
  * Researchers discover subliminal learning in LLMs, where traits are passed through seemingly innocuous data (X, arXiv).
  * Apple proposes multi-token prediction, speeding up LLMs by up to 5x without quality loss (X, arXiv).
* Voice & Audio
  * Boson AI open-sources Higgs Audio v2, a unified TTS model that beats GPT-4o-mini and ElevenLabs (X, HF Model).
* AI Art & Diffusion & 3D
  * Decart AI releases MirageLSD, a real-time live-stream diffusion model for instant video transformation (X Post).
* Tools
  * Qwen releases qwen-code, a CLI tool and agent for their new coder models (GitHub).
  * GitHub Spark, a new AI-powered feature from GitHub (Simon Willison on X).

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
📆 ThursdAI - July 17th - Kimi K2 👑, OpenAI Agents, Grok Waifus, Amazon Kiro, W&B Inference & more AI news!
7/17/2025
1:45:29
Hey everyone, Alex here 👋 and WHAT a week to turn a year older! Not only did I get to celebrate my birthday with 30,000+ of you live during the OpenAI stream, but we also witnessed what might be the biggest open-source AI release since DeepSeek dropped. Buckle up, because we're diving into a trillion-parameter behemoth, agentic capabilities that'll make your head spin, and somehow Elon Musk decided Grok waifus are the solution to... something.

This was one of those weeks where I kept checking if I was dreaming. Remember when DeepSeek dropped and we all lost our minds? Well, buckle up because Moonshot's Kimi K2 just made that look like a warm-up act. And that's not even the wildest part of this week!

As always, all the show notes and links are at the bottom, and here's our liveshow (which included the full OAI ChatGPT agents watch party). Let's get into it!

🚀 Open Source LLMs: The Kimi K2 Revolution

The New Open Source King Has Arrived

Folks, I need you to understand something - just a little after we finished streaming last week celebrating Grok 4, a company called Moonshot decided to casually drop what might be the most significant open source release since... well, maybe ever?

Kimi K2 is a 1 trillion parameter model. Yes, you read that right - TRILLION. Not billion. And before you ask "but can my GPU run it?"
- this is an MoE (Mixture of Experts) with only 32B active parameters, which means it's actually usable while being absolutely massive.

Let me give you the numbers that made my jaw drop:
* 65.8% on SWE-bench Verified - This non-reasoning model beats Claude Sonnet (and almost everything else)
* 384 experts in the mixture (the scale here is bonkers)
* 128K context window standard, with rumors of 2M+ capability
* Trained on 15.5 trillion tokens with the new Muon optimizer

The main thing about the SWE-bench score is not even just the incredible performance, it's the performance without thinking/reasoning + price!

The Muon Magic

Here's where it gets really interesting for the ML nerds among us. These folks didn't use AdamW - they used a new optimizer called Muon (with their own Muon Clip variant). Why does this matter? They trained to 15.5 trillion tokens with ZERO loss spikes. That beautiful loss curve had everyone in our community Slack channels going absolutely wild. As Yam explained during the show, claiming you have a better optimizer than AdamW is like saying you've cured cancer - everyone says it, nobody delivers. Well, Moonshot just delivered at 1 trillion parameter scale.

Why This Changes Everything

This isn't just another model release. This is "Sonnet at home" if you have the hardware. But more importantly:
* Modified MIT license (actually open!)
* 5x cheaper than proprietary alternatives
* Base model released (the first time we get a base model this powerful)
* Already has Anthropic-compatible API (they knew what they were doing)

The vibes are OFF THE CHARTS. Every high-taste model tester I know is saying this is the best open source model they've ever used. It doesn't have that "open source smell" - it feels like a frontier model because it IS a frontier model.

Not only a math genius

Importantly, this model is great at multiple things, as folks called out its personality or writing style specifically!
Our friend Sam Paech, creator of EQBench, has noted that this is maybe the first time an open source model writes this well, and is in fact SOTA on his Creative Writing benchmark and EQBench!

Quick Shoutouts

Before we dive deeper, huge props to:
* Teknium for dropping the Hermes 3 dataset (nearly 1M high-quality entries!) (X)
* LG (yes, the fridge company) for EXAONE 4.0 - their 32B model getting 81.8% on MMLU Pro is no joke (X)

🎉 This Week's Buzz: W&B Inference Goes Live with Kimi-K2! (X)

Ok, but what if you want to try Kimi-K2 but don't have the ability to run 1T models willy-nilly? Well, folks, I've been waiting TWO AND A HALF YEARS to say this: We're no longer GPU poor!

Weights & Biases + CoreWeave = Your new inference playground. We launched Kimi K2 on our infrastructure within 3 days of release!

Sitting behind the scenes on this launch was surreal - as I've been covering all the other inference service launches, I knew exactly what we all want: fast inference, full non-quantized weights, OpenAI API compatibility, a great playground to test it out, function calling and tool use. And we've gotten almost all of these, while the super cracked CoreWeave and W&B Weave teams worked their asses off over the weekend to get this shipped in just a few days!

And here’s the kicker: I’m giving away $50 in inference credits to 20 of you to try Kimi K2 on our platform. Just reply “K2-Koolaid-ThursdAI” to our X launch post here and we'll pick up to 20 winners with $50 worth of credits! 🫡

It’s live now at api.inference.wandb.ai/v1 (model ID: moonshotai/Kimi-K2-Instruct), fully integrated with Weave for tracing and evaluation. We’re just getting started, and I want your feedback to make this even better. More on W&B Inference Docs - oh, and everyone gets $2 free even without me, which is like 500K tokens to test it out.

Big CO LLMs + APIs

The big players didn't sleep this week either: funding flew like confetti, Grok went full anime, and OpenAI dropped agents mid-stream (we reacted live!).
Amazon snuck in with dev tools, and Gemini embeddings claimed the throne. Let's get through some of these openers before we get to the "main course", which of course came from OpenAI.

Grok Gets... Waifus?

I can't believe I'm writing this in a serious AI newsletter, but here we are. xAI added animated 3D characters to Grok, including "Ani" - and let's just say she's very... interactive. xAI partnered with a company that does real-time animated 3D avatars, and these are powered by Grok so... they are a bit unhinged!

The same Elon who's worried about birth rates just created nuclear-grade digital companions. The Grok app shot to #1 in the Japanese App Store immediately. Make of that what you will. 😅

They even posted a job for "Full Stack Waifu Engineer" - we truly live in the strangest timeline.

xAI also this week addressed the concerns we all had with "mechahitler" and the Grok 4 issues post launch (where it used its web search to see "what does Elon think" when it was asked about a few topics). Credit for finding the prompt change: Simon Willison

Other Quick Hits from Big Tech
* Gemini Embedding Model: New SOTA on MTEB leaderboards (68.32 score) (dev blog)
* Amazon S3 Vectors: Native vector storage in S3 (huge for RAG applications) (X)
* Amazon Kiro: Their VS Code fork with spec-driven development (think PM-first coding) (X)

🔥 OpenAI Agents: ChatGPT Levels Up to Do-It-All Sidekick

We timed it perfectly: OpenAI's live stream hit mid-show, and we reacted with 30,000+ of you! And while we didn't get the rumored open source model from OAI, we did get... ChatGPT Agent (codename Odyssey), which merges Deep Research's fast-reading text browser with Operator's clicky visual browser and terminal access, all RL-tuned to pick tools smartly. It browses, codes, calls APIs (Google Drive, GitHub, etc., if you connect), generates images, and builds spreadsheets/slides, handling interruptions, clarifications, and takeovers for collaboration.
SOTA jumps: 41.6% on Humanity's Last Exam (double o3), 27.4% on FrontierMath, 45.5% on SpreadsheetBench, 68.9% on BrowseComp.

These are insane jumps in capabilities, folks, just... mind-blowing that we can now have agents that are SO good! The team demoed wedding planning (outfits, hotels, gifts with weather/venue checks), sticker design/ordering, and an MLB itinerary spreadsheet; wild to watch it chain thoughts on recordings. Wolfram called it the official start of agent year; Yam hyped the product polish (mobile control!); Nisten noted it's packaged perfection over DIY. I refreshed ChatGPT obsessively, mind-blown at turning my phone into a task master. Available now for Pro/Plus/Team (400/40 queries/month), Enterprise soon. This is the "feel the AGI" moment Sam mentioned: game over for tedious tasks (OpenAI announcement: https://openai.com/index/introducing-chatgpt-agent/).

I've yet to get access to it, but I'm very much looking forward to testing it out and letting you guys know how it works! Combining the two browser modes (visual that has my cookies and textual that can scan tons of websites super quick) + CLI + deep research abilities + RL for the right kind of tool use all sounds incredibly intriguing!

Vision & Video

Runway’s Act-Two: Motion Capture Gets a Major Upgrade (X, YouTube)

Runway’s latest drop, Act-Two, is a next-gen motion capture model that’s got creatives buzzing. It tracks head, face, body, and hands with insane fidelity, animating any character from a single performance video. It’s a huge leap from Act-One, already in use for film, VFX, and gaming, and available now to enterprise and creative customers with a full rollout soon.

Voice & Audio

Mistral’s Voxtral: Open Speech Recognition Champ (X, HF)

Mistral AI is killing it with Voxtral, a state-of-the-art open speech recognition model.
With Voxtral Small at 24B for production and Mini at 3B for edge devices, it outperforms OpenAI’s Whisper large-v3 across English and multilingual tasks like French, Spanish, Hindi, and German. Supporting up to 32K token context (about 30-40 minutes of audio), it offers summarization and Q&A features, all under an Apache 2.0 license. At just $0.001 per minute via API, it’s a steal for real-time or batch transcription.

Tools

Liquid AI’s LEAP and Apollo: On-Device AI for All

Liquid AI is bringing AI to your pocket with LEAP, a developer platform for building on-device models, and Apollo, a lightweight iOS app to run small LLMs locally. We’re talking 50-300MB models optimized for minimal battery drain and instant inference, no cloud needed. It’s privacy-focused and plug-and-play, perfect for offline workflows on Android and iOS. Developers, this is your prototyping dream; join the community via X.

Amazon Kiro: Your Spec-Driven Coding Buddy

I’ve already touched on Amazon’s Kiro, but let me reiterate: this spec-driven AI IDE is a standout. It structures your dev process around requirements, letting you define projects in plain language or diagrams before coding starts. It automates docs, testing, and more, feeling like a technical PM guiding you from concept to production. Early users are hooked on its PRD mode, and it’s free during preview. Give it a spin; details on X.

Wrapping Up: An Unforgettable AI Birthday Bash

What a week, folks! From Kimi K2 redefining open-source power to OpenAI’s ChatGPT Agent ushering in a new era of task automation, this has been a whirlwind of innovation. Throw in Grok’s quirky waifus and our own W&B Inference launch, and I’m left speechless on my birthday. Sharing this with over 30,000 of you during our live stream was the ultimate gift. AI is moving at a pace I couldn’t have dreamed of when I started ThursdAI. Here’s to more breakthroughs, and I can’t wait to see what you build with Kimi K2 credits.
Let’s keep pushing the boundaries together!

P.S - If you'd like to support this podcast/newsletter and give me a birthday present, the best way is to tell your friends about it and the second best way is to subscribe 👏

TL;DR and Show Notes
Here’s everything we covered this week on ThursdAI for July 17, 2025, packed with links and key highlights for you to dive deeper:
* Hosts and Guests
  * Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
  * Co-Hosts - @WolframRvnwlf, @yampeleg, @nisten, @ldjconfirmed
* Open Source LLMs
  * Moonshot launches Kimi K2 - a 1T param MoE crushing SWE Bench Verified at 65.8% (X post, HuggingFace, API & docs, GitHub)
  * Teknium drops Hermes 3 dataset - nearly 1M samples for training agentic models (X)
  * LGAI EXAONE-4.0 - hybrid attention, 32B & 1.2B models with 131K+ context (X, HuggingFace)
* Big CO LLMs + APIs
  * OpenAI’s ChatGPT Agent - unified agentic AI for real-world tasks, scoring 41.6% on HLE (Announcement)
  * Grok 4 waifus - XAI adds animated characters, topping Japan’s App Store
  * Mira Murati’s Thinking Machines Lab - $2B funding for open AI science (X)
  * Gemini Embedding Model - #1 on MTEB with 68.32 score (X, Dev Blog)
  * Amazon S3 Vectors - preview for vector storage, up to 90% cost savings (X)
* This Week’s Buzz
  * Kimi K2 on W&B Inference - open, scalable production access, $50 credits with “K2KoolAid” (X, Docs)
  * Wolfram’s Evaluation of W&B service (X)
* Vision & Video
  * Runway’s Act-Two - next-gen motion capture for head, face, body, hands (X, YouTube)
* Voice & Audio
  * Mistral’s Voxtral - open SOTA speech recognition, beats Whisper v3 (X, HuggingFace)
* AI Art & Diffusion & 3D
  * OpenAI image service API adds high-quality mode (X)
* Tools
  * Liquid AI’s LEAP & Apollo - on-device AI for mobile, privacy-first (X)
  * Amazon’s Kiro - spec-driven AI IDE, free in preview (X)

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
📆 ThursdAI - Jul 10 - Grok 4 and 4 Heavy, SmolLM3, Liquid LFM2, Reka Flash & Vision, Perplexity Comet Browser, Devstral 1.1 & More AI News
7/11/2025
1:49:46
Hey everyone, Alex here

Don't you just love "new top LLM" drop weeks? I sure do! This week, we had a watch party for Grok-4, with over 20K tuning in to watch together, as the folks at XAI unveiled their newest and best model around. Two models in fact, Grok-4 and Grok-4 Heavy. We also had a very big open source week, and we had the pleasure to chat with the creators of 3 open source models on the show: first with Elie from HuggingFace who just released SmolLM3, then with our friend Maxime Labonne who together with Liquid released a beautiful series of tiny on-device models. Finally we had a chat with folks from Reka AI, and as they were on stage, someone in their org published a new open source Reka Flash model 👏 Talk about Breaking News right on the show! It was a very fun week and a great episode, so grab your favorite beverage and let me update you on everything that's going on in AI (as always, show notes at the end of the article)

Open Source LLMs
As always, even on big weeks like this, we open the show with Open Source models first and this week, the western world caught up to the Chinese open source models we saw last week!

HuggingFace SmolLM3 - SOTA fully open 3B with dual reasoning and long-context (𝕏, HF)
We had Elie Bakouch from Hugging Face on the show and you could feel the pride radiating through the webcam. SmolLM3 isn’t just “another tiny model”; it’s an 11-trillion-token monster masquerading inside a 3-billion-parameter body. It reasons, it follows instructions, and it does both “think step-by-step” and “give me the answer straight” on demand. Hugging Face open-sourced every checkpoint, every dataset recipe, every graph in W&B, so if you ever wanted a fully reproducible, multi-lingual pocket assistant that fits on a single GPU, this is it.

They achieved the long context (128K today, 256K in internal tests) with a NoPE + YaRN recipe and salvaged the performance drop by literally merging two fine-tunes at 2 a.m. the night before release.
Science by duct-tape, but it works: SmolLM3 edges out Llama-3.2-3B, challenges Qwen-3, and stays within arm’s reach of Gemma-3-4B, all while loading faster than you can say “model soup.” 🤯

Liquid AI’s LFM2: Blazing-Fast Models for the Edge (𝕏, Hugging Face)
We started the show and I immediately got to hit the #BREAKINGNEWS button, as Liquid AI dropped LFM2, a new series of tiny (350M-1.2B) models focused on Edge devices.

We then had the pleasure to host our friend Maxime Labonne, head of Post Training at Liquid AI, to come and tell us all about this incredible effort! Maxime, a legend in the model merging community, explained that LFM2 was designed from the ground up for efficiency. They’re not just scaled-down big models; they feature a novel hybrid architecture with convolution and attention layers specifically optimized for running on CPUs and devices like the Samsung Galaxy S24.

Maxime pointed out that out of the box they won't replace ChatGPT, but when you fine-tune them for a specific task like translation, they can match models 60 times their size. This is a game-changer for creating powerful, specialized agents that run locally. Definitely a great release and on ThursdAI of all days!

Mistral's updated Devstral 1.1 Smashes Coding Benchmarks (𝕏, HF)
Mistral didn't want to be left behind on this Open Source bonanza week, and also, today, dropped an update to their excellent coding model Devstral. With 2 versions, an open weights Small and an API-only Medium model, they have claimed an amazing 61.6% score on SWE-Bench for Medium, and the open source Small gets a SOTA 53%, the highest among the open source models! That's 10 points higher than the excellent DeepSWE we covered just last week!

The thing to watch here is the incredible price performance, with this model beating Gemini 2.5 Pro and Claude 3.7 Sonnet while being 8x cheaper to run! Devstral Small comes to us with an Apache 2.0 license, which we always welcome from the great folks at Mistral!
Big Companies LLMs and APIs
There's only 1 winner this week; it seems that other foundational labs were very quiet, waiting to see what XAI was going to release.

XAI releases Grok-4 and Grok-4 heavy - the world leading reasoning model (𝕏, Try It)
Wow, what a show! Space uncle Elon, together with the XAI crew, came fashionably late to their own stream, and unveiled the youngest but smartest brother of the Grok family, Grok 4, plus a multi-agent swarm they call Grok Heavy. We had a watch party with over 25K viewers across all streams who joined and watched this fairly historic event together!

Why historic? Well, for one, they have scaled RL (Reinforcement Learning) for this model significantly more than any other lab did so far, which resulted in an incredible reasoner, able to solve the HLE (Humanity's Last Exam) benchmark at an unprecedented 50% (while using tools).

The other very much unprecedented result is on the ArcAGI benchmark, specifically V2, which is designed to be very easy for humans and very hard for LLMs. Grok-4 got an incredible 15.9%, almost 2x better than Opus 4, the best performing model before it! (ArcAGI president Greg Kamradt says Grok-4 shows signs of Fluid Intelligence!)

Real World benchmarks
Of course, academic benchmarks don't tell the full story, and while it's great to see that Grok-4 gets a perfect 100% on AIME25 and a very high 88.9% on GPQA Diamond, the most interesting benchmark they showed was the Vending-Bench. This is a very interesting new benchmark from AndonLabs, where they simulate a vending machine, and let an LLM manage it, take orders, restock and basically count how much money a model can make while operating a "real" business. Grok scored a very significant $4K profit, selling 4569 items, 4x more than Opus, which shows a real impact on real world tasks!
Not without controversy
The Grok-4 release comes just 1 day after Grok-3, over at X, started calling itself MechaHitler and spewing Nazi antisemitic propaganda, which was a very bad episode. We've covered the previous "misalignment" from Grok, and this seemed even worse. There were many examples (which XAI folks deleted) of Grok talking about antisemitic tropes, blaming people with Jewish surnames for multiple things and generally acting jailbroken and up to no good.

XAI addressed the previous episode with a token excuse, supposedly open sourcing their prompts, which were updated all of 4 times in the last 2 months, while addressing this episode with a "we noticed, and we'll add guardrails to prevent this from happening". IMO this isn't enough. Grok is consistently (this is the 3rd time by my count) breaking alignment, way more than other foundational LLMs, and we must ask for more transparency for a model as significant and as widely used as this!

And to my (lack of) surprise

First principles thinking == Elon's thoughts?
Adding insult to injury, while Grok-4 was just launched, some folks asked for its thoughts on the Israel-Palestine conflict and instead of coming up with an answer on its own, Grok-4 did an X search to see what Elon Musk thinks on this topic to form its opinion. It's so so wrong to claim a model is great at "first principles" and have the first few tests from folks show that Grok defaults to seeing "what Elon thinks".

Look, I'm all for "moving fast" and of course I love AI progress, but we need to ask more from the foundational labs, especially given the incredible amount of people who count on these models more and more!
This week's Buzz
We're well over 300 registrations to our hackathon at the Weights & Biases SF offices this weekend (July 12-13) and I'm packing my suitcase after writing this, as I'm excited to see all the amazing projects folks will build to try and win over $15K in prizes including an awesome ROBODOG.

It's not too late to come and hack with us, register at lu.ma/weavehacks

Tools: Browsers grow brains
Perplexity’s Comet landed on my Mac and within ten minutes it was triaging my LinkedIn invites by itself. This isn’t a Chrome extension; it’s a Chromium fork where natural-language commands are first-class citizens. Tell it “find my oldest unread Stripe invoice and download the PDF” and watch the mouse move. The Gmail connector lets you ask, “what flights do I still need to expense?” and get a draft report. Think Cursor, but for every tab.

I benchmarked Comet against OpenAI Operator on my “scroll Alex’s 200 tweet bookmarks, extract the juicy links, drop them into Notion” task: Operator died halfway, Comet almost finished. Almost. The AI browser war has begun; Chrome’s Mariner project and OpenAI’s rumored Chromium team better move fast. Comet is available to Perplexity MAX subscribers now, and will come to Pro subscribers with invites soon. As soon as I have them, I'll tell you how to get one!

Vision & Video
Reka dropped in with a double-whammy of announcements. First, they showcased Reka Vision, an agentic platform that can search, analyze, and even edit your video library using natural language. The demo of it automatically generating short-form social media reels from long videos was super impressive.

Then, in a surprise live reveal, they dropped Reka Flash 3.1, a new 21B parameter open-source multimodal model! It boasts great performance on coding and math benchmarks, including a 65% on AIME24.
It was awesome to see them drop this right on the show.

We also saw LTX Video release three new open-source LoRAs for precise video control (Pose, Depth, and Canny), and Moonvalley launched Marey, a video model for filmmakers that's built exclusively on licensed, commercially-safe data, a first for the industry.

Veo3 making talking pets
Google have released an update to VEO 3, allowing you to upload an image and have the characters in the image say what you want! It’s really cool for human-like generations, but it’s way more fun to animate… your pets! Here’s two of the best doggos in Colorado presenting themselves!

The full prompt to create your own after you upload an image was:

Two dogs presenting themselves, the left one barking first and then saying "Hey, I'm George Washington Fox" and the right dog following up with a woof and then says "and I'm his younger brother, Dr Emmet Brown". Then both are saying "we're good boys" and barking
Both should sound exiting with an american accent and a dog accent

Phew, what a week! From open source Breaking News from the folks who trained the models right on the podcast, to watch parties and Nazi LLMs, this has been one hell of a ride! Next week, there are already rumors of a potential Gemini 3 release, the OpenAI open source model is rumored to be dropping, and I'm sure we'll get all kinds of incredible things lined up + it's going to be my birthday on Thursday so, looking forward!
See you next week 🫡

Show notes and Links
TL;DR of all topics covered:
* Hosts and Guests
  * Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
  * Co Hosts - @WolframRvnwlf @yampeleg @nisten @ldjconfirmed @ryancarson
  * Guests
    * Elie Bakouch - Training at Hugging Face (@eliebakouch)
    * Maxime Labonne - Head of Post-Training at Liquid AI (@maximelabonne), author of LLM-Course
    * Mattia Atzeni - Member of Technical Staff @ Reka
    * Meenal Nalwaya - Head of Product, Reka AI
* Open Source LLMs
  * HuggingFace - SmolLM3 SOTA, fully open-source 3B dual-mode reasoning and long-context support (X, HF)
  * Liquid AI launches LFM2: the fastest, most efficient open-source edge LLMs yet (X, HF)
  * Reachy Mini: Hugging Face and Pollen Robotics launch a $299 open-source desktop robot (X, HF)
  * NextCoder-32B: Microsoft’s new code-editing LLM rivals GPT-4o on complex code tasks (Microsoft Research, HF)
  * Mistral AI updates Devstral Small 1.1 and Devstral Medium, setting new open-source coding agent benchmarks (X, HF, Blog)
  * Reka releases Reka Flash 3.1 (HF)
* Big CO LLMs + APIs
  * 👑 Grok 4 Release: A Historic Leap from XAI - Grok 4 and Grok 4 heavy (X)
  * Grok 3 is going nazi racing on X - MeinPrompt gate (X)
  * Gemini API Batch Mode launches with 50% cost savings for large-scale AI jobs (X, Google Blog)
* This week's Buzz
  * W&B Hackathon is nearing capacity - Robodog is ready to be given out (lu.ma/weavehacks)
* Vision & Video
  * Reka Vision: Multimodal Agent for Visual Understanding and Search (Reka on X, Vision app)
  * LTX Video launches 3 open-source LoRAs for video control: Pose, Depth, Canny (LTX Studio on X, GitHub, HF model)
  * Marey by Moonvalley: the first professional, licensed AI video tool built for creative control (Moonvalley on X, Product page)
* Tools
  * Perplexity Launches Comet: The AI-Powered Browser for Modern Productivity (X, HF)

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
📆 ThursdAI - Jul 3 - ERNIE 4.5, Hunyuan A13B, MAI-DxO outperforms doctors, RL beats SWE bench, Zuck MSL hiring spree & more AI news
7/3/2025
1:36:16
Hey everyone, Alex here 👋

Welcome back to another mind-blowing week on ThursdAI! We’re diving into the first show of the second half of 2025, and let me tell you, AI is not slowing down. This week, we’ve got a massive wave of open-source models from Chinese giants like Baidu and Tencent that are shaking up the game, Meta’s jaw-dropping hiring spree with Zuck assembling an AI dream team, and Microsoft’s medical AI outperforming doctors on the toughest cases. Plus, a real-time AI game engine that had me geeking out on stream. Buckle up, folks, because we’ve got a lot to unpack!

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

We had incredible guests like Michael Luo from Agentica, dropping knowledge on RL coding agents, and Ivan Burazin from Daytona, revealing the infrastructure powering the agent era. We had an incredible episode this week, with over 8,000 views for the live show (as always, links and show notes are at the end, and the YT live video is here for your convenience if you'd prefer watching)

Open Source AI & LLMs: The Chinese Powerhouse Wave
Man, if there’s one takeaway from this week, it’s that Chinese companies are absolutely dominating the open-source LLM scene. Let’s break down the heavy hitters that dropped this week and why they’ve got everyone talking.

Baidu’s ERNIE 4.5: A Suite of 10 Models to Rule Them All
Baidu, a giant in the Chinese tech space, just flipped the script by open-sourcing their ERNIE 4.5 series. We’re talking 10 distinct models ranging from a whopping 424 billion parameters down to a tiny 0.3 billion. With an Apache 2.0 license, 128K context window, and multimodal capabilities handling image, video, and text input, this is a massive drop.
Their biggest Mixture-of-Experts (MoE) model, with 47B active parameters, even outshines OpenAI’s o1 on visual knowledge tasks like DocVQA, scoring 93% compared to o1’s 81%!

What’s wild to me is Baidu’s shift. They’ve been running ERNIE in production for years (think chatbots and more across their ecosystem), but they weren’t always open-source fans. Now, they’re not just joining the party, they’re hosting it. If you’re into tinkering, this is your playground, so check it out on Hugging Face (HF) or dive into their technical paper (Paper).

Tencent’s Hunyuan-A13B-Instruct: WizardLM Team Strikes Again
Next up, Tencent dropped Hunyuan-A13B-Instruct, and oh boy, does it have a backstory. This 80B parameter MoE model (13B active at inference) comes from the legendary WizardLM team, poached from Microsoft after a messy saga where their killer models got yanked from the internet over “safety concerns.” I remember the frustration: we were all hyped, then bam, gone. Now, under Tencent’s wing, they’ve cooked up a model with a 256K context window, hybrid fast-and-slow reasoning modes, and benchmarks that rival DeepSeek R1 and OpenAI o1 on agentic tasks. It scores an impressive 87% on AIME 2024, though it dips to 76% on AIME 2025, hinting at some overfitting quirks. Still, for a model with 13B active parameters, this is all VERY impressive.

Here’s the catch: the license. It excludes commercial use in the EU, UK, and South Korea, and bans usage if you’ve got over 100M active users. So, not as open as we’d like, but for its size, it’s a beast that fits on a single machine, making it a practical choice for many. They’ve also released two datasets, ArtifactsBench and C3-Bench, for code and agent evaluation. I’m not sold on the name (Hunyuan doesn’t roll off the tongue for Western markets), but the WizardLM pedigree means it’s worth a look.
Try it out on Hugging Face (HF) or test it directly (Try It).

Huawei’s Pangu Pro MoE: Sidestepping Sanctions with Ascend NPUs
Huawei entered the fray with Pangu Pro MoE, a 72B parameter model with 16B active per token, and here’s what got me hyped: it’s trained entirely on their own Ascend NPUs, not Nvidia or AMD hardware. This is a bold move to bypass US sanctions, using 4,000 of these chips to pre-train on 13 trillion tokens. The result? Up to 1,528 tokens per second per card with speculative decoding, outpacing dense models in speed and cost-efficiency. Performance-wise, it’s close to DeepSeek and Qwen, making it a contender for those outside the Nvidia ecosystem.

I’m intrigued by the geopolitical angle here. Huawei’s proving you don’t need Western tech to build frontier models, and while we don’t know who’s got access to these Ascend NPUs, it’s likely a game-changer for Chinese firms. Licensing isn’t as permissive as MIT or Apache, but it’s still open-weight. Peek at it on Hugging Face (HF) for more details.

DeepSWE-Preview: RL Coding Agent Hits 59% on SWE-Bench
Switching gears, I was blown away chatting with Michael Luo from Agentica about DeepSWE-Preview, an open-source coding agent trained with reinforcement learning (RL) on Qwen3-32B. This thing scored a stellar 59% on SWE-Bench-Verified (42.2% Pass@1, 71% Pass@16), one of the top open-weight results out there. What’s cool is they did this without distilling from proprietary giants like Claude, just pure RL over six days on 64 H100 GPUs. Michael shared how RL is surging because pre-training hits data limits, and DeepSWE learned emergent behaviors like paranoia, double-checking edge cases to avoid shaky fixes.

This underdog story of academic researchers breaking benchmarks with limited resources is inspiring. They’ve open-sourced everything (code, data, logs), making it a goldmine for the community. I’m rooting for them to get more compute and push to even higher scores.
Dive into the details on their blog (Notion) or check the model on Hugging Face (HF Model).

This Week’s Buzz from Weights & Biases: come Hack with Us! 🔥
As always, I’ve got some exciting news from Weights & Biases to share. We’re hosting the first of our Weavehacks hackathons in San Francisco on July 12-13. It’s all about agent protocols like MCP and A2A, and I’m stoked to see you guys in person, so come say hi for a high-five! We’ve got cool prizes, including a custom W&B RoboDog that’s been a conference hit, plus $13-14K in cash. Spots are filling fast, so register now and we'll let you in (Sign Up).

We’re also rolling out Online Evaluations in Weave, letting you monitor LLM apps live with judge agents on production data, which is super handy for catching hiccups. And our inference service via CoreWeave GPUs offers free credits for open-source model testing. Want in or curious about Weave’s tracing tools? Reach out to me anywhere, and I’ll hook you up. Can’t wait to demo this next week!

Big Companies & APIs: AI’s NBA Draft and Medical Marvels
Shifting to the big players, this week felt like an AI sports season with blockbuster hires and game-changing releases. From Meta’s talent poaching to Microsoft’s medical breakthroughs, let’s unpack the drama and innovation.

Meta Superintelligence Labs: Zuck’s Dream Team Draft
Imagine an AI NBA draft: that’s what Meta’s up to with their new Superintelligence Labs (MSL). Led by Alex Wang (formerly of Scale AI) and Nat Friedman (ex-GitHub CEO), MSL is Zuck’s power move after Llama 4’s lukewarm reception. They’ve poached up to 10 key researchers from OpenAI, including folks behind GPT-4’s image generation and o1’s foundations, with comp packages rumored at $100M for the first year and up to $300M over four years. That’s more than many Meta execs or even Tim Cook’s salary!
They’ve also snagged talent from Google DeepMind and even tried to acquire Ilya Sutskever’s SSI outright (to which he said he's flattered but no).

This is brute force at its finest, and I’m joking that I didn’t get a $100M offer myself. ThursdAI’s still waiting for that email, Zuck! OpenAI’s Sam Altman fired back with “missionaries beat mercenaries,” hinting at a culture clash, while Mark Chen felt like Meta “broke into their house and took something”. It’s war, folks, and I’m hyped to see if MSL delivers a Llama that crushes it. With FAIR and GenAI folding under this new crack team of 50, plus Meta’s GPU arsenal, the stakes are sky-high.

If you'd like to see the list of "mercenaries" worth over 100M, you can see who they are and their achievements here

Cursor’s Killer Hires and Web Expansion
Speaking of talent wars, Cursor (built by AnySphere) just pulled off a stunner by hiring Boris Cherny and Cat Wu, key creators of Claude Code, as Chief Architect and Head of Product. This skyrockets Cursor’s cred in code generation, and I’m not surprised: Claude Code was a side project that exploded, and now Cursor’s got the brains behind it. On top of that, they’ve rolled out AI coding agents to web and mobile, even integrating with Slack. No more being tied to your desktop: launch, monitor, and collab on code tasks anywhere.

The lines between native and web tools are blurring fast, and Cursor’s leading the charge. I haven’t tested the Slack bit yet, but if you have, hit me up in the comments. This, plus their recent $20M raise, shows they’re playing to win. Learn more at (Cursor).

Microsoft MAI-DxO: AI Diagnoses Better Than Doctors
Now, onto something that hits close to home for me: Microsoft’s MAI-DxO, an AI system that’s outdiagnosing doctors on open-ended medical cases. On 304 of the toughest New England Journal of Medicine cases, it scored 85.5% accuracy, over four times the 20% rate of experienced physicians.
I’ve had my share of frustrating medical waits, and seeing AI step in as a tool for doctors (not a replacement) gets me excited for the future.

It’s an orchestration of models simulating a virtual clinician panel, asking follow-up questions, ordering tests, and even factoring in cost controls for diagnostics. This isn’t just acing multiple-choice; it handles real-world ambiguity. My co-host Yam and I stressed: don’t skip your doctor for ChatGPT, but expect your doc to be AI-superpowered soon. Read more on Microsoft’s blog (Blog).

Cloudflare’s One-Click AI Bot Block: Protecting the Internet
Cloudflare made waves with a one-click feature to block AI bots and scrapers, available to all customers, even free-tier ones. With bots like Bytespider and GPTBot hitting nearly 40% of top sites, but only 3% blocking them, this addresses a huge shift. I’m with the CEO here: the old internet deal was Google scraping for traffic; now, AI summaries keep users from clicking through, breaking monetization for creators. Yam suggested a global license for training data with royalties, and I’m curious if that’s the future. For now, Cloudflare’s ML detects even sneaky bots spoofing as browsers. Big move, so check their announcement (X) and the cool website goodaibots.com

Cypher Alpha: Mystery 1M Context Model on OpenRouter
Lastly, a mysterious 1M context model, Cypher Alpha, popped up on OpenRouter for free testing (with data logging). It’s fast at 70 tokens/sec, low latency, but not a reasoning model, and its refusals on basic queries stumped me. Speculation points to Amazon Titan, which would be a surprise entry. I’m intrigued by who’s behind this. Gemini, OpenAI, and Qwen hit 1M context, but Amazon? Let’s see. Try it yourself (Link).

Vision & Video: Mirage’s AI-Native Game Engine Blows Minds 🤯
Okay, folks, I’ve gotta geek out here.
Dynamics Lab unveiled the world’s first AI-native user-generated content (UGC) game engine, live with playable demos like a GTA-style “Urban Chaos” and a racing “Coastal Drift.” Running at 16 frames per second, it generates photorealistic worlds in real-time via natural language or controller input. You can jump, run, fight, or drive, and even upload an image to spawn a new game environment on the fly.

What’s nuts is there’s no pre-built game behind this: it’s infinite, custom content created as you play. I was floored showing this on stream; it’s obviously not perfect with clipping and delays, but we’re witnessing the dawn of personalized gaming. You gotta try this, so head to their site for the demos (Playable Demo).

This brings us even closer to the "every pixel will be generated" dream of Jensen Huang.

Voice & Audio: TTS Gets Real with Kyutai and Qwen
This week brought fresh text-to-speech (TTS) updates that hint at smarter conversational AI down the line. Kyutai TTS, from the French team behind Moshi, dropped with ultra-low latency (220ms first-token) and high speaker similarity (77.1% English, 78.7% French), plus a word error rate of just 2.82% in English. It’s production-ready with a Rust server and voice cloning from a 10-second clip, perfect for LLM-integrated apps. Check it out (X Announcement, HF Model).

Qwen-TTS from Alibaba also launched, focusing on Chinese dialects like Pekingese and Shanghainese, but with English support too. It’s got human-level naturalness via API, though less relevant for our English audience. Still, it’s a solid step, see more (X Post). Both are pieces of the puzzle for richer virtual interactions, and I’m pumped to see where this goes.

Infrastructure for Agents: Daytona’s Sandbox Revolution
I’m thrilled to have chatted with Ivan Burazin from Daytona, a cloud provider delivering agent-native runtimes (or sandboxes) that give agents their own computers for tasks like code execution or data analysis.
They’ve hit over $1M in annualized run rate just two months post-launch, with 15,000 signups and 1,500 credit cards on file. That’s insane growth for infrastructure, which usually ramps slowly due to integration delays.

Why’s this hot? 2025 is the year of agents, and as Ivan shared, even OpenAI and Anthropic recently redefined agents as needing runtimes. From YC’s latest batch (37% building agents) to Cursor’s web move, every task may soon spin up a sandbox. Daytona’s “stateful serverless” tech spins fast, lasts long, and scales across regions like the US, UK, Germany, and India, addressing latency and GDPR needs. If you’re building agents, this is your unsung hero. Explore it at (Daytona IO) and grab $200 in credits, or up to $50K for startups (Startups).

Wrapping Up: AI’s Relentless Pace
What a week, folks! From Chinese open-source titans like ERNIE 4.5 and Hunyuan-A13B redefining accessibility, to Meta’s blockbuster hires signaling an AI arms race, and Microsoft’s MAI-DxO paving the way for smarter healthcare, we’re witnessing AI’s relentless acceleration. Mirage’s game engine and Daytona’s sandboxes remind us that creativity and infrastructure are just as critical as models themselves. I’m buzzing with anticipation for what’s next. Will Meta’s dream team deliver? Will agents redefine every app? Stick with ThursdAI to find out.
See you next week for more!

TL;DR and Show Notes
Here’s the quick rundown of everything we covered this week, packed with links to dive deeper:
* Show Notes & Guests
  * Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
  * Co-Hosts - @WolframRvnwlf, @yampeleg, @nisten, @ldjconfirmed
  * Guests - Ivan Burazin (Daytona), Michael Luo (Agentica)
* Open Source LLMs
  * Baidu’s ERNIE 4.5 Series - 10 models, 424B to 0.3B, multimodal, beats o1 on DocVQA (X, HF, Paper)
  * Tencent’s Hunyuan-A13B-Instruct - 80B total, 13B active, 256K context, WizardLM legacy (X, HF, Try It)
  * Huawei’s Pangu Pro MoE - 72B, trained on Ascend NPUs, 1,528 tokens/sec (X, HF)
  * DeepSWE-Preview - RL agent, 59% SWE-Bench-Verified on Qwen3-32B (Notion, HF Model)
* This Week’s Buzz
  * Weights & Biases Weavehacks Hackathon - SF, July 12-13, agent protocols focus (Sign Up)
* Big CO LLMs + APIs
  * Meta Superintelligence Labs (MSL) - Zuck hires dream team, up to $300M comp packages from OpenAI talent (list)
  * Cursor - Hires Claude Code creators, web/mobile agents with Slack (Cursor, HF)
  * Microsoft MAI-DxO - 85.5% accuracy on NEJM cases vs. 20% for doctors (X, Blog)
  * Cloudflare - One-click AI bot blocking, tackles scraping economics (X)
  * Cypher Alpha - Mystery 1M context model, possibly Amazon Titan (Link)
  * Gemini Pro 2.5 - Returned to Google’s free tier
* Vision & Video
  * Mirage - AI-native UGC game engine, real-time photorealistic demos (Playable Demo)
  * Workflow - Restyle videos with Flux Kontext and Luma Modify (X)
* Voice & Audio
  * Kyutai TTS - Low-latency, high similarity in EN/FR (X, HF)
  * Qwen-TTS - Bilingual Chinese/English, human-level naturalness (X, HF)
* Infrastructure
  * Daytona - Agent-native sandboxes, $1M run rate in 2 months (GitHub, Startups)
* Tools
  * Chai Discovery’s Chai-2 - Zero-shot antibody design (Chai Discovery)

Thanks for reading all the way through ThursdAI, folks! Share this with friends to spread the AI love, and I’ll catch you next week for more!

This is a public episode.
If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
📅 ThursdAI - Jun 26 - Gemini CLI, Flux Kontext Dev, Search Live, Anthropic destroys books, Zucks superintelligent team & more AI news
6/26/2025
1:39:39
Hey folks, Alex here, writing from... an undisclosed tropical paradise location 🏝️ I'm on vacation, but the AI news doesn't stop of course, and neither does ThursdAI. So huge shoutout to Wolfram Ravenwlf for running the show this week, and to Nisten, LDJ and Yam who joined.

So... no long blogpost with analysis this week, but I'd def. recommend tuning in to the show that the folks ran. They had a few guests on, and even got some breaking news (a new Flux Kontext that's open source).

Of course many of you are readers and are here for the links, so I'm including the raw TL;DR + speaker notes as prepared by the folks for the show!

P.S - our (rescheduled) hackathon, called WeaveHacks, is coming up in San Francisco on July 12-13. If you're interested in a chance to win a RoboDog, you're welcome to join us and give it a try. Register HERE

Ok, that's it for this week, please enjoy the show and see you next week!

ThursdAI - June 26th, 2025 - TL;DR
* Hosts and Guests
  * WolframRvnwlf - Host (@WolframRvnwlf)
  * Co-Hosts - @yampeleg, @nisten, @ldjconfirmed
  * Guest - Jason Kneen (@jasonkneen) - Discussing MCPs, coding tools, and agents
  * Guest - Hrishioa (@hrishioa) - Discussing agentic coding and spec-driven development
* Open Source LLMs
  * Mistral Small 3.2 released with improved instruction following, reduced repetition & better function calling (X)
  * Unsloth AI releases dynamic GGUFs with fixed chat templates (X)
  * Kimi-VL-A3B-Thinking-2506 multimodal model updated for better video reasoning and higher resolution (Blog)
  * Chinese Academy of Science releases Stream-Omni, a new Any-to-Any model for unified multimodal input (HF, Paper)
  * Prime Intellect launches SYNTHETIC-2, an open reasoning dataset and synthetic data generation platform (X)
* Big CO LLMs + APIs
  * Google
    * Gemini CLI, a new open-source AI agent, brings Gemini 2.5 Pro to your terminal (Blog, GitHub)
    * Google reduces free tier API limits for previous generation Gemini Flash models (X)
    * Search Live with voice conversation is now rolling out in AI Mode in the US (Blog, X)
    * Gemini API is now faster for video and PDF processing with improved caching (Docs)
  * Anthropic
    * Claude introduces an "artifacts" space for building, hosting, and sharing AI-powered apps (X)
    * Federal judge rules Anthropic's use of books for training Claude qualifies as fair use (X)
  * xAI
    * Elon Musk announces the successful launch of Tesla's Robotaxi (X)
  * Microsoft
    * Introduces Mu, a new language model powering the agent in Windows Settings (Blog)
  * Meta
    * Report: Meta pursued acquiring Ilya Sutskever's SSI, now hires co-founders Nat Friedman and Daniel Gross (X)
  * OpenAI
    * OpenAI removes mentions of its acquisition of Jony Ive's startup 'io' amid a trademark dispute (X)
    * OpenAI announces the release of DeepResearch in API + Webhook support (X)
* This week's Buzz
  * Alex is on vacation; WolframRvnwlf is attending AI Tinkerers Munich on July 25 (Event)
  * Join W&B Hackathon happening in 2 weeks in San Francisco - grand prize is a RoboDog! (Register for Free)
* Vision & Video
  * MeiGen-MultiTalk code and checkpoints for multi-person talking head generation are released (GitHub, HF)
  * Google releases VideoPrism for generating adaptable video embeddings for various tasks (HF, Paper, GitHub)
* Voice & Audio
  * ElevenLabs launches 11.ai, a voice-first personal assistant with MCP support (Sign Up, X)
  * Google Magenta releases Magenta RealTime, an open weights model for real-time music generation (Colab, Blog)
  * ElevenLabs launches a mobile app for iOS and Android for on-the-go voice generation (X)
* AI Art & Diffusion & 3D
  * Google rolls out Imagen 4 and Imagen 4 Ultra in the Gemini API and Google AI Studio (Blog)
  * OmniGen 2 open weights model for enhanced image generation and editing is released (Project Page, Demo, Paper)
* Tools
  * OpenMemory Chrome Extension provides shared memory across ChatGPT, Claude, Gemini and more (X)
  * LM Studio adds MCP support to connect local LLMs with your favorite servers (Blog)
  * Cursor is now available as a Slack integration (Dashboard)
* All
Hands AI releases the OpenHands CLI, a model-agnostic, open-source coding agent (Blog, Docs)* Warp 2.0 launches as an Agentic Development Environment with multi-threading (X)* Studies and Others* The /r/LocalLLaMA subreddit is back online after a brief moderation issue (Reddit, News)* Andrej Karpathy's talk "Software 3.0" discusses the future of programming in the age of AI (YouTube, Summary)Thank you, see you next week! This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
📆 ThursdAI - June 19 - MiniMax M1 beats R1, OpenAI records your meetings, Gemini in GA, W&B uses Coreweave GPUs & more AI news
6/20/2025
1:41:31
Hey all, Alex here 👋This week, while not the busiest week in releases (we can't get a SOTA LLM every week now can we), was full of interesting open source releases, and feature updates such as the chatGPT meetings recorder (which we live tested on the show, the limit is 2 hours!)It was also a day after our annual W&B conference called FullyConnected, and so I had a few goodies to share with you, like answering the main question, when will W&B have some use of those GPUs from CoreWeave, the answer is... now! (We launched a brand new preview of an inference service with open source models)And finally, we had a great chat with Pankaj Gupta, co-founder and CEO of Yupp, a new service that lets users chat with the top AIs for free, while turning their votes into leaderboards for everyone else to understand which Gen AI model is best for which task/topic. It was a great conversation, and he even shared an invite code with all of us (I'll attach to the TL;DR and show notes, let's dive in!)00:00 Introduction and Welcome01:04 Show Overview and Audience Interaction01:49 Special Guest Announcement and Experiment03:05 Wolfram's Background and Upcoming Hosting04:42 TLDR: This Week's Highlights15:38 Open Source AI Releases32:34 Big Companies and APIs32:45 Google's Gemini Updates42:25 OpenAI's Latest Features54:30 Exciting Updates from Weights & Biases56:42 Introduction to Weights & Biases Inference Service57:41 Exploring the New Inference Playground58:44 User Questions and Model Recommendations59:44 Deep Dive into Model Evaluations01:05:55 Announcing Online Evaluations via Weave01:09:05 Introducing Pankaj Gupta from YUP.AI01:10:23 YUP.AI: A New Platform for Model Evaluations01:13:05 Discussion on Crowdsourced Evaluations01:27:11 New Developments in Video Models01:36:23 OpenAI's New Transcription Service01:39:48 Show Wrap-Up and Future PlansHere's the TL;DR and show notes linksThursdAI - June 19th, 2025 - TL;DR* Hosts and Guests* Alex Volkov - AI Evangelist & Weights & Biases 
(@altryne)* Co Hosts - @WolframRvnwlf @yampeleg @nisten @ldjconfirmed* Guest - @pankaj - co-founder of Yupp.ai* Open Source LLMs* Moonshot AI open-sourced Kimi-Dev-72B (Github, HF)* MiniMax-M1 456B (45B Active) - reasoning model (Paper, HF, Try It, Github)* Big CO LLMs + APIs* Google drops Gemini 2.5 Pro/Flash GA, 2.5 Flash-Lite in Preview ( Blog, Tech report, Tweet)* Google launches Search Live: Talk, listen and explore in real time with AI Mode (Blog)* OpenAI adds MCP support to Deep Research in chatGPT (X, Docs)* OpenAI launches their meetings recorder in mac App (docs)* Zuck update: Considering bringing Nat Friedman and Daniel Gross to Meta (information)* This weeks Buzz* NEW! W&B Inference provides a unified interface to access and run top open-source AI models (inference, docs)* NEW! W&B Weave Online Evaluations delivers real-time production insights and continuous evaluation for AI agents across any cloud. (X)* The new platform offers "metal-to-token" observability, linking hardware performance directly to application-level metrics.* Vision & Video* ByteDance new video model beats VEO3 - Seedance.1.0 mini (Site, FAL)* MiniMax Hailuo 02 - 1080p native, SOTA instruction following (X, FAL)* Midjourney video is also here - great visuals (X)* Voice & Audio* Kyutai launches open-source, high-throughput streaming Speech-To-Text models for real-time applications (X, website)* Studies and Others* LLMs Flunk Real-World Coding Contests, Exposing a Major Skill Gap (Arxiv)* MIT Study: ChatGPT Use Causes Sharp Cognitive Decline (Arxiv)* Andrej Karpathy's "Software 3.0": The Dawn of English as a Programming Language (youtube, deck)* Tools* Yupp launches with 500+ AI models, a new leaderboard, and a user-powered feedback economy - use thursdai link* to get 50% extra credits* BrowserBase announces director.ai - an agent to run things on the web* Universal system prompt for reduction of hallucination (from Reddit)*Disclosure: while this isn't a paid promotion, I do think that 
yupp has a great value, I do get a bit more credits on their platform if you click my link and so do you. You can go to yupp.ai and register with no affiliation if you wish.
📆 ThursdAI - June 12 - Meta’s $15B ScaleAI Power Play, OpenAI’s o3-pro & 90% Price Drop!
6/13/2025
1:33:10
Hey folks, this is Alex, finally back home! This week was full of crazy AI news, both model releases and shifts in the AI landscape and big companies, with Zuck going all in on Scale & execu-hiring Alex Wang for a crazy $14B. OpenAI, meanwhile, maybe received a new shipment of GPUs? Otherwise, it’s hard to explain how they dropped the o3 price by 80% while also shipping o3-pro (in chat and API). Apple was also featured in today’s episode, but more so for the lack of AI news, completely delaying the “very personalized private Siri powered by Apple Intelligence” during WWDC25 this week. We had 2 guests on the show this week, Stefania Druga and Eric Provencher (who builds RepoPrompt). Stefania helped me cover the AI Engineer conference we all went to last week, and shared some cool Science CoPilot stuff she’s working on, while Eric, the go-to guy for o3-pro, helped us understand what this model is great for! As always, TL;DR and show notes are at the bottom, the video for those who prefer watching is attached below, let’s dive in! Big Companies LLMs & APIs Let’s start with big companies, because the landscape has shifted, new top reasoner models dropped and some huge companies didn’t deliver this week! Zuck goes all in on SuperIntelligence - Meta’s $14B stake in ScaleAI and Alex Wang This may be the most consequential piece of AI news today. After the disappointing results of Llama 4 and reports of top researchers leaving the Llama team, many have decided to exclude Meta from the AI race. We have a saying at ThursdAI: don’t bet against Zuck! Zuck decided to spend a lot of money (roughly 20% of their reported $65B investment in AI infrastructure) to get a 49% stake in Scale AI and bring Alex Wang, its (now former) CEO, to lead the new Superintelligence team at Meta. For folks who are not familiar with Scale, it’s a massive company providing human-annotated data services to all the big AI labs, Google, OpenAI, Microsoft, Anthropic.. all of them really. 
Alex Wang is the youngest self-made billionaire because of it, and now Zuck not only has access to all their expertise, but also to a very impressive AI persona, who could help revive the excitement about Meta’s AI efforts, help recruit the best researchers, and lead the way inside Meta. Wang is also an outspoken China hawk who spends as much time in congressional hearings as in Slack, so the geopolitics here are … spicy. Meta just stapled itself to the biggest annotation funnel on Earth, hired away Google’s Jack Rae (who was on the pod just last week, shipping for Google!) for brainy model alignment, and started waving seven-to-nine-figure comp packages at every researcher with “Transformer” in their citation list. Whatever disappointment you felt over Llama-4’s muted debut, Zuck clearly felt it too, and responded like a founder who still controls every voting share. OpenAI’s Game-Changer: o3 Price Slash & o3-pro launches to top the intelligence leaderboards! Meanwhile, OpenAI dropped not one, but two mind-blowing updates. First, they’ve slashed the price of o3, their premium reasoning model, by a staggering 80%. We’re talking from $40/$10 per million output/input tokens down to just $8/$2. That’s right, folks, it’s now in the same league as Claude Sonnet cost-wise, making top-tier intelligence dirt cheap. I remember when a price drop of 80% after a year got us excited; now it’s 80% in just four months with zero quality loss. They’ve confirmed it’s the full o3 model, with no distillation or quantization. How are they pulling this off? I’m guessing someone got a shipment of shiny new H200s from Jensen! And just when you thought it couldn’t get better, OpenAI rolled out o3-pro, their highest intelligence offering yet. Available for Pro and Team accounts, and via API (87% cheaper than o1-pro, by the way), this model (or consortium of models) is a beast. It’s topping charts on Artificial Analysis, barely edging out Gemini 2.5 as the new king. 
Benchmarks are insane: 93% on AIME 2024 (state-of-the-art territory), 84% on GPQA Diamond, and nearing a 3000 Elo score on competition coding. Human preference tests show 64-66% of folks prefer o3-pro for clarity and comprehensiveness across tasks like scientific analysis and personal writing. I’ve been playing with it myself, and the way o3-pro handles long context and tough problems is unreal. As my friend Eric Provencher (creator of RepoPrompt) shared on the show, it’s surgical: perfect for big refactors and bug diagnosis in coding. It’s got all the tools o3 has (web search, image analysis, memory personalization), and you can run it in background mode via API for async tasks. Sure, it’s slower due to deep reasoning (no streaming thought tokens), but the consistency and depth? Worth it. Oh, and funny story: I was prepping a talk for Hamel Husain’s evals course, with a slide saying “don’t use large reasoning models if budget’s tight.” The day before, this price drop hits, and I’m scrambling to update everything. That’s AI pace for ya! Apple WWDC: Where’s the Smarter Siri? Oh Apple. Sweet, sweet Apple. Remember all those Bella Ramsey ads promising a personalized Siri that knows everything about you? Well, Craig Federighi opened WWDC by basically saying "Yeah, about that smart Siri... she's not coming. Don't wait up." Instead, we got:* AI that can combine emojis (revolutionary! 🙄)* Live translation (actually cool)* Direct API access to on-device models (very cool for developers)* Liquid glass UI (pretty but... where's the intelligence?) The kicker? Apple released a paper called "The Illusion of Thinking" right before WWDC, basically arguing that AI reasoning models hit hard complexity ceilings. Some saw this as Apple making excuses for why they can't ship competitive AI. The timing was... interesting. During our recording, Nisten's Siri literally woke up randomly when we were complaining about how dumb it still is. After a decade, it's the same Siri. 
That moment was pure comedy gold.This Week's BuzzOur premium conference Fully Connected is happening June 17-18 in San Francisco! Use promo code WBTHURSAI to register for free. We'll have updates on the CoreWeave acquisition, product announcements, and it's the perfect chance to give feedback directly to the people building the tools you use.Also, my talk on Large Reasoning Models as LLM judges is now up on YouTube. Had to update it live because of the O3 price drop - such is life in AI!Open Source LLMs: Mistral Goes Reasoning ModeMistral Drops Magistral - Their First Reasoning ModelThe French champagne of LLMs is back! Mistral released Magistral, their first reasoning model, in two flavors: a 24B parameter open-source Small version and a closed API-only Medium version. And honestly? The naming continues to be chef's kiss - Mistral really has the branding game locked down.Now, here's where it gets spicy. Mistral's benchmarks notably don't include comparisons to Chinese models like Qwen or DeepSeek. Dylan Patel from SemiAnalysis called them out on this, and when he ran the comparisons himself, well... let's just say Magistral Medium barely keeps up with Qwen's tiny 4B parameter model on math benchmarks. Ouch.But here's the thing - and Nisten really drove this home during our discussion - benchmarks don't tell the whole story. He's been using Magistral Small for his workflows and swears by it. "It's almost at the point where I don't want to tell people about it," he said, which is the highest praise from someone who runs models locally all day. The 24B Small version apparently hits that sweet spot for local deployment while being genuinely useful for real work.The model runs on a single RTX 4090 or a 32GB MacBook after quantization, has a 128K context window (though they recommend capping at 40K), and uses a transparent mode that shows its reasoning process. 
It's Apache 2.0 licensed, multilingual, and available through their Le Chat interface with "Flash Answers" for real-time reasoning. SakanaAI's Text2Lora: The Future is Self-Adapting Models This one blew my mind. SakanaAI (co-founded by one of the Transformer paper authors) released Text2Lora - a method for adapting LLMs to new tasks using ONLY text descriptions. No training data needed! Think about this: instead of fine-tuning a model with thousands of examples to make it better at math, you just... tell it to be better at math. And it works! On Llama 3.1 8B, Text2Lora reaches 77% average accuracy, outperforming all baseline methods. What this means is we're approaching a world where models can essentially customize themselves on-the-fly for whatever task you throw at them. As Nisten put it, "This is revolutionary. The model is actually learning, actually changing its own weights." We're just seeing the first glimpses of this capability, but in 6-12 months? 🎥 Multimedia & Tools: Video, Voice, and Browser Breakthroughs Let’s zip through some multimedia and tool updates that caught my eye this week. Google’s VEO3-fast is a creator’s dream: 2x faster 720p video generation, 80% cheaper, and now with audio support. I’ve seen clips on social media (like an NBA ad) that are unreal, though Wolfram noted it’s not fully rolled out in Europe yet. You can access it via APIs like FAL or Replicate, and I’d be itching to make a full movie if I had the budget! Midjourney’s gearing up for a video product with their signature style, but they’re also facing heat: Disney and Universal are suing them for copyright infringement over Star Wars and Avengers-like outputs. It’s Hollywood’s first major strike against AI, and while I get the IP concern, it’s odd they picked the smaller player when OpenAI and Google are out there too. 
This lawsuit could drag on, so stay tuned. OpenAI’s new advanced voice mode dropped, aiming for a natural cadence with better multilingual support (Russian and Hebrew sound great now). But honestly? I’m not loving the breathing and laughing they added; it’s uncanny valley for me. Some folks on X are raving, though, and LDJ noted it’s closing the gap to Sesame’s Maya. I just wish they’d let me pick between old and new voices instead of switching under my feet. If OpenAI’s listening, transparency please! On the tools side, Yutori’s Scouts got my timeline buzzing: AI agents that monitor the web for any topic (like “next ThursdAI release”) and notify you of updates. I saw a demo catching leadership changes at xAI, and it’s the future of web interaction. Couldn’t log in live on the show (email login woes; give me passwords, folks!), but it’s in beta at yutori.com. Also, Browser Company finally launched DIA, an AI-native browser in beta. Chatting with open tabs, rewriting text, and instant answers? I’ve been using it to prep for ThursdAI, and it’s pretty slick. Try it at diabrowser.com. Wrapping Up: AI’s Breakneck Pace What a week, folks! From OpenAI democratizing intelligence with o3-pro and price cuts to Meta’s bold superintelligence play with ScaleAI, we’re witnessing history unfold at lightning speed. Apple’s stumble at WWDC stings, but open-source gems and new tools keep the excitement alive. I’m still riding the high from AI Engineer last week; your high-fives and feedback mean the world. Next week, don’t miss Weights & Biases’ Fully Connected conference in SF on June 17-18. I won’t be there physically, but I’m cheering from afar. Grab your spot at fullyconnected.com with promo code WBTHURSAI for a sweet deal. Thanks for being part of the ThursdAI crew. Here’s the full TL;DR and show notes to catch anything you missed. 
See you next week! TL;DR of all topics covered:* Hosts and Guests* Alex Volkov - AI Evangelist & Weights & Biases (@altryne)* Co Hosts - @WolframRvnwlf, @yampeleg, @nisten, @ldjconfirmed* Guests - * Stefania Druga @stefania_druga (Independent, Former Research Scientist Google DeepMind), creator of Scratch Copilot and the AI Engineer education summit. * Eric Provencher - @pvncher (Building RepoPrompt)* Chit Chat - AI Engineer conference vibes, meeting fans, Jack Rae’s move to Meta.* Open Source LLMs* Mistral Magistral - 24B reasoning model (X, HF, Blog)* HuggingFace Screensuite - GUI agents evaluation framework (HF)* SakanaAI Text2Lora - Instant, Task-Specific LLM Adaptation (Github)* Big CO LLMs + APIs* OpenAI drops o3 price by 80% (Blog)* OpenAI launches o3-pro - highest intelligence model (X)* Meta buys 49% stake in ScaleAI, Alex Wang heads superintelligence team (Blog, Axios)* Apple WWDC updates - pause on Apple Intelligence in iOS26, live translation, on-device APIs* Apple paper on reasoning as illusion (Paper, Rebuttal)* This Week’s Buzz* Fully Connected: W&B’s 2-day conference, June 17-18 in SF (fullyconnected.com) - Promo Code WBTHURSAI* Alex’s talk on LRM as LLM judges on Hamel’s course (YT)* Vision & Video* VEO3-fast - 2x faster 720p generations, 80% cheaper* Midjourney to launch video product (X)* Topaz Astra - creative 4K video upscaler (X, Site)* Voice & Audio* OpenAI’s new advanced voice mode - mixed responses, better multilingual support* Cartesia Ink-Whisper - optimized for real-time chat (Blog)* AI Art & Diffusion & 3D* Disney & Universal sue Midjourney - first Hollywood vs AI lawsuit (NBC)* Krea releases KREA-1 - custom image gen model (X)* AI Tools* Yutori Scouts - AI agents for web monitoring (Blog)* BrowserCompany DIA - AI-native browser in beta (Link)
📆 ThursdAI - Jun 5, 2025 - Live from AI Engineer with Swyx, new Gemini 2.5 with Logan K and Jack Rae, Self Replicating agents with Morph Labs
6/6/2025
1:43:45
Hey folks, this is Alex, coming to you LIVE from the AI Engineer World's Fair! What an incredible episode this week: we recorded live from the 30th floor of the Marriott in SF, while Yam was doing live correspondence from the floor of the AI Engineer event, all while Swyx, the cohost of the Latent Space podcast and the creator of AI Engineer (both the conference and the concept itself), joined us for the whole stream - here’s the edited version, please take a look. We had around 6500 people tune in, and at some point we got 2 surprise guests straight from the keynote stage: Logan Kilpatrick (PM for AI Studio and lead cheerleader for Gemini) and Jack Rae (principal scientist working on reasoning) joined us for a great chat about Gemini! My mind was absolutely blown! They have just launched the new Gemini 2.5 Pro and I thought it would only be fitting to let their new model cover this podcast this week (so below is fully AI generated ... non slop I hope). The show notes and TL;DR are, as always, at the end. Okay, enough preamble… let's dive into the madness! 🤯 Google Day at AI Engineer: New Gemini 2.5 Pro and a Look Inside the Machine's Mind For the first year of this podcast, a recurring theme was us asking, "Where's Google?" Well, it's safe to say that question has been answered with a firehose of innovation. We were lucky enough to be joined by Google DeepMind's Logan Kilpatrick and Jack Rae, the tech lead for "thinking" within Gemini, literally moments after they left the main stage. Surprise! A New Gemini 2.5 Pro Drops Live Logan kicked things off with a bang, officially announcing a brand new, updated Gemini 2.5 Pro model right there during his keynote. 
He called it "hopefully the final update to 2.5 Pro," and it comes with a bunch of performance increases, closing the gap on feedback from previous versions and hitting SOTA on benchmarks like Aider.It's clear that the organizational shift to bring the research and product teams together under the DeepMind umbrella is paying massive dividends. Logan pointed out that Google has seen a 50x increase in AI inference over the past year. The flywheel is spinning, and it's spinning fast.How Gemini "Thinks"Then things got even more interesting. Jack Rae gave us an incredible deep dive into what "thinking" actually means for a language model. This was one of the most insightful parts of the conference for me.For years, the bottleneck for LLMs has been test-time compute. Models were trained to respond immediately, applying a fixed amount of computation to go from a prompt to an answer, no matter how hard the question. The only way to get a "smarter" response was to use a bigger model.Jack explained that "Thinking" shatters this limitation. Mechanically, Gemini now has a "thinking stage" where it can generate its own internal text—hypothesizing, testing, correcting, and reasoning—before committing to a final answer. It's an iterative loop of computation that the model can dynamically control, using more compute for harder problems. It learns how to think using reinforcement learning, getting a simple "correct" or "incorrect" signal and backpropagating that to shape its reasoning strategies.We're already seeing the results of this. Jack showed a clear trend: as models get better at reasoning, they're also using more test-time compute. This paradigm also gives developers a "thinking budget" slider in the API for Gemini 2.5 Flash and Pro, allowing a continuous trade-off between cost and performance.The future of this is even wilder. They're working on DeepThink, a high-budget mode for extremely hard problems that uses much deeper, parallel chains of thought. 
On the tough USA Math Olympiad, where the SOTA was negligible in January, 2.5 Pro reached the 50th percentile of human participants. DeepThink pushes that to the 65th percentile.Jack’s ultimate vision is inspired by the mathematician Ramanujan, who derived incredible theorems from a single textbook by just thinking deeply. The goal is for models to do the same—contemplate a small set of knowledge so deeply that they can push the frontiers of human understanding. Absolutely mind-bending stuff.🤖 MorphLabs and the Audacious Quest for Verified SuperintelligenceJust when I thought my mind couldn't be bent any further, we were joined by Jesse Han, the founder and CEO of MorphLabs. Fresh off his keynote, he laid out one of the most ambitious visions I've heard: building the infrastructure for the Singularity and developing "verified superintelligence."The big news was that Christian Szegedy is joining MorphLabs as Chief Scientist. For those who don't know, Christian is a legend—he invented batch norm and adversarial examples, co-founded XAI, and led code reasoning for Grok. That's a serious hire.Jesse’s talk was framed around a fascinating question: "What does it mean to have empathy for the machine?" He argues that as AI develops personhood, we need to think about what it wants. And what it wants, according to Morph, is a new kind of cloud infrastructure.This is MorphCloud, built on a new virtualization stack called Infinibranch. Here’s the key unlock: it allows agents to instantaneously snapshot, branch, and replicate their entire VM state. Imagine an agent reaching a decision point. Instead of choosing one path, it can branch its entire existence—all its processes, memory, and state—to explore every option in parallel. It can create save states, roll back to previous checkpoints, and even merge its work back together.This is a monumental step for agentic AI. 
It moves beyond agents that are just a series of API calls to agents that are truly embodied in complex software environments. It unlocks the potential for recursive self-improvement and large-scale reinforcement learning in a way that's currently impossible. It’s a bold, sci-fi vision, but they're building the infrastructure to make it a reality today.🔥 The Agent Conversation: OpenAI, MCP, and Magic MomentsThe undeniable buzz on the conference floor was all about agents. You couldn't walk ten feet without hearing someone talking about agents, tools, and MCP.OpenAI is leaning in here too. This week, they made their Codex coding agent available to all ChatGPT Plus users and announced that ChatGPT will soon be able to listen in on your Zoom meetings. This is all part of a broader push to make AI more active and integrated into our workflows.The MCP (Model-Context-Protocol) track at the conference was packed, with lines going down the hall. (Alex here, I had a blast talking during that track about MCP observability, you can catch our talk here on the live stream of AI Engineer) Logan Kilpatrick offered a grounded perspective, suggesting the hype might be a bit overblown but acknowledging the critical need for an open standard for tool use, a void left when OpenAI didn't formalize ChatML.I have to share my own jaw-dropping MCP moment from this week. I was coding an agent using an IDE that supports MCP. My agent, which was trying to debug itself, used an MCP tool to check its own observability traces on the Weights & Biases platform. While doing so, it discovered a new tool that our team had just added to the MCP server—a support bot. Without any prompting from me, my coding agent formulated a question, "chatted" with the support agent to get the answer, came back, fixed its own code, and then re-checked its work. Agent-to-agent communication, happening automatically to solve a problem. My jaw was on the floor. 
That's the magic of open standards.This Week's Buzz from Weights & BiasesSpeaking of verification and agents, the buzz from our side is all about it! At our booth here at AI Engineer, we have a Robodog running around, connected to our LLM evaluation platform, W&B Weave. As Jesse from MorphLabs discussed, verifying what these complex agentic systems are doing is critical. Whether it's superintelligence or your production application, you need to be able to evaluate, trace, and understand its behavior. We're building the tools to do just that.And if you're in San Francisco, don't forget our own conference, Fully Connected, is happening on June 18th and 19th! It's going to be another amazing gathering of builders and researchers. Fullyconnected.com get in FREE with the promo code WBTHURSAIWhat a show. The energy, the announcements, the sheer brainpower in one place was something to behold. We’re at a point where the conversation has shifted from theory to practice, from hype to real, tangible engineering. The tracks on agents and enterprise adoption were overflowing because people are building, right now. It was an honor and a privilege to bring this special episode to you all.Thank you for tuning in. We'll be back to our regular programming next week! 
(and Alex will be back to writing his own newsletter, not sending direct AI output!) AI News TL;DR and show notes* Hosts and Guests* Alex Volkov - AI Evangelist & Weights & Biases (@altryne)* Co Hosts - @swyx @yampeleg @romechenko * Guests - @officialLoganK, @jack_w_rae* Open Source LLMs * ByteDance / ContentV-8B - (HF)* Big CO LLMs + APIs* Gemini Pro 2.5 updated Jun 5th (X)* SOTA on HLE, Aider, and GPQA* Now supports thinking budgets* Same cost, on pareto frontier* Closes gap on 03-25 regressions* OAI AVM injects ads and stopped singing (X)* OpenAI Codex is now available to Plus members and has internet access (X)* ~24,000 NEW PRs overnight from Codex after @OpenAI expands access to Plus users.* OpenAI will record meetings and released connectors (X)* TestingCatalog News (@testingcatalog), Jun 4, 2025: OpenAI released loads of connectors for Team accounts! Most of these connectors can be used for Deep Research, while Google Drive, SharePoint, Dropbox and Box could be used in all chats. https://t.co/oBEmYGKguE* Anthropic cuts Claude access for Windsurf (X)* Without warning, Anthropic cuts off Windsurf from official Claude 3 and 4 APIs* This weeks Buzz* Fully Connected: W&B's 2-day conference, June 18-19 in SF fullyconnected.com - Promo Code WBTHURSAI* Vision & Video* VEO3 is now available via API on FAL (X)* Captions launches Mirage Studio - talking avatars competition to HeyGen/Hedra (X)* Voice & Audio* ElevenLabs model V3 - supports emotion tags and is an "inflection point" (X) * Supporting 70+ languages, multi-speaker dialogue, and audio tags such as [excited], [sighs], [laughing], and [whispers].* Tools* Cursor Launched V1 - Bug Bot reviews PRs, iPython notebooks and one-click MCP* 24,000 NEW PRs overnight from Codex after @OpenAI expands access to plus users (X)
📆 ThursdAI - May 29 - DeepSeek R1 Resurfaces, VEO3 viral moments, Opus 4 a week after, Flux Kontext image editing & more AI news
5/29/2025
1:28:18
Hey everyone, Alex here 👋 Welcome back to another absolutely wild week in AI! I'm coming to you live from the Fontainebleau Hotel in Vegas at the Imagine AI conference, and wow, what a perfect setting to discuss how AI is literally reimagining our world. After last week's absolute explosion of releases (Claude Opus 4, Google I/O madness, OpenAI Codex and the Jony Ive collab), this week gave us a chance to breathe... sort of. Because even in a "quiet" week, we still got a new DeepSeek model that's pushing boundaries, and the entire internet discovered that we might all just be prompts. Yeah, it's been that kind of week! Before we dive in, quick shoutout to everyone who joined us live - we had some technical hiccups with the Twitter Spaces audio (sorry about that!), but the YouTube stream was fire. And speaking of fire, we had two incredible guests join us: Charlie Holtz from Chorus (the multi-model chat app that's changing how we interact with AI) and Linus Ekenstam, who's been traveling the AI conference circuit and bringing us insights from the frontlines of the generative AI revolution. Open Source AI & LLMs: DeepSeek Whales & Mind-Bending Papers DeepSeek dropped R1-0528 out of nowhere, an update to their reasoning beast with some serious jumps in performance. We’re talking AIME at 91 (beating previous scores by a mile), LiveCodeBench at 73, and SWE-bench Verified at 57.6. It’s edging closer to heavyweights like o3, and folks on X are already calling it “clearer thinking.” There was hype it might’ve been R2, but the impact didn’t quite crash the stock exchange like past releases. Still, it’s likely among the best open-weight models out there. So what's new? Early reports and some of my own poking around suggest this model "thinks clearer now." 
Nisten mentioned that while previous DeepSeek models sometimes liked to "vibe around" and explore the latent space before settling on an answer, this one feels a bit more direct.

And here’s the kicker - they also released an 8B distilled version based on Qwen3, runnable on your laptop. Yam called it potentially the best 8B model to date, and you can try it on Ollama right now. No need for a monster rig!

The Mind-Bending "Learning to Reason Without External Rewards" Paper

Okay, this paper's result broke my brain, and apparently everyone else's too. It shows that models can improve through reinforcement learning using only their own intuition of whether or not they're correct. 😮

It's like the placebo effect for AI! The researchers trained models without telling them what was good or bad; instead, they used a new framework called Intuitor, where the reward is based on the model's own "self-certainty". The thing that took my whole timeline by storm is, it works! GRPO (Group Relative Policy Optimization) - the framework that DeepSeek gave to the world with R1 - is based on external rewards (human-optimized), and Intuitor seems to be matching or even exceeding some of GRPO's results when used to finetune Qwen2.5 3B. Incredible, incredible stuff.

Big Companies LLMs & APIs

Claude Opus 4: A Week Later - The Dev Darling?

Claude Opus 4, whose launch we celebrated live on the show, has had a week to make its mark. Charlie Holtz, who's building Chorus (more on that amazing app in a bit!), shared that while it's sometimes "astrology" to judge the vibes of a new model, Opus 4 feels like a step change, especially in coding. He mentioned that Claude Code, powered by Opus 4 (and Sonnet 4 for implementation), is now tackling GitHub issues that were too complex just weeks ago. He even had a coworker who "vibe coded three websites in a weekend" with it - that's a tangible productivity boost!

Linus Ekenstam highlighted how Lovable.dev saw their syntax error rates plummet by nearly 50% after integrating Claude 4. 
That’s quantifiable proof of improvement! It's clear Anthropic is leaning heavily into the developer/coding space. Claude Opus is now #1 on the LMArena WebDev arena, further cementing its reputation.

I had my own magical moment with Opus 4 this week. I was working on an MCP observability talk for the AI Engineer conference and trying to integrate Weave (our observability and evals framework at Weights & Biases) into a project. Using Windsurf's Cascade agent (which now lets you bring your own Opus 4 key, by the way - good move, Windsurf!), Opus 4 not only tried to implement Weave into my agent but, when it got stuck, it figured out it had access to the Weights & Biases support bot via our MCP tool. It then formulated a question to the support bot (which is also AI-powered!), got an answer, and used that to fix the implementation. It then went back and checked if the Weave trace appeared in the dashboard! Agents talking to agents to solve a problem, all while I just watched - my jaw was on the floor. Absolutely mind-blowing.

Quick Hits: Voice Updates from OpenAI & Anthropic

OpenAI’s Advanced Voice Mode finally sings - yes, I’ve been waiting for this! It can belt out tunes like Mariah Carey, which is just fun. Anthropic also rolled out voice mode on mobile, keeping up in the conversational race. Both are cool steps, but I’m more hyped for what’s next in voice AI - stay tuned below (OpenAI X, Anthropic X).

🐝 This Week's Buzz: Weights & Biases Updates!

Alright, time for a quick update from the world of Weights & Biases!

* Fully Connected is Coming! Our flagship 2-day conference, Fully Connected, is happening on June 18th and 19th in San Francisco. It's going to be packed with amazing speakers and insights into the world of AI development. You can still grab tickets, and as a ThursdAI listener, use the promo code WBTHURSAI for a 100% off ticket! I hustled to get y'all this discount! (Register here)
* AI Engineer World's Fair Next Week! 
I'm super excited for the AI Engineer conference in San Francisco next week. Yam Peleg and I will be there, and we're planning another live ThursdAI show from the event! If you want to join the livestream or snag a last-minute ticket, use the coupon code THANKSTHURSDAI for 30% off (Get it HERE)

Vision & Video: Reality is Optional Now

VEO3 and the Prompt Theory Phenomenon

Google's VEO3 has completely taken over TikTok with the "Prompt Theory" videos. If you haven't seen these yet, stop reading and watch ☝️. The concept is brilliant - AI-generated characters discussing whether they're "made of prompts," creating this meta-commentary on consciousness and reality.

The technical achievement here is staggering. We're not just talking about good visuals - VEO3 nails temporal consistency, character emotions, situational awareness (characters look at whoever's speaking), perfect lip sync, and contextually appropriate sound effects. Linus made a profound point - if not for the audio, VEO3 might not have been as explosive. The combination of visuals AND audio together is what's making people question reality. We're seeing people post actual human videos claiming they're AI-generated because the uncanny valley has been crossed so thoroughly.

Odyssey's Interactive Worlds: The Holodeck Prototype

Odyssey dropped their interactive video demo, and folks... we're literally walking through AI-generated worlds in real-time. This isn't a game engine rendering 3D models - this is a world model generating each frame as you move through it with WASD controls.

Yes, it's blurry. Yes, I got stuck in a doorway. But remember Will Smith eating spaghetti from two years ago? The pace of progress is absolutely insane. As Linus pointed out, we're at the "GAN era" of world models. 
Combine VEO3's quality with Odyssey's interactivity, and we're looking at completely personalized, infinite entertainment experiences.

The implications that Yam laid out still have me shook - imagine Netflix shows completely customized to you, with your context and preferences, generated on the fly. Not just choosing from a catalog, but creating entirely new content just for you. We're not ready for this, but it's coming fast.

Hunyuan's Open Source Avatar Revolution

While the big companies are keeping their video models closed, Tencent dropped two incredible open source releases: HunyuanPortrait and HunyuanAvatar. These are legitimate competitors to Hedra and HeyGen, but completely open source.

HunyuanPortrait does high-fidelity portrait animation from a single image plus video. HunyuanAvatar goes further: from 1 image + audio, it adds lipsync, body animation, multi-character support, and emotion control. Wolfram tested these extensively and confirmed they're "state of the art for open source." The portrait model is basically perfect for deepfakes (use responsibly, people!), while the avatar model opens up possibilities for AI assistants with consistent visual presence.

🖼️ AI Art & Diffusion

Black Forest Labs drops Flux Kontext - SOTA image editing!

This came as massive breaking news during the show (though we didn't catch it live!) - Black Forest Labs, creators of Flux, dropped an incredible image editing model called Kontext (really 3 models: Pro, Max, and a 12B open source Dev in private preview). They are consistent, context-aware text and image editing models! Just see the example below.

If you used GPT-image to Ghiblify yourself, or VEO, you know that those are not image editing models; your face will look different every generation. These image models keep you consistent, while adding what you wanted. 
This character consistency is something many folks really want, and it's great to see Flux innovating and bringing us SOTA again. These models are absolutely crushing GPT-image in instruction following, character preservation and style reference!

Maybe the most important thing about this model is the incredible speed. While the Ghiblification ChatGPT trend took the world by storm, GPT images are SLOW! Check out the speed comparisons on Kontext!

You can play around with these models on the new Flux Playground, but they're also already integrated into FAL, FreePik, Replicate, Krea and tons of other services!

🎙️ Voice & Audio: Everyone Gets a Voice

Unmute.sh: Any LLM Can Now Talk

KyutAI (the folks behind Moshi) are back with Unmute.sh - a modular wrapper that adds voice to ANY text LLM. The latency is incredible (under 300ms), and it includes semantic VAD (knowing when you've paused for thought vs. just taking a breath).

What's brilliant about this approach is it preserves all the capabilities of the underlying text model while adding natural voice interaction. No more choosing between smart models and voice-enabled models - now you can have both!

It's going to be open sourced at some point soon, and while awesome, Unmute did have some instability in how the voice sounds! It answered me with one type of voice and then, during the same conversation, answered with another. You can give it a try yourself at unmute.sh

Chatterbox: Open Source Voice Agents for Everyone

Resemble AI open sourced Chatterbox, featuring zero-shot voice cloning from just 5 seconds of audio and unique emotion intensity control. Playing with the demo where they could dial up the emotion from 0.5 to 2.0 on the same text was wild - from calm to absolutely unhinged Samuel L. Jackson energy.

This being a 0.5B param model is great. The issue I always have is that, with my fairly unique accent, these models sound like a British Alex all the time, and I just don't talk like that! 
Though the fact that this runs locally and includes safety features (profanity filters, content classifiers and something called PerTh watermarking) while being completely open source is exactly what the ecosystem needs. We're rapidly approaching a world where anyone can build sophisticated voice agents. 👏

Looking Forward: The Convergence is Real

As we wrapped up the show, I couldn't help but reflect on the massive convergence happening across all these modalities. We have LLMs getting better at reasoning (even without external rewards!), video models breaking reality, voice models becoming indistinguishable from humans, and it's all happening simultaneously.

Charlie's comment that "we are the prompts" might have been said in jest, but it touches on something profound. As these models get better at generating realistic worlds, characters, and voices, the line between generated and real continues to blur. The Prompt Theory videos aren't just entertainment - they're a mirror reflecting our anxieties about AI and consciousness.

But here's what keeps me optimistic: the open source community is keeping pace. DeepSeek, Hunyuan, ResembleAI, and others are ensuring that these capabilities don't remain locked behind corporate walls. The democratization of AI continues, even as the capabilities become almost magical.

Next week, I'll be at AI Engineer World's Fair in San Francisco, finally meeting Yam face-to-face and bringing you all the latest from the biggest AI engineering conference of the year. Until then, keep experimenting, keep building, and remember - in this exponential age, today's breakthrough is tomorrow's baseline.

Stay curious, stay building, and I'll see you next ThursdAI! 
🚀

Show Notes & TL;DR Links

* Show Notes & Guests
  * Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
  * Co-Hosts - @WolframRvnwlf, @yampeleg, @nisten
  * Guests - Charlie Holtz (@charliebholtz), Linus Ekenstam (@LinusEkenstam)
* Open Source LLMs
  * DeepSeek-R1-0528 - Updated reasoning model with AIME 91, LiveCodeBench 73 (Try It)
  * Learning to Reason Without External Rewards - Paper on self-certainty rewards improving models (X)
  * HaizeLabs j1-nano & j1-micro - Tiny reward models (600M, 1.7B params), RewardBench 80.7% for micro (Tweet, GitHub, HF-micro, HF-nano)
* Big CO LLMs + APIs
  * Claude Opus 4 - #1 on LMArena WebDev, coding step change (X)
  * Mistral Agents API - Framework for custom tool-using agents (Blog, Tweet)
  * Mistral Embed SOTA - New state-of-the-art embedding API (X)
  * OpenAI Advanced Voice Mode - Now sings with new capabilities (X)
  * Anthropic Voice Mode - Released on mobile for conversational AI (X)
* This Week’s Buzz
  * Fully Connected - W&B conference, June 18-19, SF, promo code WBTHURSAI (Register)
  * AI Engineer World’s Fair - Next week in SF, 30% off with THANKSTHURSDAI (Register)
* AI Art & Diffusion
  * BFL Flux Kontext - SOTA image editing model for identity-consistent edits (Tweet, Announcement)
* Vision & Video
  * VEO3 Prompt Theory - Viral AI video trend questioning reality on TikTok (X)
  * Odyssey Interactive Video - Real-time AI world exploration at 30 FPS (Blog, Try It)
  * HunyuanPortrait - High-fidelity portrait video from one photo (Site, Paper)
  * HunyuanVideo-Avatar - Audio-driven full-body avatar animation (Site, Tweet)
* Voice & Audio
  * Unmute.sh - KyutAI’s voice wrapper for any LLM, low latency, soon open-source (Try It, X)
  * Chatterbox - Resemble AI’s open-source voice cloning with emotion control (GitHub, HF)
* Tools
  * Opera NEON - Agent-centric AI browser for autonomous web tasks (Site, Tweet)
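P.S. for the code-curious: here's a tiny Python sketch of the idea behind the "Learning to Reason Without External Rewards" paper covered in this episode. It's a toy illustration, not the authors' implementation. It assumes "self-certainty" means the average KL divergence between a uniform distribution and the model's next-token distribution (higher = more confident), and that those scores are then normalized within a group of sampled answers, the same way GRPO normalizes external rewards. The function names and the 4-token toy vocabulary are made up for the example.

```python
import math


def self_certainty(token_dists):
    """Average KL(Uniform || p) over the token positions of one completion.

    token_dists: one probability distribution per generated token
    (each a list of strictly positive floats summing to 1).
    A flat distribution scores 0; a peaked one scores higher.
    """
    total = 0.0
    for probs in token_dists:
        vocab = len(probs)
        uniform = 1.0 / vocab
        # KL(U || p) = sum_j U_j * log(U_j / p_j)
        total += sum(uniform * math.log(uniform / p) for p in probs)
    return total / len(token_dists)


def group_relative_advantages(rewards):
    """GRPO-style normalization: score each sample against its own group."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    # Small epsilon avoids division by zero when all rewards are equal.
    return [(r - mean) / (std + 1e-8) for r in rewards]


# Toy group: two sampled completions, two tokens each, 4-token vocabulary.
confident = [[0.97, 0.01, 0.01, 0.01], [0.01, 0.97, 0.01, 0.01]]
unsure = [[0.25, 0.25, 0.25, 0.25], [0.4, 0.3, 0.2, 0.1]]

# The "reward" comes from the model's own confidence, not from a grader.
rewards = [self_certainty(confident), self_certainty(unsure)]
advantages = group_relative_advantages(rewards)
```

In this toy group the confident completion ends up with a positive advantage and the unsure one with a negative advantage, and no ground-truth answer is consulted anywhere. That's the part that broke my brain.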
