ThursdAI - The top AI news from the past week podcast

ThursdAI - Apr 3rd - OpenAI Goes Open?! Gemini Crushes Math, AI Actors Go Hollywood & MCP, Now with Observability?


Woo! Welcome back to ThursdAI, show number 99! Can you believe it? We are one show away from hitting the big 100, which is just wild to me. And speaking of milestones, we just crossed 100,000 downloads on Substack alone! [Insert celebratory sound effect here 🎉]. Honestly, knowing so many of you tune in every week genuinely fills me with joy, but also a real commitment to keep bringing you the high-signal, zero-fluff AI news you count on. Thank you for being part of this amazing community! 🙏

And what a week it's been! I started out busy at work, playing with the native image generation in ChatGPT like everyone else (all 130 million of us!), and then I looked at my notes for today… an absolute mountain of updates. Seriously, one of those weeks where open source just exploded, big companies dropped major news, and the vision/video space is producing stuff that's crossing the uncanny valley.

We’ve got OpenAI teasing a big open source release (yes, OpenAI might actually be open again!), Gemini 2.5 showing superhuman math skills, Amazon stepping into the agent ring, truly mind-blowing AI character generation from Meta, and a personal update on making the Model Context Protocol (MCP) observable. Plus, we had some fantastic guests join us live!

So buckle up, grab your coffee (or whatever gets you through the AI whirlwind), because we have a lot to cover. Let's dive in! (as always, show notes and links in the end)

OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised

It feels like OpenAI was determined to dominate the headlines this week, hitting us from multiple angles.

First, the potentially massive news: OpenAI is planning to release a new open source model in the "coming months"! Kevin Weil tweeted that they're working on a "highly capable open language model" and are actively seeking developer feedback through dedicated sessions (sign up here if interested) to "get this right." Word on the street is that this could be a powerful reasoning model. Sam Altman also cheekily added they won't slap on a Llama-style restrictive license.

On top of that, OpenAI released the PaperBench eval (testing whether agents can replicate AI research papers) along with the Nano-Eval framework, closed a staggering $40B raise at a $300B valuation, and shipped a new emo "Monday" voice in ChatGPT. Busy week for them!

Open Source Powerhouses: Nomic & OpenHands Deliver SOTA

Beyond the OpenAI buzz, the open source community delivered some absolute gems, and we had guests from two key projects join us!

Nomic Embed Multimodal: SOTA Embeddings for Visual Docs

Our friends at Nomic AI are back with a killer release! We had Zach Nussbaum on the show discussing Nomic Embed Multimodal. These are new 3B & 7B parameter embedding models (available on Hugging Face) built on Alibaba's excellent Qwen2.5-VL. They achieved SOTA on visual document retrieval by cleverly embedding interleaved text-image sequences – perfect for PDFs and complex webpages.

Zach highlighted that they chose the Qwen base because high-performing open VLMs under 3B params are still scarce, making it a solid foundation. Importantly, the 7B model comes with an Apache 2.0 license, and they've open sourced weights, code, and data. They offer both a powerful multi-vector version (ColNomic) and a faster single-vector one. Huge congrats to Nomic!
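To make the multi-vector vs single-vector trade-off concrete, here's a small stdlib-only sketch (toy 2-D vectors, not real Nomic embeddings) comparing ColBERT-style MaxSim late-interaction scoring, as used in multi-vector models like ColNomic, with plain single-vector cosine similarity after pooling:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def maxsim(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query token vector,
    take its best match among the document's token vectors, then sum."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)

# Toy per-token "embeddings" for a query and two documents.
query = [[1.0, 0.0], [0.0, 1.0]]   # two query tokens
doc_a = [[0.9, 0.1], [0.1, 0.9]]   # matches both query tokens
doc_b = [[0.9, 0.1], [0.8, 0.2]]   # matches only the first

# Multi-vector scoring keeps per-token matching detail:
print(maxsim(query, doc_a) > maxsim(query, doc_b))  # True

def mean_pool(vecs):
    """Collapse token vectors to one vector (what single-vector models do)."""
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

# Single-vector scoring is one dot product per document — faster,
# but the per-token signal is gone after pooling.
print(cosine(mean_pool(query), mean_pool(doc_a)))
```

This is why the multi-vector variant tends to retrieve better on dense visual documents while the single-vector one is cheaper to index and query.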

OpenHands LM 32B & Agent: Accessible SOTA Coding

Remember OpenDevin? It evolved into OpenHands, and the team just dropped their own OpenHands LM 32B! We chatted with co-founder Xingyao "Elle" Wang about this impressive Qwen 2.5 finetune (MIT licensed, on Hugging Face).

It hits a remarkable 37.2% on SWE-Bench Verified (a coding benchmark measuring real-world repo tasks), competing with much larger models. Elle stressed they didn't just chase code completion scores; they focused on tuning for agentic capabilities – tool use, planning, self-correction – using trajectories from their contamination-free SWE-Gym dataset. This focus seems to be paying off, as the OpenHands agent also snagged the #2 spot on the brand new Live SWE-Bench leaderboard! Plus, the 32B model runs locally on a single 3090, making this power accessible. You can also try their managed OpenHands Cloud ($50 free credit). Amazing progress from this team!

Frontiers: Diffusion LMs & Superhuman Math

Two other developments pushed the boundaries this week:

Dream 7B: A Diffusion Language Model Challenger?

This one's fascinating conceptually. Researchers unveiled Dream 7B, a language model based on diffusion, not auto-regression. The benchmarks they shared show it competing strongly with top 7-8B models, and absolutely crushing structured tasks like Sudoku (81% accuracy). The catch: the weights haven't been released yet, so we can't verify or play with it. Still, one to watch!

Gemini 2.5 Obliterates Olympiad Math (24.4% on USAMO!)

We already knew Gemini 2.5 was good, but wow. New results dropped showing its performance on the USA Math Olympiad (USAMO) – problems so hard most top models score under 5%. Gemini 2.5 Pro scored an incredible 24.4%!

The gap between it and everything else is massive, highlighting the power of its reasoning and thinking capabilities (which you can inspect via its traces!). Having used it for complex tasks myself (like wrestling with tax forms!), I can attest to its depth. It's free in the Gemini app – go try it!

Agents, Compute & Making MCP Observable

Amazon's Nova Act Agent & The Need for Access

Amazon entered the agent chat with Nova Act, designed for web browser actions. They claim it beats Claude 3.5 and OpenAI's computer-use model on some benchmarks, possibly leveraging acquired Adept talent. But... it's only available via an SDK behind a request form. As Yam rightly pointed out on the show, these agent claims mean little until we can actually use them in the real world!

CoreWeave + NVIDIA = Insane Speeds

Hardware keeps accelerating. CoreWeave announced hitting 800 Tokens/sec on Llama 3.1 405B using NVIDIA's new GB200 Blackwell chips, and 33,000 T/s on Llama 2 70B with H200s. Inference is getting fast.

This Week's Buzz: Let's Make MCP Observable!

Okay, my personal mission this week builds on the growing Model Context Protocol (MCP) momentum. MCP is potentially the "HTTP for agents," enabling tool interoperability. But as tool use moves external, we lose visibility, making debugging and security harder.

That's why I'm launching the Observable Tools initiative. The goal: integrate observability into the MCP standard itself. The initiative currently lives in a GitHub discussion where I've proposed using the OpenTelemetry (OTel) standard to add tracing to MCP interactions. This would give developers clear visibility into tool usage, regardless of their observability platform.
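To make the idea concrete, here's a minimal stdlib-only sketch of the kind of span data an OTel-style wrapper around an MCP tool call could record. The `ToolSpan` schema, `traced_tool_call`, and `fake_tool` are all illustrative assumptions, not part of the actual OTel SDK or the MCP spec:

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class ToolSpan:
    """OTel-style span for one MCP tool invocation (illustrative schema)."""
    trace_id: str
    span_id: str
    tool_name: str
    start_ns: int = 0
    end_ns: int = 0
    attributes: dict = field(default_factory=dict)
    status: str = "UNSET"

def traced_tool_call(tool_name, args, call_tool, trace_id=None):
    """Wrap a tool call so its inputs, latency, and outcome are recorded
    as a span — the visibility the proposal is after, independent of
    which observability backend eventually consumes the spans."""
    span = ToolSpan(
        trace_id=trace_id or uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        tool_name=tool_name,
        attributes={"mcp.tool.args": json.dumps(args)},
    )
    span.start_ns = time.time_ns()
    try:
        result = call_tool(tool_name, args)
        span.status = "OK"
        return result, span
    except Exception as exc:
        span.status = f"ERROR: {exc}"
        raise
    finally:
        span.end_ns = time.time_ns()

# Usage with a stand-in tool (a real client would call the MCP server):
def fake_tool(name, args):
    return {"echo": args}

result, span = traced_tool_call("search_docs", {"query": "observability"}, fake_tool)
print(json.dumps(asdict(span), indent=2))
```

In the actual proposal these spans would be emitted via the OTel SDK, so any OTel-compatible backend can ingest them without MCP-specific plumbing.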

I need your help! Please check out the proposal, join the discussion, and show your support with a 👍 or 🚀 on GitHub. We need the community voice to make this happen! (And yes, my viral tweet showed there's huge demand for usable MCP clients too – more on that soon!).

Vision & Video: Entering the Uncanny Valley

This space is moving at lightning speed.

Runway Gen-4 was announced, pushing for better consistency in AI video. The example videos they shared show incredible character and world consistency.

ByteDance's impressive OmniHuman (single image to talking avatar) is now publicly usable via the Dreamina website. For real people it's really good, but for animated-style images, Hedra Labs actually feels better (and much, much faster).

Meta's MoCha is mind-blowing. We had researcher Cong Wei explain how it generates movie-grade, full-body, expressive talking characters directly from speech and text (no reference image needed!). Using Diffusion Transformers and clever attention mechanisms, the realism is startling, handling lip-sync, gestures, emotions, and even multi-character dialogue. Check the project page videos – some are truly uncanny. Just look at this one!

Voice Highlight: Hailuo Speech-02

While Gladia launched their Solaria STT, the standout for me was Hailuo's Speech-02 TTS API. The emotional control and voice cloning quality are, in my opinion, potentially SOTA right now, offering incredibly nuanced and realistic synthetic voices.

Tool Update & Breaking News!

* Google's NotebookLM now discovers related sources automatically.

* BREAKING NEWS (Caught end of show): Devin 2.0 is out! Cognition Labs launched their AI software engineer V2 with a new IDE experience and, crucially, a $20/month starting price. Much more accessible to try!

Phew! What a week. From OpenAI's big moves to Gemini's math prowess, stunning AI actors from Meta, and the push for an observable agent ecosystem – the field is accelerating like crazy.

Alright folks, that’s a wrap for show #99! Thank you again for tuning in, for being part of the community, and for keeping us on our toes with your insights and feedback. Special thanks to our guests Zach Nussbaum (Nomic), Xingyao Wang (All Hands AI), and Cong Wei (Meta/MoCHA) for joining us!

If you missed any part of the show, or want to grab any of the links, head over to ThursdAI.news. The full recording (video on YouTube, audio on Spotify, Apple Podcasts, etc.) and this blog post with all the notes will be up shortly.

The best way to support the show? Share it with a friend or colleague who needs to stay up-to-date on AI, and drop us a 5-star review on your podcast platform! Financial support via Substack is also appreciated but never required.

Get ready for Episode 100 next week! Until then, happy tinkering, stay curious, and I'll see you next ThursdAI!

Bye bye everyone!

TL;DR and Show Notes

Host, Guests, and Co-hosts

* Host: Alex Volkov - AI Evangelist & Weights & Biases (@altryne)

* Co-Hosts:

* LDJ (@ldjconfirmed)

* Yam Peleg (@yampeleg)

* Guests:

* Zach Nussbaum (@zach_nussbaum) - Nomic AI

* Xingyao Wang (@xingyaow_) - All Hands AI / OpenHands

* Cong Wei (@CongWei1230) - Meta AI / MoCha

Key Topics & Links

* OpenAI's Big Week:

* Teasing highly capable Open Source Reasoner Model (seeking feedback).

* Released PaperBench eval (code, paper) & Nano-Eval framework.

* Raised $40B at $300B valuation.

* New EMO "Monday" voice in ChatGPT.

* Open Source Powerhouses:

* Nomic Embed Multimodal: SOTA visual doc embeddings (3B & 7B, Apache 2.0 for 7B).

* OpenHands LM 32B: SOTA-level coding agent model (Qwen finetune, MIT License, 37.2% SWE-Bench, #2 Live SWE-Bench). Cloud version available.

* Frontier Models & Capabilities:

* Dream 7B: Promising diffusion LM shows strong benchmark results (esp. Sudoku), but weights not yet released.

* Gemini 2.5: Crushes the hard USAMO math eval (24.4%, where most top models score under 5%).

* Agents & Compute:

* Amazon's Nova Act agent announced, claims SOTA but lacks public access (request form).

* CoreWeave/NVIDIA: Massive inference speedups (800T/s on Llama 405B with GB200).

* This Week's Buzz - MCP:

* Observable Tools initiative launched to add observability to MCP.

* Proposal using OpenTelemetry posted for community feedback on GitHub - please support!

* Huge demand shown for usable MCP clients (viral tweet).

* Vision & Video Highlights:

* Runway Gen-4 focuses on video consistency.

* ByteDance OmniHuman (image-to-avatar) now publicly available via Dreamina (example thread).

* Meta's MoCha: Generates stunningly realistic, movie-grade talking characters from speech+text.

* Voice Highlight:

* Hailuo Speech-02: Impressive TTS API with excellent emotional control and voice cloning.

* Tool Updates:

* Windsurf adds deployments to Netlify.

* Google NotebookLM adds source discovery.

* Breaking News:

* Devin 2.0 AI Software Engineer announced, starts at $20/month.



This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
