📆 ThursdAI - Jul 3 - ERNIE 4.5, Hunyuan A13B, MAI-DxO outperforms doctors, RL beats SWE bench, Zuck MSL hiring spree & more AI news
Hey everyone, Alex here 👋

Welcome back to another mind-blowing week on ThursdAI! We're diving into the first show of the second half of 2025, and let me tell you, AI is not slowing down. This week, we've got a massive wave of open-source models from Chinese giants like Baidu and Tencent that are shaking up the game, Meta's jaw-dropping hiring spree with Zuck assembling an AI dream team, and Microsoft's medical AI outperforming doctors on the toughest cases. Plus, a real-time AI game engine that had me geeking out on stream. Buckle up, folks, because we've got a lot to unpack!

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

We had incredible guests like Michael Luo from Agentica, dropping knowledge on RL coding agents, and Ivan Burazin from Daytona, revealing the infrastructure powering the agent era. It was an incredible episode, with over 8,000 views for the live show. (As always, links and show notes are at the end, and the YT live video is here for your convenience if you'd prefer watching.)

Open Source AI & LLMs: The Chinese Powerhouse Wave

Man, if there's one takeaway from this week, it's that Chinese companies are absolutely dominating the open-source LLM scene. Let's break down the heavy hitters that dropped this week and why they've got everyone talking.

Baidu's ERNIE 4.5: A Suite of 10 Models to Rule Them All

Baidu, a giant in the Chinese tech space, just flipped the script by open-sourcing their ERNIE 4.5 series. We're talking 10 distinct models ranging from a whopping 424 billion parameters down to a tiny 0.3 billion. With an Apache 2.0 license, a 128K context window, and multimodal capabilities handling image, video, and text input, this is a massive drop.
Their biggest Mixture-of-Experts (MoE) model, with 47B active parameters, even outshines OpenAI's o1 on visual knowledge tasks like DocVQA, scoring 93% compared to o1's 81%! What's wild to me is Baidu's shift. They've been running ERNIE in production for years (think chatbots and more across their ecosystem), but they weren't always open-source fans. Now they're not just joining the party, they're hosting it. If you're into tinkering, this is your playground: check it out on Hugging Face (HF) or dive into their technical paper (Paper).

Tencent's Hunyuan-A13B-Instruct: WizardLM Team Strikes Again

Next up, Tencent dropped Hunyuan-A13B-Instruct, and oh boy, does it have a backstory. This 80B-parameter MoE model (13B active at inference) comes from the legendary WizardLM team, poached from Microsoft after a messy saga in which their killer models got yanked from the internet over "safety concerns." I remember the frustration: we were all hyped, then bam, gone. Now, under Tencent's wing, they've cooked up a model with a 256K context window, hybrid fast-and-slow reasoning modes, and benchmarks that rival DeepSeek R1 and OpenAI o1 on agentic tasks. It scores an impressive 87% on AIME 2024, though it dips to 76% on AIME 2025, hinting at some overfitting quirks. Still, for a model with 13B active parameters, this is all VERY impressive.

Here's the catch: the license. It excludes commercial use in the EU, UK, and South Korea, and bans usage if you've got over 100M active users. So it's not as open as we'd like, but for its size it's a beast that fits on a single machine, making it a practical choice for many. They've also released two datasets, ArtifactsBench and C3-Bench, for code and agent evaluation. I'm not sold on the name (Hunyuan doesn't roll off the tongue for Western markets), but the WizardLM pedigree means it's worth a look.
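ERNIE 4.5's largest model and Hunyuan-A13B both quote a big total parameter count next to a much smaller "active" count. Here's a toy sketch, in plain Python, of why those numbers differ: a router scores every expert, but only the top-k experts actually run for a given token. The eight scalar "experts", the router scores, and the top-2 softmax gating are illustrative assumptions, not how any of these models actually routes tokens.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    # Pick the k experts with the highest router scores, then softmax over
    # just those scores so the selected weights sum to 1 (one common gating choice).
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, router_logits, k):
    # Only the k routed experts are evaluated; the others never run,
    # which is why "active" parameters are a fraction of the total.
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

calls = []  # records which (hypothetical) experts actually ran

def make_expert(scale):
    def expert(x):
        calls.append(scale)
        return scale * x
    return expert

experts = [make_expert(s) for s in range(1, 9)]  # 8 toy experts
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]  # made-up router scores
y = moe_forward(1.0, experts, logits, k=2)
print(f"output={y:.4f}, experts run={len(calls)} of {len(experts)}")

# Active-parameter share, using the totals quoted in this post (billions).
for name, total, active in [("ERNIE 4.5 (largest)", 424, 47),
                            ("Hunyuan-A13B", 80, 13),
                            ("Pangu Pro MoE", 72, 16)]:
    print(f"{name}: {active / total:.1%} of weights active per token")
```

Real MoE layers route vectors rather than scalars and usually add load-balancing terms during training, but the compute saving comes from the same idea: skip the experts the router didn't pick.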
Try it out on Hugging Face (HF) or test it directly (Try It).

Huawei's Pangu Pro MoE: Sidestepping Sanctions with Ascend NPUs

Huawei entered the fray with Pangu Pro MoE, a 72B-parameter model with 16B active per token, and here's what got me hyped: it's trained entirely on Huawei's own Ascend NPUs, not Nvidia or AMD hardware. This is a bold move to bypass US sanctions, using 4,000 of these chips to pre-train on 13 trillion tokens. The result? Up to 1,528 tokens per second per card with speculative decoding, outpacing dense models in speed and cost-efficiency. Performance-wise, it's close to DeepSeek and Qwen, making it a contender for those outside the Nvidia ecosystem.

I'm intrigued by the geopolitical angle here. Huawei is proving you don't need Western tech to build frontier models, and while we don't know who's got access to these Ascend NPUs, it's likely a game-changer for Chinese firms. The license isn't as permissive as MIT or Apache, but it's still open-weight. Peek at it on Hugging Face (HF) for more details.

DeepSWE-Preview: RL Coding Agent Hits 59% on SWE-Bench

Switching gears, I was blown away chatting with Michael Luo from Agentica about DeepSWE-Preview, an open-source coding agent trained with reinforcement learning (RL) on top of Qwen3-32B. It scored a stellar 59% on SWE-Bench-Verified with test-time scaling (42.2% Pass@1, 71% Pass@16), one of the top open-weight results out there. What's cool is they did this without distilling from proprietary giants like Claude: just pure RL over six days on 64 H100 GPUs. Michael shared how RL is surging because pre-training is hitting data limits, and how DeepSWE learned emergent behaviors like paranoia, double-checking edge cases to avoid shaky fixes.

This underdog story of academic researchers breaking benchmarks with limited resources is inspiring. They've open-sourced everything (code, data, logs), making it a goldmine for the community. I'm rooting for them to get more compute and push to even higher scores.
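DeepSWE's results are reported as Pass@1 and Pass@16, which is why the second number is so much higher: Pass@k counts a problem as solved if any of k attempts passes. Below is a minimal sketch of the widely used unbiased pass@k estimator (from OpenAI's Codex evaluation work), assuming n sampled attempts per problem of which c pass the tests. This is not Agentica's evaluation code, and the per-problem counts are made up for illustration.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k attempts, drawn
    without replacement from n attempts of which c passed, is correct."""
    if n - c < k:
        # Every possible set of k attempts contains at least one pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem results: passing attempts out of n = 16 each.
n = 16
correct_counts = [16, 0, 3, 8, 1, 0]

mean_p1 = sum(pass_at_k(n, c, 1) for c in correct_counts) / len(correct_counts)
mean_p16 = sum(pass_at_k(n, c, 16) for c in correct_counts) / len(correct_counts)
print(f"pass@1 = {mean_p1:.3f}, pass@16 = {mean_p16:.3f}")
```

With k = 1 the estimator reduces to c / n per problem, while with k = n any single passing attempt counts, so a problem solved only once in 16 tries contributes 1/16 to Pass@1 but a full point to Pass@16.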
Dive into the details on their blog (Notion) or check the model on Hugging Face (HF Model).

This Week's Buzz from Weights & Biases: Come Hack with Us!

As always, I've got some exciting news from Weights & Biases to share. We're hosting the first of our Weavehacks hackathons in San Francisco on July 12-13. It's all about agent protocols like MCP and A2A, and I'm stoked to see you guys in person, so come say hi for a high-five! We've got cool prizes, including a custom W&B RoboDog that's been a conference hit, plus $13-14K in cash. Spots are filling fast, so register now and we'll let you in (Sign Up).

We're also rolling out Online Evaluations in Weave, letting you monitor LLM apps live with judge agents on production data, which is super handy for catching hiccups. And our inference service via CoreWeave GPUs offers free credits for open-source model testing. Want in, or curious about Weave's tracing tools? Reach out to me anywhere, and I'll hook you up. Can't wait to demo this next week!

Big Companies & APIs: AI's NBA Draft and Medical Marvels

Shifting to the big players, this week felt like an AI sports season with blockbuster hires and game-changing releases. From Meta's talent poaching to Microsoft's medical breakthroughs, let's unpack the drama and innovation.

Meta Superintelligence Labs: Zuck's Dream Team Draft

Imagine an AI NBA draft: that's what Meta's up to with their new Superintelligence Labs (MSL). Led by Alex Wang (formerly of Scale AI) and Nat Friedman (ex-GitHub CEO), MSL is Zuck's power move after Llama 4's lukewarm reception. They've poached up to 10 key researchers from OpenAI, including folks behind GPT-4o's image generation and o1's foundations, with comp packages rumored at $100M for the first year and up to $300M over four years. That's more than many Meta execs make, or even Tim Cook's salary!
They've also snagged talent from Google DeepMind and even tried to acquire Ilya Sutskever's SSI outright (to which he said he's flattered, but no). This is brute force at its finest, and I'm joking that I didn't get a $100M offer myself. ThursdAI's still waiting for that email, Zuck! OpenAI's Sam Altman fired back with "missionaries beat mercenaries," hinting at a culture clash, while Mark Chen said it felt like Meta "broke into their house and took something." It's war, folks, and I'm hyped to see if MSL delivers a Llama that crushes it. With FAIR and GenAI folding under this new crack team of 50, plus Meta's GPU arsenal, the stakes are sky-high.

If you'd like to see the list of "mercenaries" worth over $100M, you can see who they are and their achievements here.

Cursor's Killer Hires and Web Expansion

Speaking of talent wars, Cursor (built by Anysphere) just pulled off a stunner by hiring Boris Cherny and Cat Wu, key creators of Claude Code, as Chief Architect and Head of Product. This skyrockets Cursor's cred in code generation, and I'm not surprised: Claude Code was a side project that exploded, and now Cursor's got the brains behind it. On top of that, they've rolled out AI coding agents to web and mobile, even integrating with Slack. No more being tied to your desktop: launch, monitor, and collaborate on code tasks from anywhere.

The lines between native and web tools are blurring fast, and Cursor is leading the charge. I haven't tested the Slack bit yet, but if you have, hit me up in the comments. This, plus their recent $20M raise, shows they're playing to win. Learn more at (Cursor).

Microsoft MAI-DxO: AI Diagnoses Better Than Doctors

Now, onto something that hits close to home for me: Microsoft's MAI-DxO, an AI system that's outdiagnosing doctors on open-ended medical cases. On 304 of the toughest New England Journal of Medicine cases, it scored 85.5% accuracy, over four times the 20% rate of experienced physicians.
I've had my share of frustrating medical waits, and seeing AI step in as a tool for doctors, not a replacement, gets me excited for the future.

It's an orchestration of models simulating a virtual clinician panel: asking follow-up questions, ordering tests, and even factoring in cost controls for diagnostics. This isn't just acing multiple-choice; it handles real-world ambiguity. My co-host Yam and I stressed: don't skip your doctor for ChatGPT, but expect your doc to be AI-superpowered soon. Read more on Microsoft's blog (Blog).

Cloudflare's One-Click AI Bot Block: Protecting the Internet

Cloudflare made waves with a one-click feature to block AI bots and scrapers, available to all customers, even free-tier ones. With bots like Bytespider and GPTBot hitting nearly 40% of top sites, but only about 3% of sites blocking them, this addresses a huge shift. I'm with the CEO here: the old internet deal was Google scraping in exchange for sending traffic; now AI summaries keep users from clicking through, breaking monetization for creators. Yam suggested a global license for training data with royalties, and I'm curious if that's the future. For now, Cloudflare's ML detects even sneaky bots spoofing as browsers. Big move; check their announcement (X) and the cool website goodaibots.com.

Cypher Alpha: Mystery 1M Context Model on OpenRouter

Lastly, a mysterious 1M-context model, Cypher Alpha, popped up on OpenRouter for free testing (with data logging). It's fast, at 70 tokens/sec with low latency, but it's not a reasoning model, and its refusals on basic queries stumped me. Speculation points to Amazon Titan, which would be a surprise entry. I'm intrigued by who's behind this: Gemini, OpenAI, and Qwen have hit 1M context, but Amazon? Let's see.
Try it yourself (Link).

Vision & Video: Mirage's AI-Native Game Engine Blows Minds 🤯

Okay, folks, I've gotta geek out here. Dynamics Lab unveiled the world's first AI-native user-generated content (UGC) game engine, live with playable demos like a GTA-style "Urban Chaos" and a racing "Coastal Drift." Running at 16 frames per second, it generates photorealistic worlds in real time from natural language or controller input. You can jump, run, fight, or drive, and even upload an image to spawn a new game environment on the fly.

What's nuts is there's no pre-built game behind this: it's infinite, custom content created as you play. I was floored showing this on stream. It's obviously not perfect, with clipping and delays, but we're witnessing the dawn of personalized gaming. You gotta try this; head to their site for the demos (Playable Demo).

This brings us even closer to Jensen Huang's "every pixel will be generated" dream.

Voice & Audio: TTS Gets Real with Kyutai and Qwen

This week brought fresh text-to-speech (TTS) updates that hint at smarter conversational AI down the line. Kyutai TTS, from the French team behind Moshi, dropped with ultra-low latency (220ms to first token) and high speaker similarity (77.1% English, 78.7% French), plus a word error rate of just 2.82% in English. It's production-ready, with a Rust server and voice cloning from a 10-second clip, which makes it perfect for LLM-integrated apps. Check it out (X Announcement, HF Model).

Qwen-TTS from Alibaba also launched, focusing on Chinese dialects like Pekingese and Shanghainese, but with English support too. It offers human-level naturalness via API, though it's less relevant for our English-speaking audience. Still, it's a solid step; see more (X Post).
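Kyutai's 2.82% figure is a word error rate (WER). For TTS, WER is typically measured by transcribing the generated audio with a speech recognizer and comparing that transcript against the input text. Here's a minimal sketch of the comparison step only, using a word-level edit distance; the whitespace tokenization and lack of text normalization (casing, punctuation) are simplifying assumptions, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit distance) table."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

ref = "the quick brown fox jumps over the lazy dog"
hyp = "the quick brown fox jumped over a lazy dog"  # hypothetical ASR transcript
print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Two substituted words out of nine reference words gives a WER of about 22%, so a 2.82% WER means roughly one word in 35 comes back wrong after transcription.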
Both are pieces of the puzzle for richer virtual interactions, and I'm pumped to see where this goes.

Infrastructure for Agents: Daytona's Sandbox Revolution

I'm thrilled to have chatted with Ivan Burazin from Daytona, a cloud provider delivering agent-native runtimes, or sandboxes, that give agents their own computers for tasks like code execution or data analysis. They've hit over $1M in annualized run rate just two months post-launch, with 15,000 signups and 1,500 credit cards on file. That's insane growth for infrastructure, which usually ramps slowly due to integration delays.

Why's this hot? 2025 is the year of agents, and as Ivan shared, even OpenAI and Anthropic recently redefined agents as needing runtimes. From YC's latest batch (37% building agents) to Cursor's web move, every task may soon spin up a sandbox. Daytona's "stateful serverless" tech spins up fast, lasts long, and scales across regions like the US, UK, Germany, and India, addressing latency and GDPR needs. If you're building agents, this is your unsung hero: explore it at (Daytona IO) and grab $200 in credits, or up to $50K for startups (Startups).

Wrapping Up: AI's Relentless Pace

What a week, folks! From Chinese open-source titans like ERNIE 4.5 and Hunyuan-A13B redefining accessibility, to Meta's blockbuster hires signaling an AI arms race, and Microsoft's MAI-DxO paving the way for smarter healthcare, we're witnessing AI's relentless acceleration. Mirage's game engine and Daytona's sandboxes remind us that creativity and infrastructure are just as critical as the models themselves. I'm buzzing with anticipation for what's next: will Meta's dream team deliver? Will agents redefine every app? Stick with ThursdAI to find out.
See you next week for more!

TL;DR and Show Notes

Here's the quick rundown of everything we covered this week, packed with links to dive deeper:

* Show Notes & Guests
    * Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
    * Co-Hosts - @WolframRvnwlf, @yampeleg, @nisten, @ldjconfirmed
    * Guests - Ivan Burazin (Daytona), Michael Luo (Agentica)
* Open Source LLMs
    * Baidu's ERNIE 4.5 Series - 10 models, 424B to 0.3B, multimodal, beats o1 on DocVQA (X, HF, Paper)
    * Tencent's Hunyuan-A13B-Instruct - 80B total, 13B active, 256K context, WizardLM legacy (X, HF, Try It)
    * Huawei's Pangu Pro MoE - 72B, trained on Ascend NPUs, 1,528 tokens/sec per card (X, HF)
    * DeepSWE-Preview - RL agent, 59% SWE-Bench-Verified on Qwen3-32B (Notion, HF Model)
* This Week's Buzz
    * Weights & Biases Weavehacks Hackathon - SF, July 12-13, agent protocols focus (Sign Up)
* Big CO LLMs + APIs
    * Meta Superintelligence Labs (MSL) - Zuck hires dream team, up to $300M comp packages for OpenAI talent (list)
    * Cursor - Hires Claude Code creators, web/mobile agents with Slack (Cursor, HF)
    * Microsoft MAI-DxO - 85.5% accuracy on NEJM cases vs. 20% for doctors (X, Blog)
    * Cloudflare - One-click AI bot blocking, tackles scraping economics (X)
    * Cypher Alpha - Mystery 1M context model, possibly Amazon Titan (Link)
    * Gemini Pro 2.5 - Returned to Google's free tier
* Vision & Video
    * Mirage - AI-native UGC game engine, real-time photorealistic demos (Playable Demo)
    * Workflow - Restyle videos with Flux Kontext and Luma Modify (X)
* Voice & Audio
    * Kyutai TTS - Low-latency, high similarity in EN/FR (X, HF)
    * Qwen-TTS - Bilingual Chinese/English, human-level naturalness (X, HF)
* Infrastructure
    * Daytona - Agent-native sandboxes, $1M run rate in 2 months (GitHub, Startups)
* Tools
    * Chai Discovery's Chai-2 - Zero-shot antibody design (Chai Discovery)

Thanks for reading all the way through ThursdAI, folks! Share this with friends to spread the AI love, and I'll catch you next week for more! This is a public episode.
If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe