
ThursdAI - Dec 5 - OpenAI o1 & o1 pro, Tencent HY-Video, FishSpeech 1.5, Google GENIE2, Weave in GA & more AI news
Well well well, December is finally here, we're about to close out this year (and have just flown by the second anniversary of ChatGPT) and it seems that all of the AI labs want to give us X-mas presents to play with over the holidays!
Look, I keep saying this, but weeks are getting crazier and crazier, this week we got the cheapest and the most expensive AI offerings all at once (the cheapest from Amazon and the most expensive from OpenAI), 2 new open weights models that beat commercial offerings, a diffusion model that predicts the weather and 2 world building models, oh and 2 decentralized fully open sourced LLMs were trained across the world LIVE and finished training. I said... crazy week!
And for W&B, this week started with Weave finally launching in GA, which I personally was looking forward to (read more below)!
TL;DR Highlights
* OpenAI O1 & Pro Tier: O1 is out of preview, now smarter, faster, multimodal, and integrated into ChatGPT. For heavy usage, ChatGPT Pro ($200/month) offers unlimited calls and O1 Pro Mode for harder reasoning tasks.
* Video & Audio Open Source Explosion: Tencent's HYVideo outperforms Runway and Luma, bringing high-quality video generation to open source. FishSpeech 1.5 challenges top TTS providers, making near-human voice available for free research.
* Open Source Decentralization: Nous Research's DiStRo (15B) and Prime Intellect's INTELLECT-1 (10B) prove you can train giant LLMs across decentralized nodes globally. Performance is on par with centralized setups.
* Google's Genie 2 & WorldLabs: Generating fully interactive 3D worlds from a single image, pushing boundaries in embodied AI and simulation. Google's GenCast also sets a new standard in weather prediction, beating supercomputers in accuracy and speed.
* Amazon's Nova FMs: Cheap, scalable LLMs with huge context and global language coverage. Perfect for cost-conscious enterprise tasks, though not top on performance.
* Weave by W&B: Now in GA, it's your dashboard and tool suite for building, monitoring, and scaling GenAI apps. Get Started with 1 line of code
OpenAI's 12 Days of Shipping: O1 & ChatGPT Pro
The biggest splash this week came from OpenAI. They're kicking off "12 days of launches," and Day 1 brought the long-awaited full version of o1. The main complaint about o1 for many people is how slow it was! Well, now it's not only smarter but significantly faster (60% faster than preview!), and officially multimodal: it can see images and text together.
Better yet, OpenAI introduced a new ChatGPT Pro tier at $200/month. It offers unlimited usage of o1, advanced voice mode, and something called o1 pro mode, where o1 thinks even harder and longer about your hardest math, coding, or science problems. For power users (maybe data scientists, engineers, or hardcore coders) this might be a no-brainer. For others, 200 bucks might be steep, but hey, someone's gotta pay for those GPUs. Given that OpenAI recently confirmed that there are now 300 Million monthly active users on the platform, and many of my friends already upgraded, this is for sure going to boost the bottom line at OpenAI!
Quoting Sam Altman from the stream, "This is for the power users who push the model to its limits every day." For those who complained o1 took forever just to say "hi," rejoice: trivial requests will now be answered quickly, while super-hard tasks get that legendary deep reasoning including a new progress bar and a notification when a task is complete. Friend of the pod Ray Fernando gave pro a prompt that took 7 minutes to think through!
I've tested the new o1 myself, and while I've gotten dangerously close to my 50 messages per week quota, I've gotten some incredible results already, and very fast as well. Both o1-preview and o1-mini failed this ice-cubes question, and both took significantly longer on it, while o1 took just 4 seconds.
Open Source LLMs: Decentralization & Transparent Reasoning
Nous Research DiStRo & DeMo Optimizer
We've talked about decentralized training before, but the folks at Nous Research are making it a reality at scale. This week, Nous Research wrapped up the training of a new 15B-parameter LLM, codename "Psyche," using a fully decentralized approach called "Nous DiStRo." Picture a massive AI model trained not in a single data center, but across GPU nodes scattered around the globe. According to Alex Volkov (host of ThursdAI), "This is crazy: they're literally training a 15B param model using GPUs from multiple companies and individuals, and it's working as well as centralized runs."
The key to this success is "DeMo" (Decoupled Momentum Optimization), a paper co-authored by none other than Diederik Kingma (yes, the Kingma behind Adam optimizer and VAEs). DeMo drastically reduces communication overhead and still maintains stability and speed. The training loss curve they've shown looks just as good as a normal centralized run, proving that decentralized training isn't just a pipe dream. The code and paper are open source, and soon we'll have the fully trained Psyche model. It's a huge step toward democratizing large-scale AI: no more waiting around for Big Tech to drop their weights. Instead, we can all chip in and train together.
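To get a feel for why sending less data per step can still work, here's a toy sketch. To be clear, this is NOT the DeMo algorithm from the paper and not Nous's code. It assumes a much simpler scheme: each worker keeps its own momentum buffer, shares only its k largest-magnitude entries, and keeps the rest locally so it can keep accumulating. Every name here (`Worker`, `top_k_indices`, `average_messages`, `k`) is made up for illustration.

```python
from typing import Dict, List


def top_k_indices(values: List[float], k: int) -> List[int]:
    """Indices of the k entries with the largest absolute value."""
    return sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)[:k]


class Worker:
    """One training node with a local momentum buffer (hypothetical, for illustration)."""

    def __init__(self, dim: int, beta: float = 0.9) -> None:
        self.momentum = [0.0] * dim
        self.beta = beta

    def step(self, grad: List[float], k: int) -> Dict[int, float]:
        # Accumulate the local gradient into the local momentum.
        self.momentum = [self.beta * m + g for m, g in zip(self.momentum, grad)]
        # Only the k largest components go over the network.
        message = {i: self.momentum[i] for i in top_k_indices(self.momentum, k)}
        # What was sent is removed; the remainder stays local and keeps accumulating.
        for i in message:
            self.momentum[i] = 0.0
        return message


def average_messages(messages: List[Dict[int, float]], dim: int) -> List[float]:
    """Combine the sparse messages from all workers into one dense update."""
    update = [0.0] * dim
    for msg in messages:
        for i, v in msg.items():
            update[i] += v / len(messages)
    return update


dim, k = 8, 2
workers = [Worker(dim), Worker(dim)]
grads = [
    [0.1, -2.0, 0.0, 0.3, 0.0, 1.5, 0.0, 0.0],
    [0.0, -1.0, 0.2, 0.0, 0.0, 0.0, 3.0, 0.1],
]
messages = [w.step(g, k) for w, g in zip(workers, grads)]
update = average_messages(messages, dim)

sent = sum(len(m) for m in messages)
print(f"values sent: {sent} of {dim * len(workers)}")
print("update:", update)
```

In this toy run each worker ships 2 values instead of 8. The real method is considerably more sophisticated, so treat this purely as intuition for "communicate less, keep the rest local."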
Prime Intellect INTELLECT-1 10B: Another Decentralized Triumph
But wait, there's more! Prime Intellect also finished training their 10B model, INTELLECT-1, using a similar decentralized setup. INTELLECT-1 was trained with a custom framework that reduces inter-GPU communication by 400x. It's essentially a global team effort, with nodes from all over the world contributing compute cycles.
The result? A model hitting performance similar to older Meta models like Llama 2, but fully decentralized.
Ruliad DeepThought 8B: Reasoning You Can Actually See
If that's not enough, we've got yet another open-source reasoning model: Ruliad's DeepThought 8B. This 8B parameter model (finetuned from LLaMA-3.1) comes from friends of the show FarEl, Alpin and Sentdex.
DeepThought aims to match or exceed the performance of much larger models on reasoning tasks, and beating several 72B param models while being just 8B itself is very impressive.
Google is firing on all cylinders this week
Google didn't stay quiet this week either, and while we all wait for the Gemini team to release the next Gemini after the myriad of very good experimental models recently, we've gotten some amazing things this week.
Google's PaliGemma 2 - finetunable SOTA VLM using Gemma
PaliGemma 2 is a new family of vision-language models (3B, 10B and 28B) at 224px, 448px and 896px resolutions. It's a suite of base models with image segmentation and detection capabilities that are great at OCR, which makes them very versatile for fine-tuning on specific tasks.
They claim to achieve SOTA on chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation!
Google GenCast SOTA weather prediction with... diffusion!?
More impressively, Google DeepMind released GenCast, a diffusion-based model that beats the state-of-the-art ENS system in 97% of weather predictions. Did we say weather predictions? Yup.
Generative AI is now better at weather forecasting than dedicated physics based deterministic algorithms running on supercomputers. GenCast can predict 15 days in advance in just 8 minutes on a single TPU v5, instead of hours on a monstrous cluster. This is mind-blowing. As Yam said on the show, "Predicting the world is crazy hard" and now diffusion models handle it with ease.
W&B Weave: Observability, Evaluation and Guardrails now in GA
Speaking of building and monitoring GenAI apps, we at Weights & Biases (the sponsor of ThursdAI) announced that Weave is now GA. Weave is a developer tool for evaluating, visualizing, and debugging LLM calls in production. If you're building GenAI apps (like a coding agent or a tool that processes thousands of user requests), Weave helps you track costs, latency, and quality systematically.
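If "tracing LLM calls" sounds abstract, here's a tiny offline sketch of the idea. It is not Weave's implementation and doesn't use the Weave library at all: `traced`, `TRACES` and `fake_llm` are hypothetical names, and the "model" is a stand-in so the snippet runs without any API keys. In Weave itself, as far as I know from its docs, the equivalent entry points are `weave.init(...)` plus the `@weave.op()` decorator, so check the docs for exact usage.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory log of calls; a real tool would send these to a dashboard instead.
TRACES: List[Dict[str, Any]] = []


def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record inputs, output, and latency for every call to fn."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "args": args,
            "kwargs": kwargs,
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result

    return wrapper


@traced
def fake_llm(prompt: str) -> str:
    # Placeholder for a real model call, so the example runs offline.
    return prompt.upper()


fake_llm("hello weave")
print(TRACES[0]["name"], TRACES[0]["output"], f'{TRACES[0]["latency_s"]:.6f}s')
```

Once every call leaves a record like this, things like cost per request, slow prompts, and quality scores become simple queries over the log, which is the part a hosted tool takes off your hands.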
We showcased two internal apps: Open UI (a website builder from a prompt) and Winston (an AI agent that checks emails, Slack, and more). Both rely on Weave to iterate, tune prompts, measure user feedback, and ensure stable performance. With O1 and other advanced models coming to APIs soon, tools like Weave will be crucial to keep those applications under control.
If you follow this newsletter and develop with LLMs, now is a great time to give Weave a try.
Open Source Audio & Video: Challenging Proprietary Models
Tencent's HY Video: Beating Runway & Luma in Open Source
Tencent came out swinging with their open-source model, HYVideo. It's a video model that generates incredibly realistic footage, camera cuts, and even audio (yep, Foley and lip-synced character speech). Just a single model doing text-to-video, image-to-video, puppeteering, and more. It even outperforms closed-source giants like Runway Gen 3 and Luma 1.6 on over 1,500 prompts.
This is the kind of thing we dreamed about when we first heard of video diffusion models. Now it's here, open-sourced, ready for tinkering. "It's near SORA-level," as I mentioned, referencing OpenAI's yet-to-be-fully-released SORA model. The future of generative video just got more accessible, and competitors should be sweating right now. We may just get SORA as one of the 12 days of OpenAI releases!
FishSpeech 1.5: Open Source TTS Rivaling the Big Guns
Not just video, audio too. FishSpeech 1.5 is a multilingual, zero-shot voice cloning model that ranks #2 overall on TTS benchmarks, just behind ElevenLabs. This is a 500M-parameter model, trained on a million hours of audio, achieving near-human quality, fast inference, and open for research.
This puts high-quality text-to-speech capabilities in the open-source community's hands. You can now run a top-tier TTS system locally, clone voices, and generate speech in multiple languages with low latency. No more relying solely on closed APIs. This is how open-source chases, and often catches, commercial leaders.
If you've been longing for near-instant voice cloning on your own hardware, this is the model to go play with!
Creating World Models: Genie 2 & WorldLabs
Fei Fei Li's WorldLabs: Images to 3D Worlds
WorldLabs, founded by Dr. Fei Fei Li, showcased a mind-boggling demo: turning a single image into a walkable 3D environment. Imagine you take a snapshot of a landscape, load it into their system, and now you can literally walk around inside that image as if it were a scene in a video game. "I can literally use WASD keys and move around," Alex commented, clearly impressed.
It's not perfect fidelity yet, but it's a huge leap toward generating immersive 3D worlds on the fly. These tools could revolutionize virtual reality, gaming, and simulation training. WorldLabs' approach is still in early stages, but what they demonstrated is nothing short of remarkable.
Google's Genie 2: Playable Worlds from a Single Image
If WorldLabs's 3D environment wasn't enough, Google dropped Genie 2. Take an image generated by Imagen 3, feed it into Genie 2, and you get a playable world lasting up to a minute. Your character can run, objects have physics, and the environment is consistent enough that if you leave an area and return, it's still there.
As I said on the pod, "It looks like a bit of Doom, but generated from a single static image. Insane!" The model simulates complex interactions (think water flowing, balloons bursting) and even supports long-horizon memory. This could be a goldmine for AI-based game development, rapid prototyping, or embodied agent training.
Amazon's Nova: Cheaper LLMs, Not Better LLMs
Amazon is also throwing their hat in the ring with the Nova series of foundational models. They've got variants like Nova Micro, Lite, Pro, and even a Premier tier coming in 2025. The catch? Performance is kind of "meh" compared to Anthropic or OpenAI's top models, but Amazon is aiming to be the cheapest high-quality LLM among the big players. With a context window of up to 300K tokens and 200+ language coverage, Nova could find a niche, especially for those who want to pay less per million tokens.
Nova Micro costs around 3.5 cents per million input tokens and 14 cents per million output tokens, making it dirt cheap to process massive amounts of data. Although not a top performer, Amazon's approach is: "We may not be best, but we're really cheap and we scale like crazy." Given Amazon's infrastructure, this could be compelling for enterprises looking for cost-effective large-scale solutions.
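To make "dirt cheap" concrete, here's a quick back-of-the-envelope calculator using the Nova Micro prices quoted above. The workload (10,000 documents at 2,000 input and 200 output tokens each) and the function name `estimate_cost_usd` are made up for illustration, and real pricing can differ by region or change over time, so treat this as a sketch rather than a quote.

```python
# Prices quoted in the newsletter for Nova Micro, in US dollars per million tokens.
NOVA_MICRO_INPUT_USD_PER_M = 0.035   # "around 3.5 cents" per million input tokens
NOVA_MICRO_OUTPUT_USD_PER_M = 0.14   # "14 cents" per million output tokens


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price: float = NOVA_MICRO_INPUT_USD_PER_M,
    output_price: float = NOVA_MICRO_OUTPUT_USD_PER_M,
) -> float:
    """Return the estimated bill in USD for a given token volume."""
    input_cost = (input_tokens / 1_000_000) * input_price
    output_cost = (output_tokens / 1_000_000) * output_price
    return input_cost + output_cost


# Hypothetical workload: 10,000 documents, 2,000 input and 200 output tokens each.
docs = 10_000
total = estimate_cost_usd(docs * 2_000, docs * 200)
print(f"Estimated cost: ${total:.2f}")
```

That's 20 million input tokens and 2 million output tokens for under a dollar at the quoted rates. Swap in another model's per-million prices via the `input_price` and `output_price` arguments to compare.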
Phew, this was a LONG week with a LOT of AI drops, and NGL, o1 actually helped me a bit with this newsletter. I wonder if you can spot the places where o1 wrote some of the text (using the transcription of the show and the outline as guidelines, and the previous newsletter as a tone guide) and where I wrote it myself?
Next week is NEURIPS 2024, the biggest ML conference in the world, and I'm going to be live streaming from there, so if you're at the conference, come by booth #404 and say hi! I'm sure there will be a TON of new AI updates next week as well!
Show Notes & Links
TL;DR of all topics covered:
* This week's Buzz
  * Weights & Biases announces Weave is now in GA (wandb.me/tryweave)
    * Tracing LLM calls
    * Evaluation & Playground
    * Human Feedback integration
    * Scoring & Guardrails (in preview)
* Open Source LLMs
  * DiStRo & DeMo from NousResearch - decentralized DiStRo 15B run (X, watch live, Paper)
  * Prime Intellect - INTELLECT-1 10B decentralized LLM (Blog, watch)
  * Ruliad DeepThought 8B - Transparent reasoning model (LLaMA-3.1) w/ test-time compute scaling (X, HF, Try It)
  * Google GenCast - diffusion model SOTA weather prediction (Blog)
  * Google open sources PaliGemma 2 (X, Blog)
* Big CO LLMs + APIs
  * Amazon announces Nova series of FM at AWS (X)
  * Google GENIE 2 creates playable worlds from a picture! (Blog)
  * OpenAI 12 days started with o1 full and o1 pro and pro tier $200/mo (X, Blog)
* Vision & Video
  * Tencent open sources HY Video - beating Luma & Runway (Blog, Github, Paper, HF)
  * Runway video keyframing prototype (X)
* Voice & Audio
  * FishSpeech V1.5 - multilingual, zero-shot instant voice cloning, low-latency, open text to speech model (X, Try It)
  * ElevenLabs - real time audio agents builder (X)
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe