Navigating the AI Frontier: The Power of Synthetic Data and Agent Evaluations in LLM Development // Boris Selitser // #241

6/18/2024

MLOps.community

Duration: 57:21

Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/

Navigating the AI Frontier: The Power of Synthetic Data and Agent Evaluations in LLM Development // MLOps podcast #241 with Boris Selitser, Co-Founder and CTO/CPO of Okareo.

A big thank you to LatticeFlow for sponsoring this episode! LatticeFlow - https://latticeflow.ai/

// Abstract
Explore the evolving landscape of building LLM applications, focusing on the critical roles of synthetic data and agent evaluations. Discover how synthetic data enhances model behavior description, prototyping, testing, and fine-tuning, driving robustness in LLM applications. Learn about the latest methods for evaluating complex agent-based systems, including RAG-based evaluations, dialog-level assessments, simulated user interactions, and adversarial models. This talk delves into the specific challenges developers face and the tradeoffs involved in each evaluation approach, providing practical insights for effective AI development.

// Bio
Boris is the Co-Founder and CTO/CPO at Okareo. Okareo is a full-cycle platform for developers to evaluate and customize AI/LLM applications. Before Okareo, Boris was Director of Product at Meta/Facebook, leading teams building internal platforms and ML products. Examples include a copyright classification system across the Facebook apps and an engagement platform for over 200K developers, 500K+ creators, and 12M+ Oculus users. Boris has a bachelor’s in Computer Science from UC Berkeley.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
https://docs.okareo.com/blog/data_loop
https://docs.okareo.com/blog/agent_eval
The Real E2E RAG Stack // Sam Bean // MLOps Podcast #217 - https://youtu.be/8uZst7pgOw0
RecSys at Spotify // Sanket Gupta // MLOps Podcast #232 - https://youtu.be/byH-ARJA4gk

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Boris on LinkedIn: https://www.linkedin.com/in/selitser/

Timestamps:
[00:00] Boris' preferred coffee
[00:37] Takeaways
[02:32] Please like, share, leave a review, and subscribe to our MLOps channels!
[02:48] Software Engineering and Data Science
[06:01] AI Transformative Potential Explained
[10:31] Prompt Injection Protection Strategies
[17:03] Agent's metrics for Jira
[24:11] Data and Metrics Evolution
[27:54] Evaluation Focus Enhances Systems
[31:22 - 32:52] LatticeFlow AD
[32:55] Custom Evaluation and Synthetic Data
[36:23] Synthetic data for expansion, evaluation, and map
[41:06] Diverse agents' personalities for readiness
[44:25] Agent functions
[46:17] Optimizing Routing Agents
[50:04] Adapting to tool output for decision-making
[52:56] Agent framework evolution
[55:41] Agent framework for delivering value
[57:03] Wrap up
