Sam Bowman on benchmarking and AI alignment

23.2.2023

CS224U

1:26:27

Lessons learned about benchmarking, adversarial testing, the dangers of over- and under-claiming, and AI alignment.

Transcript: https://web.stanford.edu/class/cs224u/podcast/bowman/

More episodes of "CS224U"

Sam Bowman on benchmarking and AI alignment
23.2.2023
1:26:27
Lessons learned about benchmarking, adversarial testing, the dangers of over- and under-claiming, and AI alignment. Transcript: https://web.stanford.edu/class/cs224u/podcast/bowman/ Sam's website Sam on Twitter NYU Linguistics NYU Data Science NYU Computer Science Anthropic SNLI paper: A large annotated corpus for learning natural language inference SNLI leaderboard FraCaS SICK A SICK cure for the evaluation of compositional distributional semantic models SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment RTE Knowledge Resources Richard Socher Chris Manning Andrew Ng Ray Kurzweil SQuAD Gabor Angeli Adina Williams Adina Williams podcast episode MultiNLI paper: A broad-coverage challenge corpus for sentence understanding through inference MultiNLI leaderboards Twitter discussion of LLMs and negation GLUE SuperGLUE DecaNLP GPT-3 paper: Language Models are Few-Shot Learners FLAN Winograd schema challenges BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding JSALT: General-Purpose Sentence Representation Learning Ellie Pavlick Ellie Pavlick podcast episode Tal Linzen Ian Tenney Dipanjan Das Yoav Goldberg Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks BIG-bench Upwork Surge AI Dynabench Douwe Kiela Douwe Kiela podcast episode Ethan Perez NYU Alignment Research Group Eliezer Shlomo Yudkowsky Alignment Research Center Redwood Research Percy Liang podcast episode Richard Socher podcast episode
Amir Goldberg on the impact of AI
27.1.2023
1:28:19
AI and social science, the causal revolution in economics, predictions about the impact of AI, teaching MBAs, productizing AI, and a journey from Tel Aviv to Princeton to Stanford. Transcript: https://web.stanford.edu/class/cs224u/podcast/goldberg/ Amir's website Amir on Twitter Computational Culture Lab ChatGPT Laura Nelson Bart Bonikowski Chris Winship Bernie Koch Treebanks BIG-bench Guido Imbens Endogeneity Susan Athey Cambridge Analytica Prediction Machines Speech and Language Processing DALL-E 2 Midjourney Stable Diffusion Postmodernism, or, the Cultural Logic of Late Capitalism Turing test Matt Salganik Paul DiMaggio
Marie-Catherine de Marneffe on understanding your data
7.11.2022
1:08:41
Leaving Ohio, being back in Belgium, organizing NAACL 2022, reviewing at NLP-scale, universal dependencies, and doing NLU before it was cool. Transcript: https://web.stanford.edu/class/cs224u/podcast/demarneffe/ Marie's website Generating Typed Dependency Parses from Phrase Structure Parses Universal Dependencies project OSU Linguistics NAACL 2022 Dan Jurafsky Dan Roth Chris Manning ARR Priscilla Rasmussen Transactions of the ACL Finding Contradictions in Text Not a simple yes or no: Uncertainty in indirect answers Recognizing Textual Entailment Anna Rafferty Scott Grimm "Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives Did It Happen? The Pragmatic Complexity of Veridicality Assessment Yejin Choi Yejin Choi's ACL 2022 talk Barbara Plank Linguistically debatable or just plain wrong? Jesse Dodge Reproducibility badges at NAACL 2022 Stanford Sentiment Treebank Judith Tonhauser Nan-Jiang Jiang Lauri Karttunen Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data Microsoft DeBERTa surpasses human performance on the SuperGLUE benchmark Daniel Zeman Marta Recasens
Sasha Rush on NLP research, engineering, and education
4.10.2022
1:22:39
Coding puzzles, practices, and education, structured prediction, the culture of Hugging Face, large models, and the energy of New York. Transcript: https://web.stanford.edu/class/cs224u/podcast/rush/ Sasha's website Sasha on Twitter Sasha on the Humans of AI podcast Sasha on The Thesis Review Podcast with Sean Welleck Sasha on the Talking Machines Podcast Sasha interviewed by Sayak Paul Hugging Face PyTorch The Annotated Transformer The Annotated Alice The Annotated S4 Sasha and Dan Oneață's declarative graphics library Chalk Drawing Big Ben in Chalk OpenNMT Ken Shan Blog post by Ken and Dylan Thurston Edward Z. Yang Stuart Shieber Literate programming Soumith Chintala Lua Torch TensorFlow Graham Neubig Chris Dyer DyNet JAX jax.vmap Matt Johnson Finale Doshi-Velez, whose undergrad ML course inspired and informed Sasha's Tensor Puzzles GPU Puzzles A tweet that Chris added to his CV Adam Paszke Dougal MacLaurin Dex Named Tensor notation Named Tensors in PyTorch TorchDim Mini Torch Torch-Struct Sara Hooker's paper 'The hardware lottery' Jacob Andreas Kevin Ellis Hugging Face transformers library Hugging Face datasets library Hugging Face diffusers library Hugging Face evaluate library scikit-learn Big Science blog BLOOM The Technology Behind BLOOM Training CRFM Eleuther T0 and PromptSource Washington Post: Big Tech builds AI with bad data. So scientists sought better data The bet: Is Attention All You Need? Democratizing access to large-scale language models with OPT-175B Epic OPT-175 Logbook Google's PaLM United's shares plunge 76% on bogus bankruptcy report Imagen Albert Gu Bell Labs
Diyi Yang on socially aware language technologies
1.8.2022
1:21:49
Moving to Stanford, linguistic and social variation, interventional studies, and shared stories and lessons learned from an ACL Young Rising Star. Transcript: https://web.stanford.edu/class/cs224u/podcast/yang/ Diyi's website Diyi on Twitter Dan Jurafsky The Stanford NLP Group Buford Highway in Atlanta Sweet tea VALUE paper AAE GLUE Negative concord Exploring the role of grammar and word choice in bias toward African American English (AAE) in hate speech classification Inducing positive perspectives with text reframing Dynabench Datasheets for datasets MTurk Upwork Prolific Seekers, Providers, Welcomers, and Storytellers: Modeling Social Roles in Online Health Communities ToTTo: A controlled table-to-text generation dataset Six questions for socially aware language technologies The importance of modeling social factors of language: Theory and practice Dirk Hovy Workshop on Shared Stories and Lessons Learned EMNLP 2022 Workshop on Shared Stories and Lessons Learned ICCV 2021 Jeff Hancock
Maria Antoniak on cultural analytics
27.6.2022
1:26:26
Birth narratives, stable static representations, NLP for everyone, AI2 and Semantic Scholar, the mission of Ukrainian Catholic University, and books books books. Transcript: https://web.stanford.edu/class/cs224u/podcast/antoniak/ Maria's website Maria on Twitter Semantic Scholar Elliott Ash ETH Zurich Center for Law and Economics Text As Data (TADA) 2022 David Mimno A computational reading of a birth stories community r/BabyBumps Roger Schank Nate Chambers ICWSM 2022 workshop: BERT for Social Sciences and Humanities Measuring Word Similarity with BERT (Sephora Makeup Reviews) Melanie Walsh word2vec BERT Nick Vincent's Twitter thread on Meta's OPT-175B filtering strategies Stemming Alexandra Schofield LDA LSA GloVe Evaluating the stability of embedding-based word similarities Narrative datasets through the lenses of NLP and HCI Belmont report Casey Fiesler Naive Bayes Allen Institute CORD-19 dataset, which appeared March 16, 2020! Books books books Pushkin Press New York Review Books Posthumous Memoirs of Brás Cubas And Then There Were None Stanisław Lem Jeff VanderMeer Italo Calvino Jorge Luis Borges xkcd War and Peace Middlemarch Beloved Novelist Cormac McCarthy's tips on how to write a great science paper Blood Meridian No Country for Old Men (book) No Country for Old Men (movie) The Road Talking a visual walk through Burnt Norton Ukrainian Catholic University Support Ukraine Now: Real Ways You can Help Ukraine Let Ukraine Speak: Integrating Scholarship on Ukraine into Classroom Syllabi Ukraine Trust Chain spilka World Central Kitchen Caritas Ukraine Science for Ukraine Data Science Crash Course: Interview Prep
Percy Liang on the Center for Research on Foundation Models
13.6.2022
1:27:24
Realizing that Foundation Models are a big deal, scaling, why Percy founded CRFM, Stanford's position in the field, benchmarking, privacy, and CRFM's first and next 30 years. Transcript: https://web.stanford.edu/class/cs224u/podcast/liang/ Percy's website Percy on Twitter CRFM On the opportunities and risks of foundation models ELMo: Deep contextualized word representations BERT: Pre-training of deep bidirectional Transformers for language understanding Sam Bowman GPT-2 Adversarial examples for evaluating reading comprehension systems System 1 and System 2 The Unreasonable Effectiveness of Data Chinchilla: Training Compute-Optimal Large Language Models GitHub Copilot LaMDA: Language models for dialog applications AI Test Kitchen DALL-E 2 Richard Socher on the CS224U podcast you.com Chris Ré Fei-Fei Li Chris Manning HAI Rob Reich Erik Brynjolfsson Dan Ho Russ Altman Jeff Hancock The time is now to develop community norms for the release of foundation models Twitter Spaces event Best practices for deploying language models Model Cards for model reporting Datasheets for datasets Strathern's law
Roger Levy on computational psycholinguistics in the deep learning era
10.6.2022
1:28:46
From genes to memes, evidence in linguistics, central questions of computational psycholinguistics, academic publishing woes, and the benefits of urban density. Transcript: https://web.stanford.edu/class/cs224u/podcast/levy/ Roger's website Roger on Twitter Roger's courses The Selfish Gene Joan Bresnan John Rickford Chris Manning Noah Goodman Thomas Clark Ted Gibson Ethan Wilcox Critical period Yevgeni Berzak Heritage language How many words do kids hear each year? See footnote 10. W.E.I.R.D Kristina Gulordava Poverty of stimulus hypothesis Formal grammar and information theory: together again? Expectation-based syntactic comprehension Google Ngram viewer Google Ngram data files Geoff Hinton's 2001 Rumelhart Prize from the Cognitive Science Society Center embedding Mark Johnson Stuart Shieber Ivan Sag Cognitive constraints and island effects The Chicken or the Egg? A Probabilistic Analysis of English Binomials Sarah Bunin Benor Roger's pinned tweet Eric Baković MIT's committee on the library system Project DEAL Diamond open access Fernanda Ferreira Brian Dillon Glossa Psycholinguistics Glossa Johan Rooryck La Jolla Cove
Kalika Bali on language technologies for a multilingual world
24.5.2022
1:29:07
Giving a TED talk, linguistic diversity, code switching and large language models, the Indian NLP scene, empowering women with language consultation work, Wordle, and "once a linguist, always a linguist". Transcript: https://web.stanford.edu/class/cs224u/podcast/bali/ Kalika's website Kalika on Twitter Kalika's TED talk Microsoft Research India HAL IndicBERT AI4Bharat mBERT Hindi Bangla English Gondi Adivasi radio Oriya Karya crowdsourcing platform Sandy Chung Language processing experiments in the field Tamil Telugu Idu Mishmi COMPASS 2022 Digital Green Everwell Wordle Information-theoretic analysis of Wordle
Yulia Tsvetkov on ethical NLP
16.5.2022
1:23:03
Coast-to-coast professional journeys, multilingual NLP, teaching in a fast-changing field, the history of hate speech detection in NLP, ethics review of NLP research, research on sensitive topics, mentoring researchers, and optimizing for your own passions. Transcript: https://web.stanford.edu/class/cs224u/podcast/tsvetkov/ Yulia's website TsvetShop Shuly Wintner Just when I thought I was out ... Algorithms for NLP HMMs Kneser–Ney smoothing Noah Smith Demoting racial bias in hate speech detection The risk of racial bias in hate speech detection Demoting racial bias in hate speech detection Fortifying toxic speech detectors against veiled toxicity This is the daily stormer's playbook Microaggressions.com Finding microaggressions in the wild: A case for locating elusive phenomena in social media posts https://delphi.allenai.org Delphi: Towards Machine Ethics and Norms Yejin Choi
