Lessons learned about benchmarking, adversarial testing, the dangers of over- and under-claiming, and AI alignment.
Transcript: https://web.stanford.edu/class/cs224u/podcast/bowman/
- Sam's website
- Sam on Twitter
- NYU Linguistics
- NYU Data Science
- NYU Computer Science
- Anthropic
- SNLI paper: A large annotated corpus for learning natural language inference
- SNLI leaderboard
- FraCaS
- SICK
- A SICK cure for the evaluation of compositional distributional semantic models
- SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment
- RTE Knowledge Resources
- Richard Socher
- Chris Manning
- Andrew Ng
- Ray Kurzweil
- SQuAD
- Gabor Angeli
- Adina Williams
- Adina Williams podcast episode
- MultiNLI paper: A broad-coverage challenge corpus for sentence understanding through inference
- MultiNLI leaderboards
- Twitter discussion of LLMs and negation
- GLUE
- SuperGLUE
- DecaNLP
- GPT-3 paper: Language Models are Few-Shot Learners
- FLAN
- Winograd schema challenges
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- JSALT: General-Purpose Sentence Representation Learning
- Ellie Pavlick
- Ellie Pavlick podcast episode
- Tal Linzen
- Ian Tenney
- Dipanjan Das
- Yoav Goldberg
- Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
- BIG-bench
- Upwork
- Surge AI
- Dynabench
- Douwe Kiela
- Douwe Kiela podcast episode
- Ethan Perez
- NYU Alignment Research Group
- Eliezer Shlomo Yudkowsky
- Alignment Research Center
- Redwood Research
- Percy Liang podcast episode
- Richard Socher podcast episode