The capabilities of LLM-based chatbots have been advancing every month. These improvements are mostly measured by benchmarks like MMLU, HumanEval, and MATH (e.g. Claude 3.5 Sonnet, GPT-4o). However, as these benchmarks become increasingly saturated, is user experience improving in proportion to these scores? If we envision a future
7 months ago

More from The Gradient

Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research

What is the Role of Mathematics in Modern Machine Learning? The past decade has witnessed a shift in how progress is made in machine learning. Research involving carefully designed and mathematically principled architectures results in only marginal improvements while compute-intensive and engineering-first efforts that scale to ever larger training sets

5 months ago 68 votes
We Need Positive Visions for AI Grounded in Wellbeing

Introduction Imagine yourself a decade ago, jumping directly into the present shock of conversing naturally with an encyclopedic AI that crafts images, writes code, and debates philosophy. Won’t this technology almost certainly transform society — and hasn’t AI’s impact on us so far been

8 months ago 96 votes
Financial Market Applications of LLMs

The AI revolution drove frenzied investment in both private and public companies and captured the public’s imagination in 2023. Transformational consumer products like ChatGPT are powered by Large Language Models (LLMs) that excel at modeling sequences of tokens that represent words or parts of words [2]. Amazingly, structural

a year ago 95 votes
A Brief Overview of Gender Bias in AI

A brief overview and discussion on gender bias in AI

a year ago 96 votes

More in AI

OpenAI’s dirty December o3 demo doesn’t readily replicate

Don’t believe everything you see

23 hours ago 3 votes
AI #113: The o3 Era Begins

Enjoy it while it lasts.

7 hours ago 1 vote
New Results of State-of-the-art LLMs on 4 Political Orientation Tests

One model appears closer to the center than the rest

2 days ago 4 votes
You Better Mechanize

Or you had better not.

2 days ago 3 votes
What LLMs Will Do To Jobs: All You Need is an Oracle

LLMs are Mainly Tools That Enhance Experts

2 days ago 6 votes