GPT-4 Turbo vs Claude 2 - LLMs compared


We compare the brand-new and cutting-edge GPT-4 Turbo LLM to the much-underestimated Claude 2 (stylized Claude-2) LLM. To be clear, this is a comparison of large language models, not the AI chatbots they power (except where relevant). We analyze GPT-4 Turbo vs Claude 2 in terms of capabilities, context windows, pricing, accuracy and more. So, how does the OpenAI model behind ChatGPT, led by CEO Sam Altman, fare against the AI safety showpiece led by CEO Dario Amodei?

GPT-4 Turbo vs Claude 2 - Benchmark Comparison

GPT-4 Turbo is the latest LLM (large language model) from OpenAI. It was announced by the AI R&D firm at OpenAI DevDay in San Francisco, on November 6th, 2023. This was an impressive reveal considering that the prior models GPT-4 and GPT-4V, which share the same natural language processing (NLP) capabilities (GPT-4V adds image input), already held joint 1st place in the AI race. Now, OpenAI CEO Sam Altman, standing on stage with his leading investor, Microsoft CEO Satya Nadella, has one-upped ChatGPT with what seems to be the only thing that can: a better ChatGPT.

Company	CEO	AI Chatbot	LLM	API	Open-source
xAI	Elon Musk	Grok	Grok-1	No	No
OpenAI	Sam Altman	ChatGPT	GPT-3.5, GPT-4, GPT-4V, or GPT-4 Turbo	Yes	No
Google	Sundar Pichai	Bard	PaLM 2	Yes	No
Microsoft	Satya Nadella	Bing Chat	GPT-4	No	No
Meta	Mark Zuckerberg	Meta AI	LLaMA 2	No	Yes
Anthropic	Dario Amodei	Claude	Claude-2	Yes	No
Amazon	Andy Jassy	Olympus (rumored)	Olympus (rumored)	No	No

The AI chatbots of big tech.

xAI, the artificial intelligence firm founded by CEO Elon Musk, recently published a benchmark comparison of the leading AI models. The leading foundational large language models of big tech were tested across four benchmarks, namely GSM8k, MMLU, HumanEval, and MATH. Included in this comparison were, in descending order of overall accuracy, OpenAI’s GPT-4, Anthropic’s Claude-2, Google’s PaLM 2, xAI’s Grok-1, OpenAI’s GPT-3.5, Inflection AI’s Inflection-1 (the model behind the Pi chatbot), Meta’s LLaMA 2, and xAI’s Grok-0. This puts Claude-2 in 2nd place overall, and on HumanEval it even edges out GPT-4 (70% vs 67%).

Benchmark	Grok-0	LLaMA 2	Inflection-1	GPT-3.5	Grok-1	PaLM 2	Claude-2	GPT-4
GSM8k	56.8%	56.8%	62.9%	57.1%	62.9%	80.7%	88%	92%
MMLU	65.7%	68.9%	72.7%	70.0%	73.0%	78%	75%	86.4%
HumanEval	39.7%	29.9%	35.4%	48.1%	63.2%	N/A	70%	67%
MATH	15.7%	13.5%	16.0%	23.5%	23.9%	34.6%	N/A	42.5%

The large language models of big tech, as benchmarked by xAI.
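
To make the "2nd place" reading of the table concrete, here is a minimal Python sketch that ranks the models from the figures above. The aggregation is our own assumption, not xAI's method: because some entries are N/A, it averages only the benchmarks that every model reports (GSM8k and MMLU), and the function names are purely illustrative.

```python
# Scores copied from the xAI benchmark table above (percentages).
# None marks an entry the table lists as N/A.
SCORES = {
    "Grok-0":       {"GSM8k": 56.8, "MMLU": 65.7, "HumanEval": 39.7, "MATH": 15.7},
    "LLaMA 2":      {"GSM8k": 56.8, "MMLU": 68.9, "HumanEval": 29.9, "MATH": 13.5},
    "Inflection-1": {"GSM8k": 62.9, "MMLU": 72.7, "HumanEval": 35.4, "MATH": 16.0},
    "GPT-3.5":      {"GSM8k": 57.1, "MMLU": 70.0, "HumanEval": 48.1, "MATH": 23.5},
    "Grok-1":       {"GSM8k": 62.9, "MMLU": 73.0, "HumanEval": 63.2, "MATH": 23.9},
    "PaLM 2":       {"GSM8k": 80.7, "MMLU": 78.0, "HumanEval": None, "MATH": 34.6},
    "Claude-2":     {"GSM8k": 88.0, "MMLU": 75.0, "HumanEval": 70.0, "MATH": None},
    "GPT-4":        {"GSM8k": 92.0, "MMLU": 86.4, "HumanEval": 67.0, "MATH": 42.5},
}


def shared_benchmarks(table):
    """Return the benchmarks for which every model has a reported score."""
    names = next(iter(table.values())).keys()
    return [b for b in names if all(row[b] is not None for row in table.values())]


def rank_models(table):
    """Rank models by mean score over the shared benchmarks (our assumption)."""
    shared = shared_benchmarks(table)

    def mean_shared(model):
        return sum(table[model][b] for b in shared) / len(shared)

    return sorted(table, key=mean_shared, reverse=True)


def best_per_benchmark(table):
    """Map each benchmark to the model with the highest reported score."""
    names = next(iter(table.values())).keys()
    best = {}
    for b in names:
        # Skip N/A entries so a missing score never counts as a result.
        reported = {m: row[b] for m, row in table.items() if row[b] is not None}
        best[b] = max(reported, key=reported.get)
    return best


if __name__ == "__main__":
    print("Shared benchmarks:", shared_benchmarks(SCORES))
    print("Ranking:", rank_models(SCORES))
    print("Best per benchmark:", best_per_benchmark(SCORES))
```

Under this aggregation GPT-4 comes first and Claude-2 second, while Claude-2 takes the top HumanEval score. The order of the lower-ranked models is sensitive to how the benchmarks are combined, so treat it as indicative only.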

OpenAI vs Anthropic – AI chatbot features

The new GPT-4 model has the same use cases as existing variants of the GPT-4 foundation model: internet access to real-time information, plugin support, and Advanced Data Analysis for math and PDF / Excel insights or summarization.

By comparison, Anthropic’s Claude 2 has none of these ‘prompt modifiers’, each of which adds complexity but produces more useful responses for complex tasks. Claude-2 also falls short on image output, where GPT-4 Turbo will feature integration with the AI image generator DALL-E 3 (stylized DALL·E 3). Anthropic, by contrast, has no proprietary AI art generator.

However, Claude-2 has something that GPT-4 Turbo doesn’t. Claude is a constitutional AI (CAI) that “shapes the outputs of AI systems according to a set of principles, with the goal of making a helpful, harmless, and honest AI assistant.” The principal purpose of Claude (and the Claude-2 LLM) is as an ethics research tool, to fine-tune our understanding of machine learning and to guide it towards AI safety goals with human feedback and reinforcement learning.

GPT-4 Turbo is thought to have more parameters, though neither company has published official figures. In terms of comprehension, coherence, and overall output quality, the OpenAI model wins, with Claude-2 coming in 2nd place.


Steve Hook
