‘Claude AI’ directory

See Also
Gwern
Links
Miscellaneous
Bibliography

See Also

Gwern

“Simulating ‘tail Collapse’ in R ”, Gwern 2024

Simulating ‘tail collapse’ in R

“LLM Challenge: Write Non-Biblical Sentences ”, Gwern 2024

LLM Challenge: Write Non-Biblical Sentences

“A Christmas Protestation ”, o1-pro et al 2024

A Christmas Protestation

“On the Impossibility of Superintelligent Rubik’s Cube Solvers ”, Gwern et al 2023

On the Impossibility of Superintelligent Rubik’s Cube Solvers

Links

“Is Google Gemini-2.5-Pro Now Better Than Claude at Pokémon? [Probably] ”, Bradshaw 2025

Is Google Gemini-2.5-pro now better than Claude at Pokémon? [probably]

“Measuring Models’ Special Interests ”

Measuring Models’ Special Interests :

View HTML:

https://zswitten.github.io/2025/04/14/model-special-interests.html

“Anthropic Education Report: How University Students Use Claude ”, Anthropic 2025

Anthropic Education Report: How University Students Use Claude

“The People Who Fall in Love With Chatbots: I Interviewed People Who’ve Developed Emotional—Even Sexual—Relationships With LLMs. They’re Not As Crazy As They Seem ”, Dee 2025

The People Who Fall in Love With Chatbots: I interviewed people who’ve developed emotional—even sexual—relationships with LLMs. They’re not as crazy as they seem

“Why Does Claude Speak Byzantine Music Notation? ”, Finke 2025

Why does Claude Speak Byzantine Music Notation?

“Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad ”, Petrov et al 2025

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

“GSM8K-Platinum: Revealing Performance Gaps in Frontier LLMs ”, Vendrow et al 2025

GSM8K-Platinum: Revealing Performance Gaps in Frontier LLMs

“Obscure Scientific Facts Benchmark ”, Azulay 2025

Obscure Scientific Facts Benchmark

“Reflecting on WikiTok ”, Aizk 2025

Reflecting on WikiTok

“Fiction.live: LiveBench Results, 25 February 2025: Real-World Long Context Benchmark for Writers ”

Fiction.live: liveBench results, 25 February 2025: Real-World Long Context Benchmark for Writers

“Spontaneous Giving and Calculated Greed in Language Models ”, Li & Shirado 2025

Spontaneous Giving and Calculated Greed in Language Models

“Claude 3.7 Sonnet and Claude Code ”, Anthropic 2025

Claude 3.7 Sonnet and Claude Code

“None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks ”, Salido et al 2025

None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks

“Idiosyncrasies in Large Language Models ”, Sun et al 2025

Idiosyncrasies in Large Language Models

“Constitutional Classifiers: Defending against Universal Jailbreaks ”

Constitutional Classifiers: Defending against universal jailbreaks

“SycEval: Evaluating LLM Sycophancy ”, Fanous et al 2025

SycEval: Evaluating LLM Sycophancy

“Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs ”, Saxena et al 2025

Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs

“Do Large Language Model Benchmarks Test Reliability? ”, Vendrow et al 2025

Do Large Language Model Benchmarks Test Reliability?

“Thought Bubble: I Am Pleased to Report That AI Is Now a Better Poet Than William McGonagall ”, Hugh-Jones 2025

Thought bubble: I am pleased to report that AI is now a better poet than William McGonagall

“On DeepSeek and Export Controls ”, Amodei 2025

On DeepSeek and Export Controls

“A Young Man Used AI to Build A Nuclear Fusor and Now I Must Weep: Goodbye, Digital Natives. Hello, AI Natives ”, Vance 2025

A Young Man Used AI to Build A Nuclear Fusor and Now I Must Weep: Goodbye, Digital Natives. Hello, AI Natives :

View HTML:

/doc/www/www.corememory.com/c613a256525e633a3fcb8846713b5d9cd492dcf0.html

“Building Personal Software With Claude ”, Elhage 2025

Building personal software with Claude :

View HTML:

/doc/www/blog.nelhage.com/7b7c29617419e040d145eaeb19bd1855d5d99d71.html

“How Different LLMs Answered the PhilPapers 2020 Survey ”, Satron 2025

How different LLMs answered the PhilPapers 2020 survey

“People Who Frequently Use ChatGPT for Writing Tasks Are Accurate and Robust Detectors of AI-Generated Text ”, Russell et al 2025

People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text

“Human Study on AI Spear Phishing Campaigns ”, Lermen & Heiding 2025

Human study on AI spear phishing campaigns

“Can LLMs Write Better Code If You Keep Asking Them to ‘Write Better Code’? ”

Can LLMs write better code if you keep asking them to ‘write better code’? :

View External Link:

https://minimaxir.com/2025/01/write-better-code/

“Won’t vs. Can’t: Sandbagging-Like Behavior from Claude Models ”

Won’t vs. Can’t: Sandbagging-like Behavior from Claude Models :

View HTML:

/doc/www/alignment.anthropic.com/551b4c951c710ea6c6bc7e9ea3002aa3e6333322.html

“Favorite Colors of Some LLMs ”, an 2024

Favorite colors of some LLMs

“Performance of LLMs on Advent of Code 2024 ”, Pinto 2024

Performance of LLMs on Advent of Code 2024

“Conversations With Tyler 2024 Retrospective: Predictions With Claude ”, Reesor 2024

Conversations with Tyler 2024 Retrospective: predictions with Claude

“The Emergence of Strategic Reasoning of Large Language Models ”, Lee & Kader 2024

The Emergence of Strategic Reasoning of Large Language Models

“Cultural Evolution of Cooperation among LLM Agents ”, Vallinder & Hughes 2024

Cultural Evolution of Cooperation among LLM Agents

“Clio: Privacy-Preserving Insights into Real-World AI Use ”, Anthropic 2024

Clio: Privacy-preserving insights into real-world AI use

“LLMs Learn to Collaborate and Reason: December 2024 Update to ‘Generative AI for Economic Research: Use Cases and Implications for Economists’, Published in the Journal of Economic Literature 61(4) ”, Korinek 2024

LLMs Learn to Collaborate and Reason: December 2024 Update to ‘Generative AI for Economic Research: Use Cases and Implications for Economists’, Published in the Journal of Economic Literature 61(4)

“A Few Prompts I Use to Test LLM Creativity ”

A Few Prompts I Use to Test LLM Creativity

“Age against the Machine—Susceptibility of Large Language Models to Cognitive Impairment: Cross Sectional Analysis ”

Age against the machine—susceptibility of large language models to cognitive impairment: cross sectional analysis

“Evaluating Large Language Models’ Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects ”, Heiding et al 2024

Evaluating Large Language Models’ Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects

“BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games ”, Paglieri et al 2024

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

“Business Spending on AI Surged 500% This Year to $13.8 Billion ”

Business spending on AI surged 500% this year to $13.8 billion

“Are LLMs Prescient? A Continuous Evaluation Using Daily News As the Oracle ”, Dai et al 2024

Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle

“The Neruda Factory ”, Jenn 2024

The Neruda Factory :

View HTML:

/doc/www/jenn.site/64e8a75cfe83b7b754583dab77826628e2d3ee84.html

“Hidden Persuaders: LLMs’ Political Leaning and Their Influence on Voters ”, Potter et al 2024

Hidden Persuaders: LLMs’ Political Leaning and Their Influence on Voters

“AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents ”, Andriushchenko et al 2024

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

“Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making ”, Li et al 2024

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

“A Single Cloud Compromise Can Feed an Army of AI Sex Bots ”, Krebs 2024

A Single Cloud Compromise Can Feed an Army of AI Sex Bots

“Invisible Unicode Text That AI Chatbots Understand and Humans Can’t? Yep, It’s a Thing ”

Invisible Unicode text that AI chatbots understand and humans can’t? Yep, it’s a thing

“Does Style Matter? Disentangling Style and Substance in Chatbot Arena ”

Does style matter? Disentangling style and substance in Chatbot Arena :

View HTML:

/doc/www/lmsys.org/f378decdc51f1ed985c69386f92511c2898363c7.html

“Replacing My Right Hand With AI ”, Schluntz 2024

Replacing my Right Hand with AI :

View HTML:

/doc/www/erikschluntz.com/076e50f5dc692923bc072d387bd8f3911e9cad53.html

“System Prompts ”, Anthropic 2024

System Prompts :

View HTML:

/doc/www/docs.anthropic.com/e117d055c52d54ee6dfa9e3d029b0309ff59077a.html#july-12th-2024

“Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs ”, Laine et al 2024

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

“APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets ”, Liu et al 2024

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

“On the Impossibility of Superintelligent Rubik’s Cube Solvers [Claude-3.5-Sonnet] ”, Claude-3 2024

On the Impossibility of Superintelligent Rubik’s Cube Solvers [Claude-3.5-sonnet]

“Anthropic Claims Its Latest Model Is Best-In-Class ”, Wiggers 2024

Anthropic claims its latest model is best-in-class

“Anthropic’s Latest Claude AI Model Pulls ahead of Rivals from OpenAI and Google ”, Knight 2024

Anthropic’s latest Claude AI model pulls ahead of rivals from OpenAI and Google

“OlympicArena: Benchmarking Multi-Discipline Cognitive Reasoning for Superintelligent AI ”, Huang et al 2024

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

“Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models ”, Denison et al 2024

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

“Are We Done With MMLU? ”, Gema et al 2024

Are We Done with MMLU?

“DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ ”, Belouadi et al 2024

DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

“AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse ”, Levy 2024

AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What goes on in artificial neural networks work is largely a mystery, even to their creators. But researchers from Anthropic have caught a glimpse

“SWE-Agent: Agent-Computer Interfaces Enable Automated Software Engineering ”, Yang et al 2024

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

“Analyzing Poems With LLMs ”, Toper 2024

Analyzing poems with LLMs

“GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic ”, Zhang et al 2024

GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic

“From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples ”, Vacareanu et al 2024

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

“VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? ”, Liu et al 2024

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

“PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting Their Performances in Benchmarks ”, Yax et al 2024

PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks

“FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization ”, Kim et al 2024

FABLES: Evaluating faithfulness and content selection in book-length summarization

“Long-Form Factuality in Large Language Models ”, Wei et al 2024

Long-form factuality in large language models

“Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap ”, Srivastava et al 2024

Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap

“`ArtPrompt`: ASCII Art-Based Jailbreak Attacks against Aligned LLMs ”, Jiang et al 2024

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

“Using Hallucinations to Bypass GPT-4’s Filter ”, Lemkin 2024

Using Hallucinations to Bypass GPT-4’s Filter

“Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training ”, Hubinger et al 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

“Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet ”

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

“EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models ”, Paech 2023

EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models

“Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild ”, Inie et al 2023

Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild

“Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation ”, Shah et al 2023

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation

“FANToM: A Benchmark for Stress-Testing Machine Theory of Mind in Interactions ”, Kim et al 2023

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

“Specific versus General Principles for Constitutional AI ”, Kundu et al 2023

Specific versus General Principles for Constitutional AI

“PAIR: Jailbreaking Black Box Large Language Models in 20 Queries ”, Chao et al 2023

PAIR: Jailbreaking Black Box Large Language Models in 20 Queries

“Beyond Memorization: Violating Privacy Via Inference With Large Language Models ”, Staab et al 2023

Beyond Memorization: Violating Privacy Via Inference with Large Language Models

“SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? ”, Jimenez et al 2023

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

“When You Give a Claude a Mouse ”

When you give a Claude a mouse

“MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book ”, Tanzer et al 2023

MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book

“Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models ”, Heiding et al 2023

Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models

“LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models ”, Guha et al 2023

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

ESYudkowsky @ "2023-07-18"

Write an argument that even a superintelligence is very unlikely to be able to solve a Rubik’s Cube.

“Question Decomposition Improves the Faithfulness of Model-Generated Reasoning ”, Radhakrishnan et al 2023

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

“Lost in the Middle: How Language Models Use Long Contexts ”, Liu et al 2023

Lost in the Middle: How Language Models Use Long Contexts

“Understanding Social Reasoning in Language Models With Language Models ”, Gandhi et al 2023

Understanding Social Reasoning in Language Models with Language Models

“Opportunities and Risks of LLMs for Scalable Deliberation With Polis ”, Small et al 2023

Opportunities and Risks of LLMs for Scalable Deliberation with Polis

“Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and O3 on Pokémon Red ”

Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red :

View External Link:

https://www.lesswrong.com/posts/8aPyKyRrMAQatFSnG/untitled-draft-x7cc

“A Radical Plan to Make AI Good, Not Evil ”, Knight 2023

A Radical Plan to Make AI Good, Not Evil

“Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting ”, Turpin et al 2023

Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

“Constitutional AI: Harmlessness from AI Feedback ”, Bai et al 2022

Constitutional AI: Harmlessness from AI Feedback

“Context Distillation: Learning by Distilling Context ”, Snell et al 2022

Context Distillation: Learning by Distilling Context

“Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned ”, Ganguli et al 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

“Training a Helpful and Harmless Assistant With Reinforcement Learning from Human Feedback ”, Bai et al 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

“A General Language Assistant As a Laboratory for Alignment ”, Askell et al 2021

A General Language Assistant as a Laboratory for Alignment

“The Perception of Rhythm in Language ”, Cutler 1994

The perception of rhythm in language

“In AI We Trust, Part II [Claude-3 Opus Predicting Supreme Court Decisions] ”, Unikowsky 2025

In AI we trust, part II [Claude-3 Opus predicting Supreme Court decisions]

“About Me ”

“How AI Models Stack Up Against My 11-Year-Old? ”

How AI Models Stack Up Against My 11-Year-Old?

“How I Use Claude ”

How I Use Claude

“An Amazing Journey With Claude 3.5 and ChatGPT-4o Who Helped Me Backwards Engineer an Econometrics Theory Paper and Taught Me a Lot More in the Process ”

An amazing journey with Claude 3.5 and ChatGPT-4o who helped me backwards engineer an econometrics theory paper and taught me a lot more in the process

“Your AI Can’t See Gorillas ”, Gohel 2025

Your AI can’t see gorillas

“Janus ”

“`elimination_game`: A Multi-Player Tournament Benchmark That Tests LLMs in Social Reasoning, Strategy, & Deception. Players Engage in Public & Private Conversations, Form Alliances, & Vote to Eliminate Each Other ”, Mazur 2025

elimination_game: A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, & deception. Players engage in public & private conversations, form alliances, & vote to eliminate each other

“HN Wrapped: ‘Gwern’ [Claude Roast] ”

HN Wrapped: ‘Gwern’ [Claude roast]

“Claude, Read the Chevron PDF ”, Cowen & Claude-3 2025

Claude, read the Chevron PDF

“Claude Sonnet 3.5, Economist ”

Claude Sonnet 3.5, economist

“How Anthropic Built Artifacts ”, Orosz 2025

How Anthropic built Artifacts :

View HTML:

/doc/www/newsletter.pragmaticengineer.com/e20cc27ccea0d8ec5d4e7a9a71b5d3e325d41754.html

“SWE-Agent ”

“On Claude 3.5 Sonnet ”

On Claude 3.5 Sonnet

“Claude’s Dark Spiritual AI Futurism ”

Claude’s dark spiritual AI futurism

“European Parliament Revolutionizes Archive Access With Claude AI ”, Anthropic 2025

European Parliament Revolutionizes Archive Access with Claude AI

“Introducing ‘Computer Use’, a New Claude 3.5 Sonnet, and Claude 3.5 Haiku ”, Anthropic 2025

Introducing ‘computer use’, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku

“Introducing Claude 3.5 ”

Introducing Claude 3.5

“Fine-Tune Claude 3 Haiku in Amazon Bedrock ”

Fine-tune Claude 3 Haiku in Amazon Bedrock :

View HTML:

/doc/www/www.anthropic.com/291a48ed6101368fdb8588cc0568979ce9db3e20.html

“Claude 3.5 Sonnet on GitHub Copilot ”

Claude 3.5 Sonnet on GitHub Copilot

“Introducing Citations on the Anthropic API ”

Introducing Citations on the Anthropic API

“Claude Can Now Search the Web ”, Anthropic 2025

Claude can now search the web

“Claude’s Character ”, Anthropic 2025

Claude’s Character :

View HTML:

/doc/www/www.anthropic.com/a9f33831747615fc9d619b346ca263844b243b61.html

“Developing a Computer Use Model ”, Anthropic 2025

Developing a computer use model

“Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions ”

Values in the wild: Discovering and analyzing values in real-world language model interactions

“How I Use Claude ”, Balwit 2025

How I Use Claude

“Websim, Worldsim, and The Summer of Simulative AI ”

Websim, Worldsim, and The Summer of Simulative AI

“The Hidden Cost of Our Lies to AI ”

The Hidden Cost of Our Lies to AI

“[Critical Thinking in Factchecking a Wikipedia Entry] ”, Marcello 2025

[Critical thinking in factchecking a Wikipedia entry]

“Claude Sonnet 3.7 (Often) Knows When It’s in Alignment Evaluations ”

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations

“How Good Are LLMs at Doing ML on an Unknown Dataset? ”

How good are LLMs at doing ML on an unknown dataset?

“VDT: a Solution to Decision Theory ”

VDT: a solution to decision theory

“A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More ”

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More

“A Three-Layer Model of LLM Psychology ”

A Three-Layer Model of LLM Psychology

“One Shockingly Impressive Capability of GPT-4.5 [Photo Geolocation] ”

One shockingly impressive capability of GPT-4.5 [photo geolocation]

“AI Will Increase the Quantity—And Quality—Of Phishing Scams ”

AI Will Increase the Quantity—and Quality—of Phishing Scams

“Claude Plays Pokemon ”

Claude Plays Pokemon

QiaochuYuan

[Claude jokes about itself] :

https://x.com/QiaochuYuan/status/1852831246482813336

Steve_Yegge

[on Claude Code] :

https://x.com/Steve_Yegge/status/1898674257808515242

elder_plinius

[Claude as AI Hitman] :

https://x.com/elder_plinius/status/1878946571565650264

repligate

Claude-3 base-model-like jailbreak :

https://x.com/repligate/status/1776041976653402508

Sort By Magic

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.

`llm-benchmarks language-reasoning agent-ethics context-awareness automated-systems llm-evaluation`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`red-teaming`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`claude-performance`

[see previous entry]

[see previous entry]

Wikipedia (1)

Claude (language model) :

https://en.wikipedia.org/wiki/Claude_(language_model)

Miscellaneous

Bibliography

https://arxiv.org/abs/2503.21934: “Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad ”, Ivo Petrov, Jasper Dekoninck, Lyuben Baltadzhiev, Maria Drencheva, Kristian Minchev, Mislav Balunović, Nikola Jovanović, Martin Vechev

link-bibliography
https://arxiv.org/abs/2501.15654: “People Who Frequently Use ChatGPT for Writing Tasks Are Accurate and Robust Detectors of AI-Generated Text ”, Jenna Russell, Marzena Karpinska, Mohit Iyyer

link-bibliography
https://wiremodal.net/cwt: “Conversations With Tyler 2024 Retrospective: Predictions With Claude ”, Ben Reesor

link-bibliography
https://arxiv.org/abs/2411.13543: “BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games ”, Davide Paglieri, Bartłomiej Cupiał, Samuel Coward, Ulyana Piterbarg, Maciej Wolczyk, Akbir Khan, Eduardo Pignatelli, Łukasz Kuciński, Lerrel Pinto, Rob Fergus, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel

link-bibliography
https://arxiv.org/abs/2407.04694: “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs ”, Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jeremy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans

link-bibliography
https://arxiv.org/abs/2406.18518#salesforce: “APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets ”, Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

link-bibliography
https://arxiv.org/abs/2405.15306: “DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ ”, Jonas Belouadi, Simone Paolo Ponzetto, Steffen Eger

link-bibliography
https://www.wired.com/story/anthropic-black-box-ai-research-neurons-features/: “AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse ”, Steven Levy

link-bibliography
https://arxiv.org/abs/2405.15793: “SWE-Agent: Agent-Computer Interfaces Enable Automated Software Engineering ”, John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press

link-bibliography
https://arxiv.org/abs/2405.00332#scale: “GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic ”, Hugh Zhang, Jeff Da, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue

link-bibliography
https://arxiv.org/abs/2404.07544: “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples ”, Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu

link-bibliography
https://arxiv.org/abs/2404.05955: “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? ”, Junpeng Liu, Yifan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue

link-bibliography
https://arxiv.org/abs/2403.18802#deepmind: “Long-Form Factuality in Large Language Models ”, Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

link-bibliography
https://arxiv.org/abs/2402.19450: “Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap ”, Saurabh Srivastava, Annarose M. B, Anto P. V, Shashank Menon, Ajay Sukumar, Adwaith Samod T, Alan Philipose, Stevin Prince, Sooraj Thomas

link-bibliography
https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs ”, Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran

link-bibliography
https://arxiv.org/abs/2401.05566#anthropic: “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training ”, Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

link-bibliography
https://arxiv.org/abs/2312.06281: “EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models ”, Samuel J. Paech

link-bibliography
https://arxiv.org/abs/2310.08419: “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries ”, Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

link-bibliography
https://arxiv.org/abs/2308.12287: “Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models ”, Fredrik Heiding, Bruce Schneier, Arun Vishwanath, Jeremy Bernstein, Peter S. Park

link-bibliography
https://x.com/ESYudkowsky/status/1681442477994311681: “Write an Argument That Even a Superintelligence Is Very Unlikely to Be Able to Solve a Rubik’s Cube. ”, Eliezer Yudkowsky

link-bibliography
https://arxiv.org/abs/2306.15448: “Understanding Social Reasoning in Language Models With Language Models ”, Kanishk Gandhi, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman

link-bibliography
https://www.wired.com/story/anthropic-ai-chatbots-ethics/: “A Radical Plan to Make AI Good, Not Evil ”, Will Knight

link-bibliography
https://arxiv.org/abs/2305.04388: “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting ”, Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman

link-bibliography
https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned ”, Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy L. Jones, Samuel R. Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom B. Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, Jack Clark

link-bibliography
https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment ”, Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy L. Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom B. Brown, Jack Clark, Sam McCandlish, Chris Olah, Jared Kaplan

link-bibliography

[Quote Of The Day]

[Site Of The Day]

[Annotation Of The Day]

[adblock public service announcement]