‘Codex’ directory

See Also
Gwern
- “latex2unicode.py”, Gwern 2023
- “CQK Is The First Unused TLA”, Gwern 2023
Links
Miscellaneous
Bibliography

See Also

Parent (‘GPT’ tag)

Gwern

“`latex2unicode.py`”, Gwern 2023

“CQK Is The First Unused TLA”, Gwern 2023

Links

“Investigating Truthfulness in a Pre-Release GPT-o3 Model”, Chowdhury et al 2025

“Anthropic Education Report: How University Students Use Claude”, Anthropic 2025

“Analyzing Open-Source Bootloaders: Finding Vulnerabilities Faster With AI”

“ByteCraft: Generating Video Games and Animations through Bytes”

https://emygervais.github.io/2025/03/15/bytecraft.html

“Introducing Mercury, the First Commercial-Scale Diffusion Large Language Model: We Trained Diffusion Large Language Models That Are up to 10× Faster & Cheaper Than Current LLMs, Pushing the Frontier of Intelligence & Speed for Language Models”, Labs 2025

“OpenAI Uncovers Evidence of AI-Powered Chinese Surveillance Tool”

“Measuring Automated Kernel Engineering”

“Competitive Programming With Large Reasoning Models”, El-Kishky et al 2025

“Building Personal Software With Claude”, Elhage 2025

/doc/www/blog.nelhage.com/7b7c29617419e040d145eaeb19bd1855d5d99d71.html

“Thoughts On A Month With Devin”

/doc/www/www.answer.ai/c2cee15e4e71db3a6c343a7caf25868b20c5ad1c.html

“How I Program With LLMs”, Crawshaw 2025

/doc/www/crawshaw.io/b17174c846e87409ec8e3949572a414daa526c2a.html

“Can LLMs Write Better Code If You Keep Asking Them to ‘Write Better Code’?”

https://minimaxir.com/2025/01/write-better-code/

“Things We Learned about LLMs in 2024”

“Performance of LLMs on Advent of Code 2024”, Pinto 2024

“LLMs Learn to Collaborate and Reason: December 2024 Update to ‘Generative AI for Economic Research: Use Cases and Implications for Economists’, Published in the Journal of Economic Literature 61(4)”, Korinek 2024

“Amplifying Human Performance in Combinatorial Competitive Programming”, Veličković et al 2024

“They All Use It”, Ball 2024

“Business Spending on AI Surged 500% This Year to $13.8 Billion”

“Alphabet Q3 Earnings Call: CEO Sundar Pichai’s Remarks”

“Hacking Back the AI-Hacker: Prompt Injection As a Defense Against LLM-Driven Cyberattacks”, Pasquini et al 2024

“A Tutorial on Teaching Data Analytics With Generative AI”, Bray 2024

“AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents”, Andriushchenko et al 2024

“SWE-Bench+: Enhanced Coding Benchmark for LLMs”, Aleithan et al 2024

“MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering”, Chan et al 2024

“Project Zero: From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code”

“Evaluation of OpenAI o1: Opportunities and Challenges of AGI”, Zhong et al 2024

“Language Models Learn to Mislead Humans via RLHF”, Wen et al 2024

“Using ChatGPT to Reverse Engineer Minified JavaScript”

/doc/www/glama.ai/89d194a9bd2f95cb5e035f371810f20842f2f652.html

“SWE-Bench Technical Report: 22%”, Honeycomb 2024

“AI-Powered Coding Pulls in Almost $1bn of Funding to Claim ‘Killer App’ Status”, Murgia 2024

“Prompt Injection in ‘Resolve Vulnerabilty’ Results in Arbitrary Command Execution in Victim’s Pipeline”, GitLab 2024

“To Code, or Not To Code? Exploring Impact of Code in Pre-Training”, Aryabumi et al 2024

“Replacing My Right Hand With AI”, Schluntz 2024

/doc/www/erikschluntz.com/076e50f5dc692923bc072d387bd8f3911e9cad53.html

“APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, Liu et al 2024

“DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence”, Zhu et al 2024

“Diffusion On Syntax Trees For Program Synthesis”, Kapur et al 2024

“LoRA Learns Less and Forgets Less”, Biderman et al 2024

“SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering”, Yang et al 2024

“A Peter Thiel-Backed AI Startup, Cognition Labs, Seeks $2 Billion Valuation: Funding Round Could Increase Startup’s Valuation Nearly Sixfold in a Matter of Weeks, Reflecting AI Frenzy”, Jin 2024

“Vulnerability Detection With Code Language Models: How Far Are We?”, Ding et al 2024

“Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A New Startup Called Cognition AI Can Turn a User’s Prompt into a Website or Video Game”, Vance 2024

“TestGen-LLM: Automated Unit Test Improvement Using Large Language Models at Meta”, Alshahwan et al 2024

“The Impact of AI Tool on Engineering at ANZ Bank: An Empirical Study on GitHub Copilot Within a Corporate Environment”, Chatterjee et al 2024

“CodeIt: Self-Improving Language Models With Prioritized Hindsight Replay”, Butt et al 2024

“Coding on Copilot: 2023 Data Shows Downward Pressure on Code Quality, Plus Projections for 2024”, Harding & Kloster 2024

“Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024

“Leveraging Large Language Models to Boost Dafny’s Developers Productivity”, Silva et al 2024

“WaveCoder: Widespread And Versatile Enhanced Instruction Tuning With Refined Data Generation”, Yu et al 2023

“StarVector: Generating Scalable Vector Graphics Code from Images”, Rodriguez et al 2023

“Universal Self-Consistency for Large Language Model Generation”, Chen et al 2023

“LLM-Assisted Code Cleaning For Training Accurate Code Generators”, Jain et al 2023

“A Coder Considers the Waning Days of the Craft: Coding Has Always Felt to Me like an Endlessly Deep and Rich Domain. Now I Find Myself Wanting to Write a Eulogy for It”, Somers 2023

“ChipNeMo: Domain-Adapted LLMs for Chip Design”, Liu et al 2023

“CodeFusion: A Pre-Trained Diffusion Model for Code Generation”, Singh et al 2023

“Eureka: Human-Level Reward Design via Coding Large Language Models”, Ma et al 2023

“Data Contamination Through the Lens of Time”, Roberts et al 2023

“SWE-bench: Can Language Models Resolve Real-World GitHub Issues?”, Jimenez et al 2023

“Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Zhou et al 2023

“PassUntil: Predicting Emergent Abilities With Infinite Resolution Evaluation”, Hu et al 2023

“Security Weaknesses of Copilot Generated Code in GitHub”, Fu et al 2023

“Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-Based Self-Verification”, Zhou et al 2023

“OctoPack: Instruction Tuning Code Large Language Models”, Muennighoff et al 2023

“Testing GPT-4 With Wolfram Alpha and Code Interpreter Plug-Ins on Math and Science Problems”, Davis & Aaronson 2023

“Insights into Stack Overflow’s Traffic: We’re Setting the Record Straight”, Darilek 2023

“Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow”, Rio-Chanona et al 2023

“Explaining Competitive-Level Programming Solutions Using LLMs”, Li et al 2023

“Large Language Models for Supply Chain Optimization”, Li et al 2023

“InterCode: Standardizing and Benchmarking Interactive Coding With Execution Feedback”, Yang et al 2023

“AI Is a Lot of Work: As the Technology Becomes Ubiquitous, a Vast Tasker Underclass Is Emerging—And Not Going Anywhere”, Dzieza 2023

“When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming (CDHF)”, Mozannar et al 2023

“CodeCompose: A Large-Scale Industrial Deployment of AI-Assisted Code Authoring”, Murali et al 2023

“Chatting With GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing”, Liu et al 2023

“Large Language Model Programs”, Schlag et al 2023

“StarCoder: May the Source Be With You!”, Li et al 2023

“Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding”, Xie et al 2023

“LLM+P: Empowering Large Language Models With Optimal Planning Proficiency”, Liu et al 2023

“Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes”, Arora et al 2023

“How Secure Is Code Generated by ChatGPT?”, Khoury et al 2023

“Today Was the First Day That I Could Definitively Say That GPT-4 Has Saved Me a Substantial Amount of Tedious Work”, Tao 2023

“Language Models Can Solve Computer Tasks”, Kim et al 2023

“Reflexion: Language Agents With Verbal Reinforcement Learning”, Shinn et al 2023

“Large Language Models and Simple, Stupid Bugs”, Jesse et al 2023

“Introducing Microsoft 365 Copilot—Your Copilot for Work”, Spataro 2023

“Larger Language Models Do In-Context Learning Differently”, Wei et al 2023

“ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics”, Azerbayev et al 2023

“CodeBERTScore: Evaluating Code Generation With Pretrained Models of Code”, Zhou et al 2023

“Faithful Chain-of-Thought Reasoning”, Lyu et al 2023

“Large Language Models Are Versatile Decomposers: Decompose Evidence and Questions for Table-Based Reasoning”, Ye et al 2023

“Google Is Asking Employees to Test Potential ChatGPT Competitors, including a Chatbot Called ‘Apprentice Bard’”, Elias 2023

“An Analysis of the Automatic Bug Fixing Performance of ChatGPT”, Sobania et al 2023

“Connor Leahy on Aliens, Ethics, Economics, Memetics, and Education § GPT-4”, Leahy 2023

“General Availability of Azure OpenAI Service Expands Access to Large, Advanced AI Models With Added Enterprise Benefits”, Boyd 2023

“SantaCoder: Don’t Reach for the Stars!”, Allal et al 2023

“TrojanPuzzle: Covertly Poisoning Code-Suggestion Models”, Aghakhani et al 2023

“ERNIE-Code: Beyond English-Centric Cross-Lingual Pretraining for Programming Languages”, Chai et al 2022

“The Stack: 3 TB of Permissively Licensed Source Code”, Kocetkov et al 2022

“PAL: Program-Aided Language Models”, Gao et al 2022

“Do Users Write More Insecure Code With AI Assistants?”, Perry et al 2022

“Broken Neural Scaling Laws”, Caballero et al 2022

“Programming Possibility: Kevin Scott on AI’s Impact on Cognitive Work”, Hoffman & Scott 2022

“Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them”, Suzgun et al 2022

“Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners”, Su et al 2022

“Repair Is Nearly Generation: Multilingual Program Repair With LLMs”, Joshi et al 2022

“Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022

“Language Models Can Teach Themselves to Program Better”, Haluptzok et al 2022

“PanGu-Coder: Program Synthesis With Function-Level Language Modeling”, Christopoulou et al 2022

“CodeT: Code Generation With Generated Tests”, Chen et al 2022

“Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022

“Craft an Iron Sword: Dynamically Generating Interactive Game Characters by Prompting Large Language Models Tuned on Code”, Volum et al 2022

“Code Translation With Compiler Representations”, Szafraniec et al 2022

“Repository-Level Prompt Generation for Large Language Models of Code”, Shrivastava et al 2022

“Learning to Model Editing Processes”, Reid & Neubig 2022

“Productivity Assessment of Neural Code Completion”, Ziegler et al 2022

“End-to-End Symbolic Regression With Transformers”, Kamienny et al 2022

“InCoder: A Generative Model for Code Infilling and Synthesis”, Fried et al 2022

“PaLM: Scaling Language Modeling With Pathways”, Chowdhery et al 2022

“A Conversational Paradigm for Program Synthesis”, Nijkamp et al 2022

“Evaluating the Text-to-SQL Capabilities of Large Language Models”, Rajkumar et al 2022

“Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models”, Vaithilingam et al 2022

“PolyCoder: A Systematic Evaluation of Large Language Models of Code”, Xu et al 2022

“Pop Quiz! Can a Large Language Model Help With Reverse Engineering?”, Pearce et al 2022

“Text and Code Embeddings by Contrastive Pre-Training”, Neelakantan et al 2022

“Neural Language Models Are Effective Plagiarists”, Biderman & Raff 2022

“Deep Symbolic Regression for Recurrent Sequences”, d’Ascoli et al 2022

“Discovering the Syntax and Strategies of Natural Language Programming With Generative Language Models”, Jiang et al 2022

“A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More”, Drori et al 2021

“Few-Shot Semantic Parsing With Language Models Trained On Code”, Shin & Durme 2021

“WebGPT: Browser-Assisted Question-Answering With Human Feedback”, Nakano et al 2021

“WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Hilton et al 2021

“Scaling Language Models: Methods, Analysis & Insights from Training Gopher”, Rae et al 2021

“Jigsaw: Large Language Models Meet Program Synthesis”, Jain et al 2021

“Can Pre-Trained Language Models Be Used to Resolve Textual and Semantic Merge Conflicts?”, Zhang et al 2021

“Solving Linear Algebra by Program Synthesis”, Drori & Verma 2021

“Solving Probability and Statistics Problems by Program Synthesis”, Tang et al 2021

“Automatic Program Repair With OpenAI’s Codex: Evaluating QuixBugs”, Prenner & Robbes 2021

“GenLine and GenForm: Two Tools for Interacting With Generative Language Models in a Code Editor”, Jiang et al 2021b

“An Empirical Cybersecurity Evaluation of GitHub Copilot’s Code Contributions”, Pearce et al 2021

“Learning C to x86 Translation: An Experiment in Neural Compilation”, Armengol-Estapé & O’Boyle 2021

“Program Synthesis With Large Language Models”, Austin et al 2021

“TAPEX: Table Pre-Training via Learning a Neural SQL Executor”, Liu et al 2021

“Evaluating Large Language Models Trained on Code”, Chen et al 2021

“Research Recitation: A First Look at Rote Learning in GitHub Copilot Suggestions”, Ziegler 2021

“Microsoft and OpenAI Have a New AI Tool That Will Give Coding Suggestions to Software Developers”, Novet 2021

“SymbolicGPT: A Generative Transformer Model for Symbolic Regression”, Valipour et al 2021

“Measuring Coding Challenge Competence With APPS”, Hendrycks et al 2021

“Improving Code Autocompletion With Transfer Learning”, Zhou et al 2021

“LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning”, Wu et al 2021

“Learning Autocompletion from Real-World Datasets”, Aye et al 2020

“GraphCodeBERT: Pre-Training Code Representations With Data Flow”, Guo et al 2020

“CoCoNuT: Combining Context-Aware Neural Translation Models Using Ensemble for Program Repair”, Lutellier et al 2020

“TransCoder: Unsupervised Translation of Programming Languages”, Lachaux et al 2020

“GPT-3 Random Sample Dump: JavaScript Tutorial”, GPT-3 2020

“IJON: Exploring Deep State Spaces via Fuzzing”, Aschermann et al 2020

“IntelliCode Compose: Code Generation Using Transformer”, Svyatkovskiy et al 2020

“Deep Learning for Symbolic Mathematics”, Lample & Charton 2019

“CodeSearchNet Challenge: Evaluating the State of Semantic Code Search”, Husain et al 2019

“BERTScore: Evaluating Text Generation With BERT”, Zhang et al 2019

“Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning”, Zhong et al 2017

“Learning to Superoptimize Programs”, Bunel et al 2017

“DeepCoder: Learning to Write Programs”, Balog et al 2016

“Neural Programmer-Interpreters”, Reed & Freitas 2015

“Autocomplete As an Interface”, Kuhn 2015

“Computers Doing The Right Thing”

“OpenAI API Alchemy: Smart Formatting and Code Creation”

https://andrewmayne.com/2020/06/13/openai-api-alchemy-smart-formatting-and-code-creation/

“Building Games and Apps Entirely through Natural Language Using OpenAI’s code-davinci Model”

https://andrewmayne.com/2022/03/17/building-games-and-apps-entirely-through-natural-language-using-openais-davinci-code-model/

“Replit”

/doc/www/blog.replit.com/c111945e461baafb3b10187dc65b4ff4256530c4.html

“Working With AI (Part 2): Code Conversion”

/doc/www/blog.withmantle.com/fd94ca950274977e4321f54a45033143e8b87efc.html

“An Amazing Journey With Claude 3.5 and ChatGPT-4o Who Helped Me Backwards Engineer an Econometrics Theory Paper and Taught Me a Lot More in the Process”

“Your AI Can’t See Gorillas”, Gohel 2025

“Inside the CodeBot: A Gentle Introduction to How LLMs Understand Nullability”

“GitHub Copilot: The Agent Awakens”

/doc/www/github.blog/fce4e07b37b1e10ba900ae91ce36bc21199b4c56.html

“StenographyDev/autopilot-vsc”

“`gptel`: A Simple LLM Client for Emacs”, karthink 2025

/doc/www/github.com/6481e750767bf4d4a0e577f388b3421e836cb123.html

“openai/codex: Lightweight Coding Agent That Runs in Your Terminal”

“Issue #445: ‘Continuous Meltdown’ Text Loop After Failed to Parse `toolCall.arguments` Before ‘Your Input Exceeds the Context Window of This Model.’”

“Copilot Stops Working on `gender` Related Subjects • Community • Discussion #72603”

/doc/www/github.com/240b757ca122975adc355feffb57df79223bfa90.html

“Pen.el”, semiosis 2025

“Revolutionize Your Project Documentation With the Codex-README Generator, Utilizing OpenAI’s Codex for Intelligent README Creation.”

“LLM Powered Autonomous Agents”

“The RetroInstruct Guide To Synthetic Text Data”, Pressman 2025

“Fun and Dystopia With AI-Based Code Generation Using GPT-J-6B”

/doc/www/minimaxir.com/3eaae6137c3ff81ee1ce8b508282148db4e60799.html

“There’s a Running Theme in Here of Programming Problems LLMs Solve Where It’s…”

/doc/www/news.ycombinator.com/85525c9bb48f9c95680601ccae4284f2c576e93b.html

“How Anthropic Built Artifacts”, Orosz 2025

/doc/www/newsletter.pragmaticengineer.com/e20cc27ccea0d8ec5d4e7a9a71b5d3e325d41754.html

“How I Use ‘AI’”, Carlini 2025

“@jeremy-berman/arc-agi on Params”

/doc/www/params.com/ffadd930691cc27e734738d2c8d6068b05122a61.html

“Using GPT-3 to Explain How Code Works”

/doc/www/simonwillison.net/3a6b69f320048c5f35bf8a02af29ed8831a0ace6.html

“SWE-agent”

“Adept Video Demo!”

“Transformer-VAE for Program Synthesis”

“Writer”

/doc/www/writer.mintlify.com/2ac5f030b3a38c813fbeb999b44a83b650ff3f66.html

“Introducing ‘Computer Use’, a New Claude 3.5 Sonnet, and Claude 3.5 Haiku”, Anthropic 2025

“Claude 3.5 Sonnet on GitHub Copilot”

“Developing a Computer Use Model”, Anthropic 2025

“GPT-4 o1 Isn’t a Chat Model (And That’s the Point)”

“Websim, Worldsim, and The Summer of Simulative AI”

“I Found >800 Orthogonal ‘Write Code’ Steering Vectors”

/doc/www/www.greaterwrong.com/441e2c82f2dbe90699728ce7f7fefd27ae4f2a0e.html

“Who Models the Models That Model Models? An Exploration of GPT-3’s In-Context Model Fitting Ability”

“OpenAI Codex: First Impressions ”

OpenAI Codex: First Impressions :

View External Link:

https://www.lesswrong.com/posts/ib9bfyJiz4FLuHDQs/openai-codex-first-impressions

“One-Shot Steering Vectors Cause Emergent Misalignment, Too ”

One-shot steering vectors cause emergent misalignment, too

“A.I. Can Now Write Its Own Computer Code. That’s Good News for Humans. ”

A.I. Can Now Write Its Own Computer Code. That’s Good News for Humans.

“Balloons! The Balloon Clicker Game ”

Balloons! The Balloon Clicker Game :

View HTML:

/doc/www/www.shawnmatthewcrawford.com/a99f236d1b9ce4d4f5da9a22ac2ac991c8be1f99.html

“Tabnine AI Code Assistant ”

Tabnine AI code assistant

“OpenAI Can Translate English into Code With Its New Machine Learning Software Codex ”

OpenAI can translate English into code with its new machine learning software Codex

“FROM PLAIN TO EXPLAINED IN FIVE MINUTES: Getting Started With Stenography Autopilot ”

FROM PLAIN TO EXPLAINED IN FIVE MINUTES: Getting Started with Stenography Autopilot :

https://www.youtube.com/watch?v=8sDsMUUcrtM

“OpenAI Codex Live Demo ”

OpenAI Codex Live Demo :

https://www.youtube.com/watch?v=SGUCcjHTmGY#openai

“Is Finetuning GPT-4o Worth It? ”

Is finetuning GPT-4o worth it? :

https://www.youtube.com/watch?v=X57GT1Y5URY

“Creating a Space Game With OpenAI Codex ”

Creating a Space Game with OpenAI Codex :

https://www.youtube.com/watch?v=Zm9B-DvwOgw

Steve_Yegge

[on Claude Code] :

https://x.com/Steve_Yegge/status/1898674257808515242

jsngr

This changes everything. 🤯 With GPT-3, I built a Figma plugin to design for you. I call it ‘Designer’ :

/doc/www/localhost/9d3a361c20444479827b840c27dcdc8dd1b85c7d.html

nutanc

Starting the day with a chart building demo. Primed GPT-3 with Chart.js scripts to generate the below. :

/doc/www/localhost/084a08d53754dfbfa02255390637ea313f9bf91f.html

sharifshameem

I just built a functioning React app by describing what I wanted to GPT-3. I’m still in awe. :

/doc/www/localhost/8105e85cb24abc78ecc132632b533a96a08d3857.html

sharifshameem

I built a todo list app simply by describing it to GPT-3. It generated the React code for a fully functioning app within seconds. I’m becoming more impressed and aware of its capabilities every single day. :

/doc/www/localhost/136410ed7e7d229ae21be77dbbaaa60bda84dde7.html

sharifshameem

I gave GPT-3 access to Chrome with the objective ‘please buy me AirPods’…It successfully made it to the product page, but got sidetracked with Walmart’s privacy policy. Since even a simplified DOM is far too large for a single prompt, multiple prompts are given different chunks of the DOM, each generating their own ‘interaction’. Another prompt then takes all the proposed interactions and selects the best one, sort of like a tournament bracket. For more complex web pages, the time it takes to generate an action scales at 𝒪(log n) with the size of the DOM—really fast! It also gets around token limits, so you could technically process an infinitely large DOM! :

/doc/www/localhost/dd47e242f08c9f09a5b9923ac0f94804f7419184.html
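The chunk-and-bracket scheme described in the tweet above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author’s code: the two GPT-3 prompts are replaced by hypothetical callables `propose` (DOM chunk → candidate interaction) and `judge` (two candidates → the better one), and the DOM is split by character count rather than by element. The sketch also makes one point explicit: a single-elimination bracket over k candidates needs k − 1 judge calls in total but only ⌈log₂ k⌉ rounds, so the logarithmic scaling holds for latency only if each round’s matches run in parallel.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins for the LLM prompts described in the tweet.
Propose = Callable[[str], str]        # DOM chunk -> proposed interaction
Judge = Callable[[str, str], str]     # two proposals -> the preferred one


def chunk(dom: str, size: int) -> List[str]:
    """Split a (simplified) DOM string into fixed-size pieces that each fit in one prompt."""
    return [dom[i:i + size] for i in range(0, len(dom), size)]


def tournament(candidates: Sequence[str], judge: Judge) -> Tuple[str, int]:
    """Single-elimination bracket: pair candidates up and keep each match's winner.

    Returns (champion, number_of_rounds). Matches within a round are independent,
    so with parallel calls the latency grows with the round count, ceil(log2(k)).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    current = list(candidates)
    rounds = 0
    while len(current) > 1:
        winners = [judge(current[i], current[i + 1])
                   for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:     # odd candidate out advances with a bye
            winners.append(current[-1])
        current = winners
        rounds += 1
    return current[0], rounds


def choose_action(dom: str, size: int, propose: Propose, judge: Judge) -> Tuple[str, int]:
    """Propose one interaction per chunk, then let the bracket pick a single action."""
    proposals = [propose(piece) for piece in chunk(dom, size)]
    return tournament(proposals, judge)
```

With real model calls, every `propose` call and every match within a round would be issued concurrently; giving the odd candidate a bye is what keeps the round count at ⌈log₂ k⌉ instead of growing linearly.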

spolu

The examples are indeed extremely simple on purpose (otherwise it’s hard to communicate efficiently what’s happening to non-Metamath experts). That being said, we’re still pretty far away from IMOs; but this is definitely a goal for us, and one we’re actively working towards!

“XBOW Now Matches the Capabilities of a Top Human Pentester ”, XBOW 2025

XBOW now matches the capabilities of a top human pentester :

View HTML:

/doc/www/xbow.com/6488c9703734a04ed02d9d7e6094a6df83b55484.html

Sort By Magic

Annotations sorted by machine learning into inferred ‘tags’. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The ‘sorted’ list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
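The ordering step described above (start from the newest annotation, then repeatedly step to its nearest neighbor in embedding space) can be sketched as a greedy nearest-neighbor walk. This is an assumption-laden illustration, not the site’s implementation: it uses plain cosine similarity over hypothetical precomputed embedding vectors, breaks ties in favor of newer items, and omits the subsequent clustering and auto-labeling of sections entirely.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_order(ids_newest_first: List[str],
                 emb: Dict[str, Sequence[float]]) -> List[str]:
    """Start at the newest item, then keep appending the most similar unvisited item."""
    if not ids_newest_first:
        return []
    # Position in date order, used only to break similarity ties deterministically.
    rank = {k: i for i, k in enumerate(ids_newest_first)}
    order = [ids_newest_first[0]]
    remaining = set(ids_newest_first[1:])
    while remaining:
        last = emb[order[-1]]
        # Highest similarity wins; on a tie, the newer item (smaller rank) wins.
        nxt = max(remaining, key=lambda k: (cosine(last, emb[k]), -rank[k]))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

A greedy walk like this costs O(n²) similarity computations for n annotations and is not globally optimal: each step only looks at the current item, so topics drift gradually from one to the next, which matches the "progression of topics" the description aims for.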

`vulnerability-detection`

`code-generation`

Wikipedia (2)

GitHub Copilot
OpenAI Codex :

https://en.wikipedia.org/wiki/OpenAI_Codex

Miscellaneous

Bibliography

https://arxiv.org/abs/2502.06807#openai: “Competitive Programming With Large Reasoning Models ”, Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Łukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou

https://registerspill.thorstenball.com/p/they-all-use-it: “They All Use It ”, Thorsten Ball

https://arxiv.org/abs/2410.06992: “SWE-Bench+: Enhanced Coding Benchmark for LLMs ”, Reem Aleithan, Haoran Xue, Mohammad Mahdi Mohajer, Elijah Nnorom, Gias Uddin, Song Wang

https://arxiv.org/abs/2410.07095#openai: “MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering ”, Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Lilian Weng, Aleksander Madry

https://www.ft.com/content/4868bd38-613c-4fa9-ba9d-1ed8fa8a40c8: “AI-Powered Coding Pulls in Almost $1bn of Funding to Claim ‘Killer App’ Status ”, Madhumita Murgia

https://arxiv.org/abs/2406.18518#salesforce: “APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets ”, Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

https://arxiv.org/abs/2405.15793: “SWE-Agent: Agent-Computer Interfaces Enable Automated Software Engineering ”, John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press

https://www.wsj.com/tech/ai/a-peter-thiel-backed-ai-startup-cognition-labs-seeks-2-billion-valuation-998fa39d: “A Peter Thiel-Backed AI Startup, Cognition Labs, Seeks $2 Billion Valuation: Funding Round Could Increase Startup’s Valuation Nearly Sixfold in a Matter of Weeks, Reflecting AI Frenzy ”, Berber Jin

https://arxiv.org/abs/2403.18624: “Vulnerability Detection With Code Language Models: How Far Are We? ”, Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng Chen

https://www.bloomberg.com/news/articles/2024-03-12/cognition-ai-is-a-peter-thiel-backed-coding-assistant: “Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A New Startup Called Cognition AI Can Turn a User’s Prompt into a Website or Video Game ”, Ashlee Vance

2024-harding.pdf: “Coding on Copilot: 2023 Data Shows Downward Pressure on Code Quality, Plus Projections for 2024 ”, William Harding, Matthew Kloster

https://arxiv.org/abs/2401.05566#anthropic: “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training ”, Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

https://arxiv.org/abs/2312.11556: “StarVector: Generating Scalable Vector Graphics Code from Images ”, Juan A. Rodriguez, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, David Vazquez, Christopher Pal, Marco Pedersoli

https://arxiv.org/abs/2310.04406: “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models ”, Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang

https://arxiv.org/abs/2310.03262: “PassUntil: Predicting Emergent Abilities With Infinite Resolution Evaluation ”, Shengding Hu, Xin Liu, Xu Han, Xinrong Zhang, Chaoqun He, Weilin Zhao, Yankai Lin, Ning Ding, Zebin Ou, Guoyang Zeng, Zhiyuan Liu, Maosong Sun

https://arxiv.org/abs/2310.02059: “Security Weaknesses of Copilot Generated Code in GitHub ”, Yujia Fu, Peng Liang, Amjed Tahir, Zengyang Li, Mojtaba Shahin, Jiaxin Yu, Jinfu Chen

https://arxiv.org/abs/2308.07921: “Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-Based Self-Verification ”, Aojun Zhou, Ke Wang, Zimu Lu, Weikang Shi, Sichun Luo, Zipeng Qin, Shaoqing Lu, Anya Jia, Linqi Song, Mingjie Zhan, Hongsheng Li

https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots: “AI Is a Lot of Work: As the Technology Becomes Ubiquitous, a Vast Tasker Underclass Is Emerging—And Not Going Anywhere ”, Josh Dzieza

https://arxiv.org/abs/2306.04930#microsoft: “When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming (CDHF) ”, Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz

https://arxiv.org/abs/2303.11455: “Large Language Models and Simple, Stupid Bugs ”, Kevin Jesse, Toufique Ahmed, Premkumar T. Devanbu, Emily Morgan

https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/: “Introducing Microsoft 365 Copilot—Your Copilot for Work ”, Jared Spataro

https://arxiv.org/abs/2303.03846#google: “Larger Language Models Do In-Context Learning Differently ”, Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

https://arxiv.org/abs/2302.12433: “ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics ”, Zhangir Azerbayev, Bartosz Piotrowski, Hailey Schoelkopf, Edward W. Ayers, Dragomir Radev, Jeremy Avigad

https://www.cnbc.com/2023/01/31/google-testing-chatgpt-like-chatbot-apprentice-bard-with-employees.html: “Google Is Asking Employees to Test Potential ChatGPT Competitors, including a Chatbot Called ‘Apprentice Bard’ ”, Jennifer Elias

https://arxiv.org/abs/2301.08653: “An Analysis of the Automatic Bug Fixing Performance of ChatGPT ”, Dominik Sobania, Martin Briesch, Carol Hanna, Justyna Petke

https://azure.microsoft.com/en-us/blog/general-availability-of-azure-openai-service-expands-access-to-large-advanced-ai-models-with-added-enterprise-benefits/: “General Availability of Azure OpenAI Service Expands Access to Large, Advanced AI Models With Added Enterprise Benefits ”, Eric Boyd

https://arxiv.org/abs/2211.15533: “The Stack: 3 TB of Permissively Licensed Source Code ”, Denis Kocetkov, Raymond Li, Loubna Ben Allal, Jia Li, Chenghao Mou, Carlos Muñoz Ferrandis, Yacine Jernite, Margaret Mitchell, Sean Hughes, Thomas Wolf, Dzmitry Bahdanau, Leandro von Werra, Harm de Vries

https://greylock.com/greymatter/kevin-scott-ai-programming-possibility/: “Programming Possibility: Kevin Scott on AI’s Impact on Cognitive Work ”, Reid Hoffman, Kevin Scott

https://arxiv.org/abs/2210.09261#google: “Challenging BIG-Bench Tasks (BBH) and Whether Chain-Of-Thought Can Solve Them ”, Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei

https://arxiv.org/abs/2209.01975: “Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners ”, Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah Smith, Tao Yu

https://arxiv.org/abs/2207.08143: “Can Large Language Models Reason about Medical Questions? ”, Valentin Liévin, Christoffer Egeberg Hother, Ole Winther

https://arxiv.org/abs/2205.06537#github: “Productivity Assessment of Neural Code Completion ”, Albert Ziegler, Eirini Kalliamvakou, Shawn Simister, Ganesh Sittampalam, Alice Li, Andrew Rice, Devon Rifkin, Edward Aftandilian

https://arxiv.org/abs/2204.05999#facebook: “InCoder: A Generative Model for Code Infilling and Synthesis ”, Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis

https://arxiv.org/abs/2204.02311#google: “PaLM: Scaling Language Modeling With Pathways ”, Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel

2022-vaithilingam.pdf: “Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models ”, Priyan Vaithilingam, Tianyi Zhang, Elena Glassman

https://arxiv.org/abs/2201.10005#openai: “Text and Code Embeddings by Contrastive Pre-Training ”, Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, Lilian Weng

https://arxiv.org/abs/2112.15594: “A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More ”, Iddo Drori, Sunny Tran, Roman Wang, Newman Cheng, Kevin Liu, Leonard Tang, Elizabeth Ke, Nikhil Singh, Taylor L. Patti, Jayson Lynch, Avi Shporer, Nakul Verma, Eugene Wu, Gilbert Strang

https://arxiv.org/abs/2112.09332#openai: “WebGPT: Browser-Assisted Question-Answering With Human Feedback ”, Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman

https://openai.com/research/webgpt: “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing ”, Jacob Hilton, Suchir Balaji, Reiichiro Nakano, John Schulman

https://arxiv.org/abs/2112.11446#deepmind: “Scaling Language Models: Methods, Analysis & Insights from Training Gopher ”, Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent Sifre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d’Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew Johnson, Blake Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu, Geoffrey Irving

https://arxiv.org/abs/2111.11904#microsoft: “Can Pre-Trained Language Models Be Used to Resolve Textual and Semantic Merge Conflicts? ”, Jialu Zhang, Todd Mytkowicz, Mike Kaufman, Ruzica Piskac, Shuvendu K. Lahiri

https://arxiv.org/abs/2111.08267: “Solving Probability and Statistics Problems by Program Synthesis ”, Leonard Tang, Elizabeth Ke, Nikhil Singh, Nakul Verma, Iddo Drori

2021-jiang-2.pdf: “GenLine and GenForm: Two Tools for Interacting With Generative Language Models in a Code Editor ”, Ellen Jiang, Edwin Toh, Alejandra Molina, Aaron Donsbach, Carrie Cai, Michael Terry
