‘NN’ directory

Gwern

Gwern

“Regexps Used to Be AI”, Gwern 2024

“Research Ideas”, Gwern 2017

“The Neural Net Tank Urban Legend”, Gwern 2011

“Surprisingly Turing-Complete”, Gwern 2012

“Evolution As Backstop for Reinforcement Learning”, Gwern 2018

“ARPA and SCI: Surfing AI”, Gwern 2018

“Computer Optimization: Your Computer Is Faster Than You Think”, Gwern 2021

“Timing Technology: Lessons From The Media Lab”, Gwern 2012

Links

“Subtitling Your Life: Hearing Aids and Cochlear Implants Have Been Getting Better for Years, but a New Type of Device—Eyeglasses That Display Real-Time Speech Transcription on Their Lenses—Are a Game-Changing Breakthrough”, Owen 2025

“Covering Cracks in Content Moderation: Delexicalized Distant Supervision for Illicit Drug Jargon Detection”, Song et al 2025

“Estimating the Probability of Sampling a Trained Neural Network at Random”, Scherlis & Belrose 2025

“Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible With Cryptography”, Shumailov et al 2025

“Beyond Gradient Averaging in Parallel Optimization: Improved Robustness through Gradient Agreement Filtering”, Chaubard et al 2024

“Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World”, Kazdan et al 2024

“Why Concepts Are (Probably) Vectors”, Piantadosi et al 2024

“Robin Hanson: Prediction Markets, the Future of Civilization, and Polymathy—#66 § Opposition to DL”, Hanson & Hsu 2024

“Memorization in Machine Learning: A Survey of Results”, Usynin et al 2024

“Simultaneous Linear Connectivity of Neural Networks modulo Permutation”, Sharma et al 2024

“The Boundary of Neural Network Trainability Is Fractal”, Sohl-Dickstein 2024

“Tweets to Citations: Unveiling the Impact of Social Media Influencers on AI Research Visibility”, Weissburg et al 2024

“Outliers With Opposing Signals Have an Outsized Effect on Neural Network Optimization”, Rosenfeld & Risteski 2023

“Proving Linear Mode Connectivity of Neural Networks via Optimal Transport”, Ferbach et al 2023

“How Deep Is the Brain? The Shallow Brain Hypothesis”, Suzuki et al 2023

“Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture”, Fu et al 2023

“Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition”, Chen et al 2023

“Efficient Video and Audio Processing With Loihi 2”, Shrestha et al 2023

“Latent State Models of Training Dynamics”, Hu et al 2023

“Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity”, Zhou et al 2023

“Combining Human Expertise With Artificial Intelligence: Experimental Evidence from Radiology”, Agarwal et al 2023

“The Architecture of a Biologically Plausible Language Organ”, Mitropolsky & Papadimitriou 2023

“Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training”, Zhang et al 2023

“Neural Oscillators Are Universal”, Lanthaler et al 2023

“Protecting Society from AI Misuse: When Are Restrictions on Capabilities Warranted?”, Anderljung & Hazell 2023

“Symbolic Discovery of Optimization Algorithms”, Chen et al 2023

“The Forward-Forward Algorithm: Some Preliminary Investigations”, Hinton 2022

“Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability”, Damian et al 2022

“Do Current Multi-Task Optimization Methods in Deep Learning Even Help?”, Xin et al 2022

“Selective Neutralization and Deterring of Cockroaches With Laser Automated by Machine Vision”, Rakhmatulin et al 2022

“Git Re-Basin: Merging Models modulo Permutation Symmetries”, Ainsworth et al 2022

“Learning With Differentiable Algorithms”, Petersen 2022

“Normalized Activation Function: Toward Better Convergence”, Peiwen & Changsheng 2022

“Bugs in the Data: How ImageNet Misrepresents Biodiversity”, Luccioni & Rolnick 2022

“The Value of Out-Of-Distribution Data”, Silva et al 2022

“AniWho: A Quick and Accurate Way to Classify Anime Character Faces in Images”, Naftali et al 2022

“Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training”, You et al 2022

“Adaptive Gradient Methods at the Edge of Stability”, Cohen et al 2022

“Learning With Combinatorial Optimization Layers: a Probabilistic Approach”, Dalle et al 2022

“What Do We Maximize in Self-Supervised Learning?”, Shwartz-Ziv et al 2022

“Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit”, Barak et al 2022

“High-Performing Neural Network Models of Visual Cortex Benefit from High Latent Dimensionality”, Elmoznino & Bonner 2022

“Perceptein: A Synthetic Protein-Level Neural Network in Mammalian Cells”, Chen et al 2022

“Predicting Word Learning in Children from the Performance of Computer Vision Systems”, Rane et al 2022

“Wav2Vec-Aug: Improved Self-Supervised Training With Limited Data”, Sriram et al 2022

“The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon”, Thilak et al 2022

“An Improved One Millisecond Mobile Backbone”, Vasu et al 2022

“Greedy Bayesian Posterior Approximation With Deep Ensembles”, Tiulpin & Blaschko 2022

“Generating Scientific Claims for Zero-Shot Scientific Fact Checking”, Wright et al 2022

“On the Generalization Mystery in Deep Learning”, Chatterjee & Zielinski 2022

“Deep Lexical Hypothesis: Identifying Personality Structure in Natural Language”, Cutler & Condon 2022

“Gradients without Backpropagation”, Baydin et al 2022

“Towards Scaling Difference Target Propagation by Learning Backprop Targets”, Ernoult et al 2022

“M5 Accuracy Competition: Results, Findings, and Conclusions”, Makridakis et al 2022

“Formal Analysis of Art: Proxy Learning of Visual Concepts from Style Through Language Models”, Kim et al 2022

“Silent Bugs in Deep Learning Frameworks: An Empirical Study of Keras and TensorFlow”, Tambon et al 2021

“Artificial Intelligence ‘Sees’ Split Electrons”, Perdew 2021

“Pushing the Frontiers of Density Functionals by Solving the Fractional Electron Problem”, Kirkpatrick et al 2021

Word Golf, Xia 2021

“Deep Learning Enables Genetic Analysis of the Human Thoracic Aorta”, Pirruccello et al 2021

“Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks”, Ericsson et al 2021

“Achieving Human Parity on Visual Question Answering”, Yan et al 2021

“BC-Z: Zero-Shot Task Generalization With Robotic Imitation Learning”, Jang et al 2021

“Learning in High Dimension Always Amounts to Extrapolation”, Balestriero et al 2021

“The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks”, Entezari et al 2021

“The Structure of Genotype-Phenotype Maps Makes Fitness Landscapes Navigable”, Greenbury et al 2021

“Deep Neural Networks and Tabular Data: A Survey”, Borisov et al 2021

“Learning through Atypical "Phase Transitions" in Overparameterized Neural Networks”, Baldassi et al 2021

“RAFT: A Real-World Few-Shot Text Classification Benchmark”, Alex et al 2021

“PPT: Pre-Trained Prompt Tuning for Few-Shot Learning”, Gu et al 2021

“DART: Differentiable Prompt Makes Pre-Trained Language Models Better Few-Shot Learners”, Zhang et al 2021

“ETA Prediction With Graph Neural Networks in Google Maps”, Derrow-Pinion et al 2021

“Neural Operator: Learning Maps Between Function Spaces”, Kovachki et al 2021

“Introducing Triton: Open-Source GPU Programming for Neural Networks”, Tillet 2021

“Predictive Coding: a Theoretical and Experimental Review”, Millidge et al 2021

“A Connectivity-Constrained Computational Account of Topographic Organization in Primate High-Level Visual Cortex”, Blauch et al 2021

“A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers”, Miao et al 2021

“Coarse-To-Fine Q-Attention: Efficient Learning for Visual Robotic Manipulation via Discretisation”, James et al 2021

“Randomness In Neural Network Training: Characterizing The Impact of Tooling”, Zhuang et al 2021

“Revisiting Deep Learning Models for Tabular Data”, Gorishniy et al 2021

“BEiT: BERT Pre-Training of Image Transformers”, Bao et al 2021

“Revisiting Model Stitching to Compare Neural Representations”, Bansal et al 2021

“Artificial Intelligence in China’s Revolution in Military Affairs”, Kania 2021

“The Geometry of Concept Learning”, Sorscher et al 2021

“VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning”, Bardes et al 2021

“The Modern Mathematics of Deep Learning”, Berner et al 2021

“Understanding by Understanding Not: Modeling Negation in Language Models”, Hosseini et al 2021

“Entailment As Few-Shot Learner”, Wang et al 2021

“PAWS: Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments With Support Samples”, Assran et al 2021

“Epistemic Autonomy: Self-Supervised Learning in the Mammalian Hippocampus”, Santos-Pata et al 2021

“Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization”, Xie et al 2021

“Contrasting Contrastive Self-Supervised Representation Learning Models”, Kotar et al 2021

“Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations”, Ryali et al 2021

“GWAS in Almost 195,000 Individuals Identifies 50 Previously Unidentified Genetic Loci for Eye Color”, Simcoe et al 2021

“BERTese: Learning to Speak to BERT”, Haviv et al 2021

“Predictive Coding Can Do Exact Backpropagation on Any Neural Network”, Salvatori et al 2021

“Barlow Twins: Self-Supervised Learning via Redundancy Reduction”, Zbontar et al 2021

“WIT: Wikipedia-Based Image Text Dataset for Multimodal Multilingual Machine Learning”, Srinivasan et al 2021

“The Inverse Variance–flatness Relation in Stochastic Gradient Descent Is Critical for Finding Flat Minima”, Feng & Tu 2021

“Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability”, Cohen et al 2021

“Rip Van Winkle’s Razor: A Simple Estimate of Overfit to Test Data”, Arora & Zhang 2021

“Image Completion via Inference in Deep Generative Models”, Harvey et al 2021

“Contrastive Learning Inverts the Data Generating Process”, Zimmermann et al 2021

“DirectPred: Understanding Self-Supervised Learning Dynamics without Contrastive Pairs”, Tian et al 2021

“MLGO: a Machine Learning Guided Compiler Optimizations Framework”, Trofin et al 2021

“Facial Recognition Technology Can Expose Political Orientation from Naturalistic Facial Images”, Kosinski 2021

“Solving Mixed Integer Programs Using Neural Networks”, Nair et al 2020

“Sixteen Facial Expressions Occur in Similar Contexts Worldwide”, Cowen 2020

“PiRank: Learning To Rank via Differentiable Sorting”, Swezey et al 2020

“Real-Time Synthesis of Imagined Speech Processes from Minimally Invasive Recordings of Neural Activity”, Angrick et al 2020

“Generalization Bounds for Deep Learning”, Valle-Pérez & Louis 2020

“Selective Eye-Gaze Augmentation To Enhance Imitation Learning In Atari Games”, Thammineni et al 2020

“SimSiam: Exploring Simple Siamese Representation Learning”, Chen & He 2020

“Recent Advances in Neurotechnologies With Broad Potential for Neuroscience Research”, Vázquez-Guardado et al 2020

“Voting for Authorship Attribution Applied to Dark Web Data”, Samreen & Alalfi 2020

“Twenty Years Beyond the Turing Test: Moving Beyond the Human Judges Too”, Hernández-Orallo 2020

“Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding”, Roberts et al 2020

“Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary With Width and Depth”, Nguyen et al 2020

“Guys and Dolls”, Devlin & Locatelli 2020

“Open-Domain Question Answering Goes Conversational via Question Rewriting”, Anantha et al 2020

“Digital Voicing of Silent Speech”, Gaddy & Klein 2020

“Rank-Smoothed Pairwise Learning In Perceptual Quality Assessment”, Talebi et al 2020

“Implicit Gradient Regularization”, Barrett & Dherin 2020

“Learning Explanations That Are Hard to Vary”, Parascandolo et al 2020

“Large Associative Memory Problem in Neurobiology and Machine Learning”, Krotov & Hopfield 2020

“AdapterHub: A Framework for Adapting Transformers”, Pfeiffer et al 2020

“Identifying Regulatory Elements via Deep Learning”, Barshai et al 2020

“Is SGD a Bayesian Sampler? Well, Almost”, Mingard et al 2020

“Bootstrap Your Own Latent (BYOL): A New Approach to Self-Supervised Learning”, Grill et al 2020

“SCAN: Learning to Classify Images without Labels”, Gansbeke et al 2020

“Politeness Transfer: A Tag and Generate Approach”, Madaan et al 2020

“Supervised Contrastive Learning”, Khosla et al 2020

“Backpropagation and the Brain”, Lillicrap et al 2020

“Can You Put It All Together: Evaluating Conversational Agents’ Ability to Blend Skills”, Smith et al 2020

“Topology of Deep Neural Networks”, Naitzat et al 2020

“Improved Baselines With Momentum Contrastive Learning”, Chen et al 2020

“The Large Learning Rate Phase of Deep Learning: the Catapult Mechanism”, Lewkowycz et al 2020

“Fast Differentiable Sorting and Ranking”, Blondel et al 2020

“The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence”, Marcus 2020

“Quantifying Independently Reproducible Machine Learning”, Raff 2020

“The Secret History of Facial Recognition: Sixty Years Ago, a Sharecropper’s Son Invented a Technology to Identify Faces. Then the Record of His Role All but Vanished. Who Was Woody Bledsoe, and Who Was He Working For?”, Raviv 2020

“Can the Brain Do Backpropagation? -Exact Implementation of Backpropagation in Predictive Coding Networks”, Song et al 2020

“Learning Neural Activations”, Minhas & Asif 2019

“2019 AI Alignment Literature Review and Charity Comparison”, Larks 2019

“Libri-Light: A Benchmark for ASR With Limited or No Supervision”, Kahn et al 2019

“Connecting Vision and Language With Localized Narratives”, Pont-Tuset et al 2019

“12-In-1: Multi-Task Vision and Language Representation Learning”, Lu et al 2019

“A Deep Learning Framework for Neuroscience”, Richards et al 2019 (PDF: /doc/ai/nn/2019-richards.pdf)

“Machine Learning for Scent: Learning Generalizable Perceptual Representations of Small Molecules”, Sanchez-Lengeling et al 2019

“KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition With Deep Learning”, Clanuwat et al 2019

“Approximate Inference in Discrete Distributions With Monte Carlo Tree Search and Value Functions”, Buesing et al 2019

“Best Practices for the Human Evaluation of Automatically Generated Text”, Lee et al 2019

“RandAugment: Practical Automated Data Augmentation With a Reduced Search Space”, Cubuk et al 2019

“Large-Scale Pretraining for Neural Machine Translation With Tens of Billions of Sentence Pairs”, Meng et al 2019

“ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations”, Lan et al 2019

“Engineering a Less Artificial Intelligence”, Sinz et al 2019 (PDF: /doc/ai/nn/2019-sinz.pdf)

“Neural Networks Are a Priori Biased towards Boolean Functions With Low Entropy”, Mingard et al 2019

“Simple, Scalable Adaptation for Neural Machine Translation”, Bapna et al 2019

“Emergent Tool Use From Multi-Agent Autocurricula”, Baker et al 2019

“A Step Toward Quantifying Independently Reproducible Machine Learning Research”, Raff 2019

“Does Machine Translation Affect International Trade? Evidence from a Large Digital Platform”, Brynjolfsson et al 2019b

“Can One Concurrently Record Electrical Spikes from Every Neuron in a Mammalian Brain?”, Kleinfeld et al 2019

“Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges”, Arivazhagan et al 2019

“Deep Set Prediction Networks”, Zhang et al 2019

“Optimizing Color for Camouflage and Visibility Using Deep Learning: the Effects of the Environment and the Observer’s Visual System”, Fennell et al 2019

“Speech2Face: Learning the Face Behind a Voice”, Oh et al 2019

“Universal Approximation With Deep Narrow Networks”, Kidger & Lyons 2019

“SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems”, Wang et al 2019

“Universal Quantum Control through Deep Reinforcement Learning”, Niu et al 2019

“Analysing Mathematical Reasoning Abilities of Neural Models”, Saxton et al 2019

“Reinforcement Learning for Recommender Systems: A Case Study on Youtube”, Chen 2019

“Stochastic Optimization of Sorting Networks via Continuous Relaxations”, Grover et al 2019

“Surprises in High-Dimensional Ridgeless Least Squares Interpolation”, Hastie et al 2019

“DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs”, Dua et al 2019

“Theories of Error Back-Propagation in the Brain”, Whittington & Bogacz 2019

“A Replication Study: Machine Learning Models Are Capable of Predicting Sexual Orientation From Facial Images”, Leuner 2019

“Unmasking Clever Hans Predictors and Assessing What Machines Really Learn”, Lapuschkin et al 2019

“What Makes a Good Conversation? How Controllable Attributes Affect Human Judgments”, See et al 2019

“The Evolved Transformer”, So et al 2019

“Forecasting Transformative AI: An Expert Survey”, Gruetzemacher et al 2019

“Human Few-Shot Learning of Compositional Instructions”, Lake et al 2019

“Evaluation and Accurate Diagnoses of Pediatric Diseases Using Artificial Intelligence”, Liang et al 2019 (PDF: /doc/ai/nn/2019-liang.pdf)

“Why Is There No Successful Whole Brain Simulation (Yet)?”, Stiefel 2019 (PDF: /doc/ai/nn/2019-stiefel.pdf)

“High-Performance Medicine: the Convergence of Human and Artificial Intelligence”, Topol 2019 (PDF: /doc/ai/nn/2019-topol.pdf)

“Identifying Facial Phenotypes of Genetic Disorders Using Deep Learning”, Gurovich et al 2019 (PDF: /doc/genetics/heritable/2019-gurovich.pdf)

“Reinventing the Wheel: Discovering the Optimal Rolling Shape With PyTorch”, Wiener 2019

“An Empirical Study of Example Forgetting during Deep Neural Network Learning”, Toneva et al 2018

“CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge”, Talmor et al 2018

“Depth With Nonlinearity Creates No Bad Local Minima in ResNets”, Kawaguchi & Bengio 2018

“BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding”, Devlin et al 2018

“Interpretable Textual Neuron Representations for NLP”, Poerner et al 2018

“Searching for Efficient Multi-Scale Architectures for Dense Image Prediction”, Chen et al 2018

“Machine Learning to Predict Osteoporotic Fracture Risk from Genotypes”, Forgetta et al 2018

“Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction”, Hashimoto & Tsuruoka 2018

“Searching Toward Pareto-Optimal Device-Aware Neural Architectures”, Cheng et al 2018

“A Study of Reinforcement Learning for Neural Machine Translation”, Wu et al 2018

“Modeling Visual Context Is Key to Augmenting Object Detection Datasets”, Dvornik et al 2018

“Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search”, Zela et al 2018

“Automatically Composing Representation Transformations As a Means for Generalization”, Chang et al 2018

“Differentiable Learning-To-Normalize via Switchable Normalization”, Luo et al 2018

“On the Spectral Bias of Neural Networks”, Rahaman et al 2018

“Neural Tangent Kernel: Convergence and Generalization in Neural Networks”, Jacot et al 2018

“Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning ”, Pang et al 2018

Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning

“Do CIFAR-10 Classifiers Generalize to CIFAR-10? ”, Recht et al 2018

Do CIFAR-10 Classifiers Generalize to CIFAR-10?

“Zero-Shot Dual Machine Translation ”, Sestorain et al 2018

Zero-Shot Dual Machine Translation

“Do Better ImageNet Models Transfer Better? ”, Kornblith et al 2018

Do Better ImageNet Models Transfer Better?

“GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding ”, Wang et al 2018

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

“Adafactor: Adaptive Learning Rates With Sublinear Memory Cost ”, Shazeer & Stern 2018

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

“Averaging Weights Leads to Wider Optima and Better Generalization ”, Izmailov et al 2018

Averaging Weights Leads to Wider Optima and Better Generalization

“SentEval: An Evaluation Toolkit for Universal Sentence Representations ”, Conneau & Kiela 2018

SentEval: An Evaluation Toolkit for Universal Sentence Representations

“Think You Have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge ”, Clark et al 2018

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

“Analyzing Uncertainty in Neural Machine Translation ”, Ott et al 2018

Analyzing Uncertainty in Neural Machine Translation

“End-To-End Deep Image Reconstruction from Human Brain Activity ”, Shen et al 2018

End-to-end deep image reconstruction from human brain activity

“Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari ”, Chrabaszcz et al 2018

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

“SignSGD: Compressed Optimization for Non-Convex Problems ”, Bernstein et al 2018

signSGD: Compressed Optimization for Non-Convex Problems

“Differentiable Dynamic Programming for Structured Prediction and Attention ”, Mensch & Blondel 2018

Differentiable Dynamic Programming for Structured Prediction and Attention

“UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction ”, McInnes et al 2018

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

“Semantic Projection: Recovering Human Knowledge of Multiple, Distinct Object Features from Word Embeddings ”, Grand et al 2018

Semantic projection: recovering human knowledge of multiple, distinct object features from word embeddings

“Panoptic Segmentation ”, Kirillov et al 2018

Panoptic Segmentation

“Clinically Applicable Deep Learning for Diagnosis and Referral in Retinal Disease ”, De Fauw et al 2018

Clinically applicable deep learning for diagnosis and referral in retinal disease :

View PDF:

/doc/ai/nn/2018-defauw.pdf

“Prediction of Cardiovascular Risk Factors from Retinal Fundus Photographs via Deep Learning ”, Poplin et al 2018

Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning

“Three-Dimensional Visualization and a Deep-Learning Model Reveal Complex Fungal Parasite Networks in Behaviorally Manipulated Ants ”, Fredericksen et al 2017

Three-dimensional visualization and a deep-learning model reveal complex fungal parasite networks in behaviorally manipulated ants

“Decoupled Weight Decay Regularization ”, Loshchilov & Hutter 2017

Decoupled Weight Decay Regularization

“Automatic Differentiation in PyTorch ”, Paszke et al 2017

Automatic differentiation in PyTorch

“Rethinking Generalization Requires Revisiting Old Ideas: Statistical Mechanics Approaches and Complex Learning Behavior ”, Martin & Mahoney 2017

Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior

“Mixup: Beyond Empirical Risk Minimization ”, Zhang et al 2017

mixup: Beyond Empirical Risk Minimization

“Malware Detection by Eating a Whole EXE ”, Raff et al 2017

Malware Detection by Eating a Whole EXE

“AlphaGo Zero: Mastering the Game of Go without Human Knowledge ”, Silver et al 2017

AlphaGo Zero: Mastering the game of Go without human knowledge

“Swish: Searching for Activation Functions ”, Ramachandran et al 2017

Swish: Searching for Activation Functions

“Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates ”, Smith & Topin 2017

Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates

“Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection ”, Dwibedi et al 2017

Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection

“Emergence of Locomotion Behaviors in Rich Environments ”, Heess et al 2017

Emergence of Locomotion behaviors in Rich Environments

“The Persistence and Transience of Memory ”, Richards & Frankland 2017

The Persistence and Transience of Memory

“Verb Physics: Relative Physical Knowledge of Actions and Objects ”, Forbes & Choi 2017

Verb Physics: Relative Physical Knowledge of Actions and Objects

“Driver Identification Using Automobile Sensor Data from a Single Turn ”, Hallac et al 2017

Driver Identification Using Automobile Sensor Data from a Single Turn

“StreetStyle: Exploring World-Wide Clothing Styles from Millions of Photos ”, Matzen et al 2017

StreetStyle: Exploring world-wide clothing styles from millions of photos

“Deep Voice 2: Multi-Speaker Neural Text-To-Speech ”, Arik et al 2017

Deep Voice 2: Multi-Speaker Neural Text-to-Speech

“WebVision Challenge: Visual Learning and Understanding With Web Data ”, Li et al 2017

WebVision Challenge: Visual Learning and Understanding With Web Data

“Inferring and Executing Programs for Visual Reasoning ”, Johnson et al 2017

Inferring and Executing Programs for Visual Reasoning

“Visual Attribute Transfer through Deep Image Analogy ”, Liao et al 2017

Visual Attribute Transfer through Deep Image Analogy

“On Weight Initialization in Deep Neural Networks ”, Kumar 2017

On weight initialization in deep neural networks

“A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference ”, Williams et al 2017

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

“RACE: Large-Scale ReAding Comprehension Dataset From Examinations ”, Lai et al 2017

RACE: Large-scale ReAding Comprehension Dataset From Examinations

“Data-Efficient Deep Reinforcement Learning for Dexterous Manipulation ”, Popov et al 2017

Data-efficient Deep Reinforcement Learning for Dexterous Manipulation

“Prototypical Networks for Few-Shot Learning ”, Snell et al 2017

Prototypical Networks for Few-shot Learning

“Meta Networks ”, Munkhdalai & Yu 2017

Meta Networks

“Understanding Synthetic Gradients and Decoupled Neural Interfaces ”, Czarnecki et al 2017

Understanding Synthetic Gradients and Decoupled Neural Interfaces

“Adaptive Neural Networks for Efficient Inference ”, Bolukbasi et al 2017

Adaptive Neural Networks for Efficient Inference

“Deep Voice: Real-Time Neural Text-To-Speech ”, Arik et al 2017

Deep Voice: Real-time Neural Text-to-Speech

“Machine Learning Predicts Laboratory Earthquakes ”, Rouet-Leduc et al 2017

Machine Learning Predicts Laboratory Earthquakes

“Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks ”, Katz et al 2017

Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks

“Dermatologist-Level Classification of Skin Cancer With Deep Neural Networks ”, Esteva et al 2017

Dermatologist-level classification of skin cancer with deep neural networks :

View PDF:

/doc/ai/nn/2017-esteva.pdf

“Child Machines ”, Proudfoot 2017

Child machines

“Machine Learning for Systems and Systems for Machine Learning ”, Dean 2017

Machine Learning for Systems and Systems for Machine Learning

“Feedback Networks ”, Zamir et al 2016

Feedback Networks

“CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning ”, Johnson et al 2016

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

“Towards Information-Seeking Agents ”, Bachman et al 2016

Towards Information-Seeking Agents

“Spatially Adaptive Computation Time for Residual Networks ”, Figurnov et al 2016

Spatially Adaptive Computation Time for Residual Networks

“Deep Learning Reinvents the Hearing Aid: Finally, Wearers of Hearing Aids Can Pick out a Voice in a Crowded Room ”, Wang 2016b

Deep Learning Reinvents the Hearing Aid: Finally, wearers of hearing aids can pick out a voice in a crowded room

“MS MARCO: A Human Generated MAchine Reading COmprehension Dataset ”, Bajaj et al 2016

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

“Learning to Reinforcement Learn ”, Wang et al 2016

Learning to reinforcement learn

“Lip Reading Sentences in the Wild ”, Chung et al 2016

Lip Reading Sentences in the Wild

“Could a Neuroscientist Understand a Microprocessor? ”, Jonas & Kording 2016

Could a Neuroscientist Understand a Microprocessor?

“A Neural Network Playground ”, Smilkov & Carter 2016

A Neural Network Playground

“Homotopy Analysis for Tensor PCA ”, Anandkumar et al 2016

Homotopy Analysis for Tensor PCA

“Why Does Deep and Cheap Learning Work so Well? ”, Lin et al 2016

Why does deep and cheap learning work so well?

“SGDR: Stochastic Gradient Descent With Warm Restarts ”, Loshchilov & Hutter 2016

SGDR: Stochastic Gradient Descent with Warm Restarts

“Concrete Problems in AI Safety ”, Amodei et al 2016

Concrete Problems in AI Safety

“SQuAD: 100,000+ Questions for Machine Comprehension of Text ”, Rajpurkar et al 2016

SQuAD: 100,000+ Questions for Machine Comprehension of Text

“Matching Networks for One Shot Learning ”, Vinyals et al 2016

Matching Networks for One Shot Learning

“Convolutional Sketch Inversion ”, Güçlütürk et al 2016

Convolutional Sketch Inversion

“Unifying Count-Based Exploration and Intrinsic Motivation ”, Bellemare et al 2016

Unifying Count-Based Exploration and Intrinsic Motivation

“Synthesizing the Preferred Inputs for Neurons in Neural Networks via Deep Generator Networks ”, Nguyen et al 2016

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks

“Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity ”, Daniely et al 2016

Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

“"Why Should I Trust You?": Explaining the Predictions of Any Classifier ”, Ribeiro et al 2016

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

“Mastering the Game of Go With Deep Neural Networks and Tree Search ”, Silver et al 2016

Mastering the game of Go with deep neural networks and tree search

“Learning to Compose Neural Networks for Question Answering ”, Andreas et al 2016

Learning to Compose Neural Networks for Question Answering

“How a Japanese Cucumber Farmer Is Using Deep Learning and TensorFlow ”, Sato 2016

How a Japanese cucumber farmer is using deep learning and TensorFlow :

View HTML:

/doc/www/cloud.google.com/3bfbe97a13cecba72a03ec7fd40a1a9cf40f7dd4.html

“Random Gradient-Free Minimization of Convex Functions ”, Nesterov & Spokoiny 2015

Random Gradient-Free Minimization of Convex Functions

“Data-Dependent Initializations of Convolutional Neural Networks ”, Krähenbühl et al 2015

Data-dependent Initializations of Convolutional Neural Networks

“Online Batch Selection for Faster Training of Neural Networks ”, Loshchilov & Hutter 2015

Online Batch Selection for Faster Training of Neural Networks

“Neural Module Networks ”, Andreas et al 2015

Neural Module Networks

“Deep DPG (DDPG): Continuous Control With Deep Reinforcement Learning ”, Lillicrap et al 2015

Deep DPG (DDPG): Continuous control with deep reinforcement learning

“A Neural Algorithm of Artistic Style ”, Gatys et al 2015

A Neural Algorithm of Artistic Style

“VQA: Visual Question Answering ”, Agrawal et al 2015

VQA: Visual Question Answering

“Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks ”, Weston et al 2015

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

“Probabilistic Line Searches for Stochastic Optimization ”, Mahsereci & Hennig 2015

Probabilistic Line Searches for Stochastic Optimization

“Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification ”, He et al 2015

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Neural Networks and Deep Learning, Nielsen 2015

Neural networks and deep learning

“Neural Networks and Deep Learning § Ch6 Deep Learning ”, Nielsen 2015

Neural networks and deep learning § ch6 Deep Learning

“Qualitatively Characterizing Neural Network Optimization Problems ”, Goodfellow et al 2014

Qualitatively characterizing neural network optimization problems

“Freeze-Thaw Bayesian Optimization ”, Swersky et al 2014

Freeze-Thaw Bayesian Optimization

“Microsoft COCO: Common Objects in Context ”, Lin et al 2014

Microsoft COCO: Common Objects in Context

“Deep Learning in Neural Networks: An Overview ”, Schmidhuber 2014

Deep Learning in Neural Networks: An Overview

“Neural Networks, Manifolds, and Topology ”, Olah 2014

Neural Networks, Manifolds, and Topology

“Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks ”, Saxe et al 2013

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

“Distributed Representations of Words and Phrases and Their Compositionality ”, Mikolov et al 2013

Distributed Representations of Words and Phrases and their Compositionality

“Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science ”, Clark 2013

Whatever next? Predictive brains, situated agents, and the future of cognitive science

“Deep Gaussian Processes ”, Damianou & Lawrence 2012

Deep Gaussian Processes

“Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting ”, Xie et al 2012

Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting

“HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent ”, Niu et al 2011

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

“Large-Scale Deep Unsupervised Learning Using Graphics Processors ”, Raina et al 2009

Large-scale deep unsupervised learning using graphics processors

“A Free Energy Principle for the Brain ”, Friston et al 2006

A free energy principle for the brain

“Understanding the Nature of the General Factor of Intelligence: The Role of Individual Differences in Neural Plasticity As an Explanatory Mechanism ”, Garlick 2002

Understanding the nature of the general factor of intelligence: The role of individual differences in neural plasticity as an explanatory mechanism

“Starfish § Bulrushes ”, Watts 1999

Starfish § Bulrushes

“Exponentiated Gradient versus Gradient Descent for Linear Predictors ”, Kivinen & Warmuth 1997

Exponentiated Gradient versus Gradient Descent for Linear Predictors

“Optimality in Biological and Artificial Networks? ”, Levine & Elsberry 1997

Optimality in Biological and Artificial Networks? :

View PDF:

/doc/ai/nn/1997-levine-optialityinbiologicalandartificialnetworks.pdf

“A Sociological Study of the Official History of the Perceptrons Controversy ”, Olazaran 1996

A Sociological Study of the Official History of the Perceptrons Controversy

“Turing Patterns in CNNs, I: Once over Lightly ”, Goras et al 1995

Turing patterns in CNNs, I: Once over lightly

“Learning and Generalization in a Two-Layer Neural Network: The Role of the Vapnik-Chervonenkis Dimension ”, Opper 1994

Learning and generalization in a two-layer neural network: The role of the Vapnik-Chervonenkis dimension

“A Sociological Study of the Official History of the Perceptrons Controversy [1993] ”, Olazaran 1993

A Sociological Study of the Official History of the Perceptrons Controversy [1993]

“The Statistical Mechanics of Learning a Rule ”, Watkin et al 1993

The statistical mechanics of learning a rule

“On Learning the Past Tenses of English Verbs ”, Rumelhart & McClelland 1993

On Learning the Past Tenses of English Verbs

“Statistical Mechanics of Learning from Examples ”, Seung et al 1992

Statistical mechanics of learning from examples

“Memorization Without Generalization in a Multilayered Neural Network ”, Hansel et al 1992

Memorization Without Generalization in a Multilayered Neural Network

“Symbolic and Neural Learning Algorithms: An Experimental Comparison ”, Shavlik et al 1991

Symbolic and neural learning algorithms: An experimental comparison

“Backpropagation Learning For Multilayer Feed-Forward Neural Networks Using The Conjugate Gradient Method ”, Johansson et al 1991

Backpropagation Learning For Multilayer Feed-Forward Neural Networks Using The Conjugate Gradient Method

“Artificial Neural Networks, Back Propagation, and the Kelley-Bryson Gradient Procedure ”, Dreyfus 1990

Artificial neural networks, back propagation, and the Kelley-Bryson gradient procedure :

View PDF:

/doc/ai/nn/1990-dreyfus.pdf

“Exhaustive Learning ”, Schwartz et al 1990

Exhaustive Learning

“International Joint Conference on Neural Networks, January 15–19, 1990: Volume 1: Theory Track, Neural and Cognitive Sciences Track ”, Caudill 1990

International Joint Conference on Neural Networks, January 15–19, 1990: Volume 1: Theory Track, Neural and Cognitive Sciences Track :

View PDF (33MB):

/doc/ai/nn/1990-caudill-internationaljointconferenceonneuralnetworks1990-v1.pdf

“International Joint Conference on Neural Networks, January 15–19, 1990: Volume 2: Applications Track ”, Caudill 1990

International Joint Conference on Neural Networks, January 15–19, 1990: Volume 2: Applications Track :

View PDF (33MB):

/doc/ai/nn/1990-caudill-internationaljointconferenceonneuralnetworks1990-v2.pdf

“Explanatory Coherence ”, Thagard 1989

Explanatory coherence

“Parallel Distributed Processing: Implications for Cognition and Development ”, McClelland 1989

Parallel Distributed Processing: Implications for Cognition and Development

“Cellular Neural Networks: Theory ”, Chua & Yang 1988b

Cellular neural networks: theory

“Cellular Neural Networks: Applications ”, Chua & Yang 1988

Cellular neural networks: applications

“The Brain As Template ”, Finkbeiner 1988

The Brain as Template :

View PDF:

/doc/ai/nn/1988-finkbeiner.pdf

“Observation of Phase Transitions in Spreading Activation Networks ”, Shrager et al 1987

Observation of Phase Transitions in Spreading Activation Networks

“Learning Representations by Backpropagating Errors ”, Rumelhart et al 1986b

Learning representations by backpropagating errors

Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Volume 1: Foundations, Rumelhart et al 1986

Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Volume 1: Foundations :

View PDF (33MB):

/doc/ai/nn/1986-rumelhart-pdp-v1.pdf

“Storing Infinite Numbers of Patterns in a Spin-Glass Model of Neural Networks ”, Amit et al 1985

Storing Infinite Numbers of Patterns in a Spin-Glass Model of Neural Networks

“Learning-Logic: Casting the Cortex of the Human Brain in Silicon ”, Parker 1985

Learning-Logic: Casting the Cortex of the Human Brain in Silicon :

View PDF:

/doc/ai/nn/1985-parker.pdf

“Toward An Interactive Model Of Reading ”, Rumelhart 1985

Toward An Interactive Model Of Reading :

View PDF:

/doc/ai/nn/1985-rumelhart.pdf

“Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences ”, Werbos 1974

Beyond regression: new tools for prediction and analysis in the behavioral sciences

“Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms ”, Rosenblatt 1962

Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms :

View PDF (18MB):

/doc/ai/nn/1962-rosenblatt-principlesofneurodynamics.pdf

“Speculations on Perceptrons and Other Automata ”, Good 1959

Speculations on Perceptrons and Other Automata

“Pandemonium: A Paradigm for Learning ”, Selfridge 1959

Pandemonium: A Paradigm for Learning :

View PDF:

/doc/ai/nn/1959-selfridge.pdf

“Representation of Events in Nerve Nets and Finite Automata ”, Kleene 1951

Representation of Events in Nerve Nets and Finite Automata :

View PDF:

/doc/ai/nn/1951-kleene.pdf

“A Logical Calculus of the Ideas Immanent in Nervous Activity ”, McCulloch & Pitts 1943

A logical calculus of the ideas immanent in nervous activity

“Some AI Koans § http://www.catb.org/esr/jargon/html/koans.html#id3141241 ”, Raymond 2025

Some AI Koans § http://www.catb.org/esr/jargon/html/koans.html#id3141241 :

View HTML:

/doc/www/www.catb.org/57d8adf8ecf7d6c89649ff9bb2c0bb8f07413e40.html#id3141241

“Some AI Koans ”, Raymond 2025

Some AI Koans

“The Age of Em, A Book ”, Hanson 2025

The Age of Em, A Book

“`gsutil Config`: Obtain Credentials and Create Configuration File ”, Google 2025

gsutil config: Obtain credentials and create configuration file

“Why Momentum Really Works ”

Why Momentum Really Works

“Differentiable Finite State Machines ”

Differentiable Finite State Machines :

View HTML:

/doc/www/google-research.github.io/0eeba2f81960bbe9a4de7644ea87beed8a3f7f31.html

“About Sam Greydanus ”, Greydanus 2025

About Sam Greydanus :

View External Link:

https://greydanus.github.io/about_me/

“Contrastive Representation Learning ”

Contrastive Representation Learning :

View HTML (16MB):

/doc/www/lilianweng.github.io/34370e160f56a3affd65dc9cd4313dcffd9205cc.html

“The Internet’s AI Slop Problem Is Only Going to Get Worse ”

The Internet’s AI Slop Problem Is Only Going to Get Worse :

View HTML:

https://nymag.com/intelligencer/article/ai-generated-content-internet-online-slop-spam.html

“Glow: Better Reversible Generative Models ”

Glow: Better Reversible Generative Models

“Preetum Nakkiran ”

Preetum Nakkiran

“Differentiable Programming from Scratch ”

Differentiable Programming from Scratch

“Deep Reinforcement Learning Doesn’t Work Yet ”

Deep Reinforcement Learning Doesn’t Work Yet

“[Commonsense Media Survey on US Generative Media Use] ”

[Commonsense Media survey on US generative media use] :

View HTML:

/doc/www/www.commonsensemedia.org/673dd7d16332c8c39f7b3ac35237c13f5d72f3de.html

“Gourmand Cat Fence ”

Gourmand Cat Fence

“Simple versus Short: Higher-Order Degeneracy and Error-Correction ”

Simple versus Short: Higher-order degeneracy and error-correction :

View HTML:

/doc/www/www.greaterwrong.com/8677e9ee445914700a8e8aeb235c5c6bf0468e95.html

“Inferring Neural Activity Before Plasticity As a Foundation for Learning beyond Backpropagation ”

Inferring neural activity before plasticity as a foundation for learning beyond backpropagation :

View HTML:

/doc/www/www.nature.com/678ca49a8827346f8839158338b7973cbf5a6430.html

“Reddit: Reinforcement Learning Subreddit ”, Reddit 2025

Reddit: Reinforcement Learning subreddit

“AI and the Indian Election ”, Schneier 2025

AI and the Indian Election

“Lip Reading Sentences in the Wild [Video] ”

Lip Reading Sentences in the Wild [video] :

https://www.youtube.com/watch?v=5aogzAUPilE

Wikipedia (12)

Backpropagation
Bayesian neural network :

https://en.wikipedia.org/wiki/Bayesian_neural_network
Bias-variance tradeoff :

https://en.wikipedia.org/wiki/Bias-variance_tradeoff
Cellular neural network
Circuit complexity
Distributional semantics
Information bottleneck method :

https://en.wikipedia.org/wiki/Information_bottleneck_method
Joseph M. Sussman :

https://en.wikipedia.org/wiki/Joseph_M._Sussman
Marvin Minsky
Moravec’s paradox
Softmax function
Warren S. McCulloch :

https://en.wikipedia.org/wiki/Warren_S._McCulloch

Miscellaneous

Bibliography

https://arxiv.org/abs/2310.12109: “Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture ”, Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré

https://www.nber.org/papers/w31422: “Combining Human Expertise With Artificial Intelligence: Experimental Evidence from Radiology ”, Nikhil Agarwal, Alex Moehring, Pranav Rajpurkar, Tobias Salz

https://arxiv.org/abs/2302.06675#google: “Symbolic Discovery of Optimization Algorithms ”, Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Yao Liu, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, Quoc V. Le

https://www.tandfonline.com/doi/full/10.1080/00305316.2022.2121777: “Selective Neutralization and Deterring of Cockroaches With Laser Automated by Machine Vision ”, Ildar Rakhmatulin, Mathieu Lihoreau, Jose Pueyo

https://arxiv.org/abs/2208.11012: “AniWho: A Quick and Accurate Way to Classify Anime Character Faces in Images ”, Martinus Grady Naftali, Jason Sebastian Sulistyawan, Kelvin Julian, Felix Indra Kurniadi

https://arxiv.org/abs/2201.13415: “Towards Scaling Difference Target Propagation by Learning Backprop Targets ”, Maxence Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio

https://www.sciencedirect.com/science/article/pii/S0169207021001874: “M5 Accuracy Competition: Results, Findings, and Conclusions ”, Spyros Makridakis, Evangelos Spiliotis, Vassilios Assimakopoulos

https://arxiv.org/abs/2112.13314: “Silent Bugs in Deep Learning Frameworks: An Empirical Study of Keras and TensorFlow ”, Florian Tambon, Amin Nikanjam, Le An, Foutse Khomh, Giuliano Antoniol

2021-kirkpatrick.pdf#deepmind: “Pushing the Frontiers of Density Functionals by Solving the Fractional Electron Problem ”, James Kirkpatrick, Brendan McMorrow, David H. P. Turban, Alexander L. Gaunt, James S. Spencer, Alexander G. D. G. Matthews, Annette Obika, Louis Thiry, Meire Fortunato, David Pfau, Lara Román Castellanos, Stig Petersen, Alexander W. R. Nelson, Pushmeet Kohli, Paula Mori-Sánchez, Demis Hassabis, Aron J. Cohen

https://www.word.golf/: Word Golf, Eric Xia

https://arxiv.org/abs/2110.00683: “Learning through Atypical "Phase Transitions" in Overparameterized Neural Networks ”, Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba Pacelli, Gabriele Perugini, Riccardo Zecchina

https://arxiv.org/abs/2106.08254#microsoft: “BEiT: BERT Pre-Training of Image Transformers ”, Hangbo Bao, Li Dong, Furu Wei

https://arxiv.org/abs/2104.13963#facebook: “PAWS: Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments With Support Samples ”, Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat

https://arxiv.org/abs/2103.14005: “Contrasting Contrastive Self-Supervised Representation Learning Models ”, Klemen Kotar, Gabriel Ilharco, Ludwig Schmidt, Kiana Ehsani, Roozbeh Mottaghi

https://arxiv.org/abs/2103.12719#facebook: “Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations ”, Chaitanya K. Ryali, David J. Schwab, Ari S. Morcos

https://arxiv.org/abs/2102.06810#facebook: “DirectPred: Understanding Self-Supervised Learning Dynamics without Contrastive Pairs ”, Yuandong Tian, Xinlei Chen, Surya Ganguli

https://arxiv.org/abs/2007.07779: “AdapterHub: A Framework for Adapting Transformers ”, Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, Iryna Gurevych

https://arxiv.org/abs/2006.07733#deepmind: “Bootstrap Your Own Latent (BYOL): A New Approach to Self-Supervised Learning ”, Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

https://arxiv.org/abs/2005.12320: “SCAN: Learning to Classify Images without Labels ”, Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Marc Proesmans, Luc Van Gool

https://arxiv.org/abs/2004.11362#google: “Supervised Contrastive Learning ”, Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, Dilip Krishnan

https://www.lesswrong.com/posts/SmDziGM9hBjW9DKmf/2019-ai-alignment-literature-review-and-charity-comparison: “2019 AI Alignment Literature Review and Charity Comparison ”, Larks

https://arxiv.org/abs/1912.03098#google: “Connecting Vision and Language With Localized Narratives ”, Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, Vittorio Ferrari

https://arxiv.org/abs/1909.13719#google: “RandAugment: Practical Automated Data Augmentation With a Reduced Search Space ”, Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le

https://arxiv.org/abs/1909.11942#google: “ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations ”, Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

https://arxiv.org/abs/1905.00537: “SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems ”, Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

https://arxiv.org/abs/1810.04805#google: “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding ”, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

https://arxiv.org/abs/1806.10779: “Differentiable Learning-To-Normalize via Switchable Normalization ”, Ping Luo, Jiamin Ren, Zhanglin Peng, Ruimao Zhang, Jingyu Li

https://arxiv.org/abs/1803.05407: “Averaging Weights Leads to Wider Optima and Better Generalization ”, Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson

https://arxiv.org/abs/1802.08842: “Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari ”, Patryk Chrabaszcz, Ilya Loshchilov, Frank Hutter

2017-silver.pdf#deepmind: “AlphaGo Zero: Mastering the Game of Go without Human Knowledge ”, David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, Demis Hassabis

https://arxiv.org/abs/1708.07120: “Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates ”, Leslie N. Smith, Nicholay Topin

https://www.sciencedirect.com/science/article/pii/S0896627317303653: “The Persistence and Transience of Memory ”, Blake A. Richards, Paul W. Frankland

https://arxiv.org/abs/1705.05640: “WebVision Challenge: Visual Learning and Understanding With Web Data ”, Wen Li, Limin Wang, Wei Li, Eirikur Agustsson, Jesse Berent, Abhinav Gupta, Rahul Sukthankar, Luc Van Gool

https://arxiv.org/abs/1612.02297: “Spatially Adaptive Computation Time for Residual Networks ”, Michael Figurnov, Maxwell D. Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry Vetrov, Ruslan Salakhutdinov

2009-raina.pdf: “Large-Scale Deep Unsupervised Learning Using Graphics Processors ”, Rajat Raina, Anand Madhavan, Andrew Y. Ng

1994-opper.pdf: “Learning and Generalization in a Two-Layer Neural Network: The Role of the Vapnik-Chervonenkis Dimension ”, Manfred Opper

1993-olazaran.pdf: “A Sociological Study of the Official History of the Perceptrons Controversy [1993] ”, Mikel Olazaran

1992-seung.pdf: “Statistical Mechanics of Learning from Examples ”, H. S. Seung, H. Sompolinsky, N. Tishby

1992-hansel.pdf: “Memorization Without Generalization in a Multilayered Neural Network ”, D. Hansel, G. Mato, C. Meunier

1989-mcclelland.pdf: “Parallel Distributed Processing: Implications for Cognition and Development ”, James L. McClelland

1985-amit.pdf: “Storing Infinite Numbers of Patterns in a Spin-Glass Model of Neural Networks ”, Daniel J. Amit, Hanoch Gutfreund, H. Sompolinsky
