Learn This, and You’ll Know 90% of What Matters in AI Today

Vlad Yashin
5 min read · Nov 20, 2024


If you’re serious about securing your future, this list is your treasure map.


These are the 27 papers that Ilya Sutskever, the co-founder of OpenAI, recommended to none other than John Carmack — the legendary creator of DOOM — when he asked how to quickly get smart about AI and its development today.

Let’s be real: AI is the frontier of innovation, and LLMs are stealing the spotlight.

But before you dive headfirst into the latest trends, take a moment to study the fundamentals.

As with most things in life, it pays to master the fundamental building blocks first!

Specifically for you, I’ve sorted these essential papers from beginner-friendly to brain-busting.

I've grouped all of Ilya's papers into three difficulty levels:

  • Beginner-Friendly
  • Intermediate
  • Advanced

And for those of you laser-focused on Large Language Models, I've flagged the absolute must-reads with a "📌".

Let’s go!

Beginner-Friendly

1. CS231n: Convolutional Neural Networks for Visual Recognition

This free Stanford course is the ultimate beginner’s guide to computer vision and convolutional neural networks.

Perfect for: Understanding image processing basics & how CNNs work.

2. Understanding LSTM Networks

Explains how LSTMs solve vanishing gradients in RNNs, with clear visuals and examples.

  • Pros: Beginner-friendly, visually intuitive.
  • Cons: Focused on RNNs, less relevant for Transformers.

Perfect for: Learning sequential data processing fundamentals.
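
If you want to see the mechanism in code, here's a minimal NumPy sketch of a single LSTM step (illustrative variable names of my own, not taken from the post). The additive cell-state update is the part that keeps gradients from vanishing:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # W: (4 * hidden, input + hidden), b: (4 * hidden,)
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g                        # additive update: gradients can flow through c
    h = o * np.tanh(c)                            # hidden state passed to the next step
    return h, c

hidden, inp = 8, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden, inp + hidden)) * 0.1
b = np.zeros(4 * hidden)
h, c = lstm_step(rng.normal(size=inp), np.zeros(hidden), np.zeros(hidden), W, b)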

3. The Annotated Transformer 📌

A hands-on walkthrough of the Transformer model, breaking complex ideas into digestible code snippets.

Perfect for: Anyone looking to understand and implement Transformers. A must for LLM-interested folks!
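
To give you a taste of what the walkthrough builds, here's a rough NumPy sketch of scaled dot-product attention, the core operation of the Transformer (the annotated post implements the full model in PyTorch; this is just the formula softmax(QK^T / sqrt(d_k)) V):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)            # how well each query matches each key
    weights = np.exp(scores - scores.max(-1, keepdims=True))  # numerically stable softmax
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V                                         # weighted average of the values

tokens = np.random.rand(5, 16)  # 5 tokens, d_model = 16 (toy self-attention: Q = K = V)
print(scaled_dot_product_attention(tokens, tokens, tokens).shape)  # (5, 16)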


Intermediate

4. Attention Is All You Need 📌

Introduces the Transformer architecture, the backbone of LLMs like GPT and BERT.

  • Pros: Foundational for modern AI, directly relevant to LLMs.
  • Cons: Dense theoretical explanations can be challenging.

Perfect for: Understanding how LLMs work. Consider it your AI Bible.

5. ImageNet Classification with Deep CNNs

AlexNet jump-started the deep learning revolution by proving CNNs’ power for image recognition.

Perfect for: Appreciating deep learning’s roots.

6. Deep Residual Learning for Image Recognition

ResNets introduced skip connections, solving vanishing gradients for deeper networks.

  • Pros: Practical insights for building deep networks.
  • Cons: Specific to image recognition.

Perfect for: Engineers pushing the limits of neural network depth.
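
The core idea fits in a few lines. Here's a simplified PyTorch sketch of a residual block (batch norm and downsampling are omitted, so treat it as an illustration of the skip connection rather than the paper's exact block):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # the identity shortcut lets gradients bypass the conv stack

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])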

7. Neural Machine Translation by Jointly Learning to Align and Translate

Laid the groundwork for seq2seq models with attention, paving the way for Transformers.

Cons: HEAVY math, like really heavy; less approachable for beginners.

8. Scaling Laws for Neural Language Models 📌

Explains why “bigger is better” for LLMs, with insights into scaling compute, data, and parameters.

  • Pros: Incredibly relevant to LLM research and development.
  • Cons: Theoretical, not implementation-focused.
  • Perfect for: A fundamental understanding of why LLMs like GPT-4 succeed.
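
The headline result is that test loss falls off as a smooth power law as you scale parameters, data, and compute. Here's a tiny sketch of that shape; the constants below are placeholders I picked for the example, not the paper's fitted values:

def power_law_loss(n_params, n_c=1e13, alpha=0.08):
    # Loss ~ (N_c / N)^alpha: bigger models give predictable, diminishing improvements.
    return (n_c / n_params) ** alpha

for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"N = {n:.0e}  ->  predicted loss ~ {power_law_loss(n):.3f}")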

9. Recurrent Neural Network Regularization

Techniques to improve RNNs’ robustness and performance.

  • Pros: Practical; improves real-world RNN applications.
  • Cons: Limited relevance with the rise of Transformers.

Perfect for: Understanding RNN optimization techniques!

10. Neural Turing Machines

Explores memory-augmented neural networks for reasoning tasks.

11. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Describes engineering techniques for training massive models.

  • Pros: Essential for scaling LLM training.
  • Cons: Requires an above-average understanding of distributed systems.
  • Perfect for: Optimizing compute for large-scale AI projects.
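
To make the micro-batching idea concrete, here's a toy single-process sketch (my own simplification: real GPipe places each stage on its own accelerator and overlaps the micro-batches so every device stays busy):

import torch

def pipeline_forward(stages, batch, n_microbatches=4):
    # Split the batch into micro-batches and push each one through the stage sequence.
    outputs = []
    for micro in torch.chunk(batch, n_microbatches):
        for stage in stages:  # in real pipeline parallelism, each stage lives on a different device
            micro = stage(micro)
        outputs.append(micro)
    return torch.cat(outputs)

stages = [torch.nn.Linear(16, 16), torch.nn.ReLU(), torch.nn.Linear(16, 8)]
print(pipeline_forward(stages, torch.randn(32, 16)).shape)  # torch.Size([32, 8])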

Advanced

12. Variational Lossy Autoencoder

Explores lossy data compression using variational principles.

  • Perfect for: Advanced learners exploring unsupervised learning.

13. Multi-Scale Context Aggregation by Dilated Convolutions

Improves receptive fields without extra computational cost.

  • Pros: Useful for segmentation and sequence tasks.
  • Perfect for: Specialized tasks like semantic segmentation.
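
In PyTorch this is literally one argument. A quick sketch of how dilation widens the receptive field without adding parameters (the shapes and sizes here are my own toy choices):

import torch
import torch.nn as nn

# Both layers have a 3x3 kernel (9 weights), but the dilated one "sees" a 5x5 neighborhood.
dense = nn.Conv2d(1, 1, kernel_size=3, padding=1)                # receptive field 3x3
dilated = nn.Conv2d(1, 1, kernel_size=3, padding=2, dilation=2)  # receptive field 5x5

x = torch.randn(1, 1, 32, 32)
print(dense(x).shape, dilated(x).shape)  # both preserve the 32x32 spatial size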

14. The Unreasonable Effectiveness of RNNs

Celebrates RNNs’ versatility across domains before the Transformer era.

  • Perfect for: Historical perspective on sequence modeling.

15. Identity Mappings in Deep Residual Networks

Optimizes ResNets, critical for very deep architectures.

  • Perfect for: Understanding incremental innovations in deep learning.

16. A Simple Neural Network Module for Relational Reasoning

Focuses on relational reasoning within neural networks.

  • Pros: Essential for tasks requiring relational reasoning.
  • Perfect for: Advanced cognitive AI tasks.

17. Pointer Networks

Introduces an attention mechanism that "points" to elements of the input sequence, handling outputs whose length depends on the input.

  • Pros: Crucial for structured data tasks.
  • Perfect for: Solving combinatorial optimization problems.

18. Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton

Explores how complexity rises and then falls in a closed system, using an automaton model of cream mixing into coffee.

  • Perfect for: Bridging AI and physics! A solid read which is highly recommended!

19. Kolmogorov Complexity (pg. 434+)

Explores data complexity and compression theories.

  • Pros: Foundational for understanding data encoding.
  • Cons: Extremely theoretical; one of the most abstract reads on this list!
  • Perfect for: Researchers delving into algorithmic complexity.

20. The First Law of Complexodynamics

Discusses complexity and order in closed systems.

  • Cons: Abstract, limited immediate applications.
  • Perfect for: Interdisciplinary research.

21. The Coffee Automaton

Examines complexity in closed systems, merging AI and physics.

  • Perfect for: Theoretically inclined researchers.

Other Papers

Below are the papers I chose to group separately, as they dive into pretty specialized topics that are highly theoretical, interdisciplinary, or tailored to advanced AI applications.

That said, they’re still incredibly valuable for anyone looking to deepen their understanding of the nature and future of modern AI systems.

22. Machine Super Intelligence Dissertation

This dissertation focuses on the long-term implications and challenges of creating superintelligent AI. It's less about practical engineering and more about the philosophy, ethics, and governance of AI.

23. Keeping Neural Networks Simple by Minimizing the Description Length of Weights

It explores theoretical principles to simplify neural networks, helping make models more interpretable and efficient. It’s ideal for researchers aiming to reduce complexity in model design.

24. Neural Message Passing for Quantum Chemistry

This paper highlights how AI intersects with quantum chemistry, offering insights into how neural networks can transform scientific fields beyond traditional AI domains.

25. Deep Speech 2

While speech recognition isn’t as central to today’s focus on LLMs, this paper provides a detailed look at an end-to-end model for speech processing — a key application area of AI.

26. Relational RNNs

This paper adds reasoning capabilities to traditional RNNs, pushing the boundaries of what recurrent models can achieve. It’s ideal for tackling advanced reasoning tasks, although RNNs are less commonly used today due to Transformers.

27. Tutorial on Minimum Description Length Principle

A beginner-friendly introduction to MDL, this paper bridges theoretical insights with practical applications, focusing on model generalization and compression — important but less directly relevant to modern LLMs.

Thanks for reading!
