Into AI

A Paper a Week - Week 13

Grandmaster level in StarCraft II using multi-agent reinforcement learning, Vinyals et al.

Posted on March 30, 2024

Introduction For the thirteenth post in this series, I read “Grandmaster level in StarCraft II using multi-agent reinforcement learning” by Vinyals et al. The paper details the development, training, and experimental results of the AlphaStar StarCraft II agent developed by DeepMind. While there are high-level similarities between AlphaStar and other... [Read More]

Tags:

week13
paper

A Paper a Week - Week 12

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver et al.

Posted on March 23, 2024

Introduction For the twelfth post in this series, I read the chess and shogi extension of last week’s paper on AlphaGo, “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm” by Silver et al. This paper presents the AlphaZero algorithm, which is a generalization of the AlphaGo... [Read More]

Tags:

week12
paper

A Paper a Week - Week 11

Mastering the Game of Go without Human Knowledge, Silver et al.

Posted on March 16, 2024

Introduction For the eleventh post in this series, I decided to read one of my favorite papers, “Mastering the Game of Go without Human Knowledge”, by Silver et al. This paper presents the AlphaGo Zero algorithm, which was an improvement to the algorithm used to beat the 18-time world Go... [Read More]

Tags:

week11
paper

A Paper a Week - Week 10

Human-level control through deep reinforcement learning, Mnih et al.

Posted on March 9, 2024

Introduction For the tenth post in this series, I took a step back in the field of RL and decided to start with one of the seminal papers of deep reinforcement learning, “Human-level control through deep reinforcement learning”, by Mnih et al. This paper presents the Deep Q-Network architecture/algorithm, which... [Read More]

Tags:

week10
paper

A Paper a Week - Week 9

Deep Reinforcement Learning from Human Preferences, Christiano et al.

Posted on March 2, 2024

Introduction For the ninth post in this series, I initially wanted to read the paper on RLHF in relation to LLMs, as I’ve been seeing it in more and more papers as a leading method of fine-tuning LLMs for human preferences. However, in the process of reading this paper, I... [Read More]

Tags:

week9
paper