
Reinforcement Learning - The paradigm with the power to surpass human intelligence

June 4, 2026 · 2 min read

#ai #ril #reinforcementlearning #andrejkarpathy #koimilgaya

Trigger of Thought

I spent 40 minutes listening to Andrej Karpathy today and came away humbled by one of the top AI scientists in the world right now. Hearing him talk about reinforcement learning, and how it parallels human learning and the thought processes we used to follow in school, prompted me to write this piece of thought.

The actual thought

I remember the famous news story about an AI beating a top Korean Go player, Lee Sedol, the same guy I watched on the Korean game show The Devil’s Plan. Players like him are among the best and best-trained thinkers humanity has, and the fact that an AI called AlphaGo was able to beat him, with its famous “Move 37”, changed the history and perception of AI forever. All of it was made possible in large part by a concept called “Reinforcement Learning” (RIL).

What RIL fundamentally means is letting the AI discover the best way to solve a problem through trial and error, guided by a reward for good outcomes, rather than having a human hand it the right approach. When Karpathy said that we are not the AI and the model is not us, so we can’t really tell it how best to solve a problem, it gave me chills. The possibility of an AI model surpassing peak human problem-solving, and even designing a language of its own, maybe one better than English for solving problems, is shocking but very plausible.
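
To make the "discover it yourself" idea concrete, here is a minimal sketch of my own, not anything from Karpathy's talk or from AlphaGo. It uses tabular Q-learning, one of the simplest reinforcement learning methods, on a made-up five-cell corridor. The environment, the reward values, and every name in it (`step`, `train`, `greedy_policy`) are assumptions for illustration. Nobody tells the agent to walk right. It only receives a reward when it reaches the last cell.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # action 0 = step left, action 1 = step right


def step(state, action):
    """Toy environment (an assumption): reward 1 at the goal, small cost otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    # q[state][action] estimates how good each action is in each cell.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act on current estimates.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Read off the preferred action for every non-goal cell."""
    return ["left" if row[0] > row[1] else "right" for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these settings the learned policy should settle on "right" in every cell, even though the code never states that rule anywhere. This is obviously a toy next to AlphaGo, but the principle is the same: the human defines the reward, and the model finds the strategy.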

I am going off the rails now, but somehow this reminds me of the alien “po poi” music sent to the skies in the movie Koi... Mil Gaya, lol! Maybe AI will make its own music. But this is what has prompted me to work more toward building responsible AI, one that doesn’t go off the rails and turn against us. Okay, this is getting pretty dystopian now, so let’s end the article here and be responsible.
