Markov Decision Processes
Markov decision processes (MDPs) provide the mathematical framework for reinforcement learning (RL). They model sequential decision-making, in which the outcome of each step depends on the action taken in the current state.
Definition
A Markov Decision Process is defined by a set of states (S), a set of actions (A), transition probabilities (P), a reward function (R), and a discount factor (γ). The Markov property states that the future depends only on the current state and action, not on the history of how that state was reached.
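Written out in standard notation (a sketch of the usual textbook formulation; the symbols below are conventional and not taken from this section), the tuple and the Markov property read:

\[
\mathcal{M} = (S, A, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr(S_{t+1} = s' \mid S_t = s, A_t = a)
\]

\[
\Pr(S_{t+1} = s' \mid S_t, A_t, S_{t-1}, A_{t-1}, \dots, S_0, A_0)
  = \Pr(S_{t+1} = s' \mid S_t, A_t)
\]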
Key Concepts
State
A representation of the environment at a given time. Everything the agent needs to make a decision.
Action
What the agent can do to interact with the environment.
Reward
Immediate feedback from the environment after taking an action. The agent's goal is to maximize cumulative reward.
Policy
A mapping from states to actions. Can be deterministic or stochastic.
Value Function
Expected cumulative (discounted) reward from a state (V) or from a state-action pair (Q); see the formal definitions after this list.
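As a standard-notation sketch of the last two concepts (the return G_t and the policy symbol π are conventional notation, not defined elsewhere in this section): a stochastic policy is written π(a|s), and the value functions are expected discounted returns under that policy:

\[
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ G_t \mid S_t = s \right], \qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ G_t \mid S_t = s, A_t = a \right]
\]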
Real-World Applications
AlphaGo
Gaming/AI: DeepMind's Go-playing AI used RL to master the game, defeating world champions.
Robotics
Robotics: Boston Dynamics uses RL to teach robots to walk, run, and recover from disturbances.
Data Center Cooling
Infrastructure: Google uses RL to optimize data center cooling systems, reducing cooling energy consumption by up to 40%.
Code Example
import numpy as np
class GridWorld:
    """
    Simple GridWorld environment
    """
    def __init__(self, size=4):
        self.size = size
        self.state = [0, 0]  # Start position
        self.goal = [size - 1, size - 1]  # Goal position
        self.actions = ['up', 'down', 'left', 'right']

    def reset(self):
        self.state = [0, 0]
        return tuple(self.state)

    def step(self, action):
        """Take action, return (next_state, reward, done)"""
        new_state = self.state.copy()
        if action == 'up' and self.state[0] > 0:
            new_state[0] -= 1
        elif action == 'down' and self.state[0] < self.size - 1:
            new_state[0] += 1
        elif action == 'left' and self.state[1] > 0:
            new_state[1] -= 1
        elif action == 'right' and self.state[1] < self.size - 1:
            new_state[1] += 1
        self.state = new_state

        # Check if goal reached
        done = self.state == self.goal
        reward = 1 if done else -0.1  # Small penalty for each step
        return tuple(self.state), reward, done
class QLearning:
    """
    Q-Learning agent
    """
    def __init__(self, states, actions, learning_rate=0.1, discount=0.99, epsilon=0.1):
        self.actions = actions
        self.lr = learning_rate
        self.gamma = discount
        self.epsilon = epsilon
        # Initialize Q-table
        self.q_table = {}
        for s in states:
            self.q_table[s] = {a: 0 for a in actions}

    def choose_action(self, state):
        """Epsilon-greedy action selection"""
        if np.random.random() < self.epsilon:
            return np.random.choice(self.actions)
        else:
            return max(self.q_table[state], key=self.q_table[state].get)

    def update(self, state, action, reward, next_state, done):
        """Q-Learning update rule"""
        current_q = self.q_table[state][action]
        if done:
            target = reward
        else:
            max_next_q = max(self.q_table[next_state].values())
            target = reward + self.gamma * max_next_q
        # Q-Learning update
        self.q_table[state][action] += self.lr * (target - current_q)
# Training
env = GridWorld(size=4)
states = [(i, j) for i in range(4) for j in range(4)]
agent = QLearning(states, env.actions)

# Train for 1000 episodes
for episode in range(1000):
    state = env.reset()
    total_reward = 0
    for _ in range(100):  # Max steps per episode
        action = agent.choose_action(state)
        next_state, reward, done = env.step(action)
        agent.update(state, action, reward, next_state, done)
        total_reward += reward
        state = next_state
        if done:
            break
    if episode % 200 == 0:
        print(f"Episode {episode}, Total Reward: {total_reward:.2f}")
# Show learned policy
print("\nLearned Policy:")
symbols = {'up': '↑', 'down': '↓', 'left': '←', 'right': '→'}
for i in range(4):
    row = ""
    for j in range(4):
        if (i, j) == tuple(env.goal):  # env.goal is a list, so compare as a tuple
            row += " G "
        else:
            best_action = max(agent.q_table[(i, j)], key=agent.q_table[(i, j)].get)
            row += f" {symbols[best_action]} "
    print(row)

Q-Learning learns the value of state-action pairs through trial and error. The agent explores the environment, updates its Q-table based on rewards, and gradually learns the optimal policy.
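For reference, the update implemented in QLearning.update is the standard tabular Q-Learning rule, with α corresponding to self.lr and γ to self.gamma:

\[
Q(s, a) \leftarrow Q(s, a) + \alpha \Bigl[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \Bigr]
\]

When done is true, the bracketed target reduces to r alone, exactly as in the code above.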
Practice Problems
1. Implement SARSA and compare it with Q-Learning
2. Add obstacles to the GridWorld
3. Visualize the Q-values as they change during training
Summary
MDPs and Q-Learning form the foundation of RL. Understanding these concepts is essential before moving to deep RL methods like DQN and policy gradients.