🚀 PPO Agent: LunarLander-Kratuzen

This is a trained PPO (Proximal Policy Optimization) agent for the LunarLander-v2 environment, built with Stable-Baselines3.
Repo ID: KraTUZen/LunarLander
Model name: LunarLander-Kratuzen

📊 Performance

Mean Reward: 266.40 ± 21.38
Episodes Evaluated: 10
✅ The mean reward sits well above the 200-point score at which a LunarLander episode counts as solved, though 10 episodes is a small sample.

🛠️ Usage

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
import gymnasium as gym

# Download the checkpoint from the Hugging Face Hub (returns a local file path)
checkpoint = load_from_hub(
    repo_id="KraTUZen/LunarLander",
    filename="LunarLander-Kratuzen.zip"
)

# Load the trained PPO policy from the checkpoint
model = PPO.load(checkpoint)

# Create environment
env = gym.make("LunarLander-v2")

# Run the trained agent for a few steps
obs, info = env.reset()
for _ in range(20):
    # Query the policy instead of sampling random actions
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
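
To obtain a figure comparable to the mean reward reported under Performance, the agent has to be scored over whole episodes, not a fixed number of steps. The sketch below is one way to do that with Stable-Baselines3's `evaluate_policy`. It is not necessarily how the reported number was produced: the helper names `summarize_returns` and `evaluate_from_hub`, the use of deterministic actions, and the population standard deviation are all assumptions.

```python
from statistics import fmean, pstdev


def summarize_returns(returns):
    """Return (mean, std) of per-episode returns, using the population std."""
    returns = [float(r) for r in returns]
    return fmean(returns), pstdev(returns)


def evaluate_from_hub(
    repo_id="KraTUZen/LunarLander",
    filename="LunarLander-Kratuzen.zip",
    env_id="LunarLander-v2",
    n_eval_episodes=10,
):
    """Download the checkpoint and score it over n_eval_episodes full episodes."""
    # Imported lazily so summarize_returns works without the RL stack installed.
    import gymnasium as gym
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.monitor import Monitor

    model = PPO.load(load_from_hub(repo_id=repo_id, filename=filename))
    # Monitor records the unmodified episode returns that evaluate_policy reads.
    env = Monitor(gym.make(env_id))
    episode_returns, _episode_lengths = evaluate_policy(
        model,
        env,
        n_eval_episodes=n_eval_episodes,
        deterministic=True,  # assumption: the card does not say which mode was used
        return_episode_rewards=True,
    )
    env.close()
    return summarize_returns(episode_returns)
```

Calling `mean, std = evaluate_from_hub()` requires the Box2D extra (`gymnasium[box2d]`). A fresh run may not match the reported value exactly, since the result depends on random seeds and library versions.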

📦 Training Setup

Parameter	Value
Algorithm	PPO
Policy	MlpPolicy
Timesteps	1,000,000
n_steps	1024
batch_size	64
gamma	0.999
gae_lambda	0.98
ent_coef	0.01
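
The table above maps directly onto the `PPO` constructor in Stable-Baselines3. The following is a minimal sketch of a matching training run, not the author's actual script: the number of parallel environments (`n_envs`), the save path, and every hyperparameter missing from the table (left at library defaults) are assumptions.

```python
# Values copied from the Training Setup table. Anything not listed there
# (learning rate, n_epochs, clip range, ...) falls back to the Stable-Baselines3
# defaults, which is an assumption rather than something the card states.
PPO_KWARGS = {
    "policy": "MlpPolicy",
    "n_steps": 1024,
    "batch_size": 64,
    "gamma": 0.999,
    "gae_lambda": 0.98,
    "ent_coef": 0.01,
}
TOTAL_TIMESTEPS = 1_000_000


def minibatches_per_epoch(n_envs=1):
    """Number of minibatches in one pass over a single rollout buffer."""
    rollout_size = PPO_KWARGS["n_steps"] * n_envs
    return rollout_size // PPO_KWARGS["batch_size"]


def train(env_id="LunarLander-v2", n_envs=1, save_path="ppo-lunarlander-sketch"):
    """Train PPO with the table's settings; n_envs and save_path are hypothetical."""
    # Imported lazily so the config above can be inspected without SB3 installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    env = make_vec_env(env_id, n_envs=n_envs)
    model = PPO(env=env, verbose=1, **PPO_KWARGS)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    model.save(save_path)
    return model
```

With a single environment, each 1024-step rollout is split into 16 minibatches of 64 transitions. Raising `n_envs` scales the rollout size, and therefore the minibatch count, proportionally.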

🎯 Key Takeaways

Mean reward of 266.40 ± 21.38 over 10 evaluation episodes.
Loadable directly from the Hugging Face Hub with huggingface_sb3.
Training hyperparameters are listed above for reproducing the setup.
