🚀 PPO Agent: LunarLander-Kratuzen

This is a trained PPO (Proximal Policy Optimization) agent for the LunarLander-v2 environment, built with Stable-Baselines3.
Repo ID: KraTUZen/LunarLander
Model name: LunarLander-Kratuzen


📊 Performance

  • Mean Reward: 266.40 ± 21.38
  • Episodes Evaluated: 10
  • ✅ Over the 10 evaluation episodes, the mean reward clears the 200-point threshold commonly treated as "solved" for LunarLander, which suggests stable landings.
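
A figure like the one above can be reproduced with a short rollout loop. The sketch below is a minimal illustration, not the exact script used for this card: the helper name `evaluate`, the use of deterministic actions, and the population standard deviation (the NumPy default) are all assumptions. It accepts any model with an SB3-style `predict` method and any Gymnasium-style environment.

```python
import statistics


def evaluate(model, env, n_episodes=10):
    """Run full episodes and return (mean, std) of the episode returns.

    `evaluate` is a hypothetical helper written for illustration only.
    """
    returns = []
    for _ in range(n_episodes):
        obs, info = env.reset()
        done = False
        total = 0.0
        while not done:
            # Assumption: act deterministically (mode of the action distribution)
            action, _state = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    # Assumption: population standard deviation, matching NumPy's default
    return statistics.fmean(returns), statistics.pstdev(returns)
```

With a loaded PPO model this would be called as `evaluate(model, gym.make("LunarLander-v2"))`. Stable-Baselines3 also provides `evaluate_policy` in `stable_baselines3.common.evaluation`, which returns the same kind of (mean, std) pair.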

🛠️ Usage

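from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
import gymnasium as gym

# Download the checkpoint from the Hugging Face Hub.
# load_from_hub returns the local path to the .zip file, not a model object.
checkpoint = load_from_hub(
    repo_id="KraTUZen/LunarLander",
    filename="LunarLander-Kratuzen.zip"
)

# Load the trained PPO policy from the downloaded checkpoint
model = PPO.load(checkpoint)

# Create environment (needs the Box2D extra, e.g. gymnasium[box2d];
# recent Gymnasium releases may only register "LunarLander-v3")
env = gym.make("LunarLander-v2")

# Run a quick evaluation loop with the trained policy
obs, info = env.reset()
for _ in range(20):
    # Query the policy instead of sampling random actions
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()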

📦 Training Setup

Parameter    Value
Algorithm    PPO
Policy       MlpPolicy
Timesteps    1,000,000
n_steps      1024
batch_size   64
gamma        0.999
gae_lambda   0.98
ent_coef     0.01
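
The settings above map directly onto the Stable-Baselines3 `PPO` constructor. The following is a minimal sketch of a training run under those settings, not the author's actual script: the `train` helper, the single (non-vectorized) environment, the save path, and every hyperparameter left at its library default (learning rate, number of epochs, seed) are assumptions, since the card does not state them.

```python
# Hyperparameters copied from the training setup listed above.
PPO_KWARGS = dict(
    policy="MlpPolicy",
    n_steps=1024,
    batch_size=64,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
)
TOTAL_TIMESTEPS = 1_000_000


def train(env_id="LunarLander-v2", total_timesteps=TOTAL_TIMESTEPS,
          save_path="LunarLander-Kratuzen"):
    """Hypothetical helper: train PPO with the listed settings and save it."""
    # Imported lazily so the configuration above can be inspected
    # without the RL dependencies installed.
    import gymnasium as gym
    from stable_baselines3 import PPO

    # Assumption: one environment; the card does not say how many were used.
    env = gym.make(env_id)
    model = PPO(env=env, verbose=1, **PPO_KWARGS)
    model.learn(total_timesteps=total_timesteps)
    model.save(save_path)  # writes <save_path>.zip
    env.close()
    return model
```

Calling `train()` would run the full 1,000,000 timesteps; passing a smaller `total_timesteps` is a quick way to check that the environment and dependencies are set up correctly before committing to a long run.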

🎯 Key Takeaways

  • Achieves a mean reward well above 200 with stable landings.
  • Ready to use from the Hugging Face Hub.
  • Reproducible training setup for reinforcement learning experiments.

Evaluation results

  • mean_reward on state-action-landing-data (self-reported): 266.40 ± 21.38