Journal of Innovative Science and Engineering (JISE), vol. 10, no. 1, pp. 138-157, 2026 (TRDizin)
This study systematically compares the performance of two deep reinforcement learning algorithms, Proximal Policy Optimization (PPO) and Deep Q-Network (DQN), across different game environments. Eight test environments from the Gymnasium library (the maintained successor to OpenAI Gym) were used: CartPole-v1, FrozenLake-v1, LunarLander-v3, Taxi-v3, MountainCar-v0, Blackjack-v1, CliffWalking-v0, and Acrobot-v1. In each environment, each algorithm was trained for 1,000,000 timesteps. For each algorithm, key performance metrics (average reward, training time, standard deviation, success rate, and the highest and lowest reward values) were calculated and visualized in graphs. The strengths and weaknesses of the algorithms in the different environments were also analyzed. The results indicate that PPO performs more consistently and effectively in tasks requiring continuous actions, whereas DQN achieves faster and more reliable outcomes in deterministic environments with discrete action spaces. Whereas most prior research has examined these algorithms separately, this study compares PPO and DQN under identical conditions.
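The reward-based metrics listed above can be illustrated with a minimal sketch. It assumes per-episode returns have already been collected for one algorithm in one environment. The function name `summarize_returns`, the `success_threshold` parameter, the rule that an episode counts as a success when its return meets that threshold, and the use of the population standard deviation are all assumptions made for illustration; the abstract does not state how the study defined or computed these quantities.

```python
from statistics import mean, pstdev


def summarize_returns(episode_returns, success_threshold):
    """Summarize per-episode returns for one algorithm/environment pair.

    Hypothetical helper: an episode is counted as a success when its
    return is at least `success_threshold` (an assumed definition).
    """
    if not episode_returns:
        raise ValueError("episode_returns must not be empty")

    # Count episodes whose return reaches the assumed success threshold.
    successes = sum(1 for r in episode_returns if r >= success_threshold)

    return {
        "mean_reward": mean(episode_returns),
        # Population standard deviation; a sample estimate (stdev) is
        # an equally plausible choice.
        "std_reward": pstdev(episode_returns),
        "success_rate": successes / len(episode_returns),
        "max_reward": max(episode_returns),
        "min_reward": min(episode_returns),
    }


# Illustrative, made-up returns (not results from the study).
example = summarize_returns([500.0, 480.0, 120.0, 500.0], success_threshold=475.0)
```

Running the same summary on the returns from both algorithms, with the same threshold per environment, is one way to keep the comparison on equal footing. Training time would need to be measured separately, for example with a wall-clock timer around the training call.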