RL Controller for Balancing Platforms

Projects | 2026 | Links:

This project focuses on the design and implementation of a high-performance control system based on Reinforcement Learning (RL) for automatic balancing platforms. The research demonstrates how learning-based policies can outperform classical control strategies, such as PD controllers, when managing complex non-linear spatial dynamics.

Introduction and Requirements

The control of unstable systems is a fundamental challenge in robotics, often serving as a benchmark for testing new algorithms. Traditional methods, such as PD controllers, are effective for linear systems but often require exhaustive manual tuning and struggle with the non-linearities and singular points inherent in complex spatial geometries.

Motivated by the potential of Artificial Intelligence in control theory, this project explores Reinforcement Learning (RL) as a robust alternative. The primary goal was to design a controller capable of achieving sub-millimetric precision in balancing a ball at specific coordinates. The project was built upon several key requirements: the system had to be simulated with high physical fidelity, the controller needed to be robust against environmental noise (Domain Randomization), and it had to support two distinct platform architectures:

2D Configuration: A 3-DoF system enabling lateral (X), vertical (Z), and pitch (θ) movements.
3D Configuration: A multi-link 3-DoF platform managing Pitch, Roll, and Heave (Z) to stabilize the ball in a 3D workspace.

Selected Tools

This project integrates mechanical design, advanced physics simulation, and machine learning.

Design and Simulation Tools

Autodesk Fusion 360 was used to design the mechanical components and generate the digital twins. These designs were then translated into URDF and MJCF (XML) formats. For the simulation, I utilized MuJoCo for its accuracy in resolving contact dynamics and Genesis for its ability to run massively parallel simulations on the GPU.

Software and AI Tools

The control logic was implemented in Python, leveraging the Stable Baselines3 and Genesis RL frameworks. The Proximal Policy Optimization (PPO) algorithm was chosen for its stability and efficiency in continuous action spaces. All data analysis and performance visualization (heatmaps) were performed using Matplotlib and NumPy.

Platform Design

The project features two innovative designs. The 2D Platform utilizes a kinematic structure that allows for simultaneous translation and rotation. The 3D Platform is inspired by parallel manipulators, utilizing a 3-RRS-like structure to provide 3 Degrees of Freedom.

Unlike simpler versions, both platforms in this project were designed with 3 DoF to increase the complexity of the control task and demonstrate the RL agent’s ability to handle coupled dynamics. I developed a geometric Inverse Kinematics (IK) solver for both systems to map spatial coordinates to the required motor angles, providing a foundation for both classical and learning-based control.

2D Platform

3D Platform

System Architecture and Controller Development

The system architecture follows an end-to-end learning approach. The “Intelligence Layer” consists of a PPO agent that receives a high-dimensional observation vector (including ball position, velocity, and motor states) and directly outputs the target positions for the servos.

To ensure the policy could generalize beyond simulation, I applied Domain Randomization (DR). During training, physical parameters such as the ball mass and surface friction were randomly varied. This forced the neural network to develop a control strategy that is inherently robust to modeling inaccuracies, rather than just memorizing a specific simulated environment.

Simulation and Performance Evaluation

Training was conducted in environments with hundreds of parallel platforms, significantly accelerating the convergence of the PPO algorithm. To evaluate the results, I compared the RL controller against a manually tuned PD (Proportional-Derivative) baseline.

The evaluation focused on the Steady-State Integrated Absolute Error (SS-IAE) across the entire reachable workspace. Detailed precision heatmaps were generated to visualize how each controller performed as the ball moved further from the center, where non-linear effects are most prominent.

Final Result

The project successfully fulfilled all initial objectives, marking a comprehensive integration of mechanical design and intelligent control:

Design and Modeling: All platform components were designed and modeled from scratch using Autodesk Fusion 360, creating accurate digital twins for simulation.
Control Strategies: I successfully implemented and tuned a classical Proportional-Derivative (PD) controller as a baseline and developed an advanced RL-based controller for both configurations.
Training and Robustness: Using the PPO algorithm combined with Domain Randomization, I achieved a robust policy capable of stabilizing the ball under varying physical conditions.
Comparative Metrics: A rigorous performance metric was established to compare both controllers.
Computer Vision Integration: Beyond the core control task, I took the first steps toward autonomous perception by successfully estimating the ball’s position using a simulated camera within MuJoCo, laying the groundwork for vision-based control loops.

Code