Test Your Skills Against an AI Air Hockey Robot
A UBC capstone project builds an AI agent and a robot that play air hockey, trained in an impressive custom simulator.
Teaching a robot to play air hockey is tricky enough when you let it practice on a real table. Hudson Nock and a small team took the harder road, training their agent entirely inside a simulator and then loading the finished policy onto physical hardware, where it went on to play a human with no real-world tuning at all. In machine learning terms that is a zero-shot transfer, and getting one to work on a game this fast and chaotic is genuinely hard.
What earns this project a closer look is the writeup behind it. A lot of builds like this get a short blurb and a clip, but this one ships with two lengthy technical reports (part one and part two) spanning roughly sixteen months of work, backed by a README that walks through every subsystem in turn. Between them they cover the simulation, the camera calibration, the firmware and its timing budget, the puck and collision modeling, and the reinforcement learning. For anyone who has wondered how a sim-to-real project actually hangs together, it is a rare chance to read one from start to finish.
The whole difficulty of going from simulation to reality is the gap between a tidy model and a messy world, and air hockey supplies plenty of mess. The original table was wooden, so the rails bounce differently depending on where the puck strikes them. The playing surface is not perfectly flat, the table is not quite rectangular, and hard accelerations drag the power supply voltage down. The team measured as much of this as they could, modeled what was modelable, and randomized the rest in the simulator so the agent never learns to depend on one exact set of conditions.
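To make the "randomize the rest" idea concrete, here is a minimal Python sketch of domain randomization, drawing a fresh set of table parameters for each training episode. The parameter names and ranges are hypothetical placeholders chosen for illustration; they are not values from the team's reports.

```python
import random
from dataclasses import dataclass


@dataclass
class TableParams:
    rail_restitution: float  # fraction of normal speed kept after a rail bounce
    surface_tilt_deg: float  # small tilt of the playing surface, in degrees
    supply_voltage: float    # volts available to the motors under load


# Hypothetical (low, high) ranges for illustration only; not from the reports.
RANGES = {
    "rail_restitution": (0.6, 0.9),
    "surface_tilt_deg": (-0.2, 0.2),
    "supply_voltage": (22.0, 24.0),
}


def sample_table_params(rng: random.Random) -> TableParams:
    """Draw a fresh set of physical parameters for one training episode."""
    return TableParams(
        **{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    )


rng = random.Random(0)
episodes = [sample_table_params(rng) for _ in range(3)]
for params in episodes:
    print(params)
```

Because every episode sees a slightly different table, a policy that only works for one exact restitution or voltage gets penalized during training rather than on the real hardware.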
The modeling is where the groundwork goes in. The mallet's reaction to motor voltage is described by a third-order transfer function, driven by a feedforward and PID controller that follows a target path to within a millimeter. The puck's gliding motion obeys a simple nonlinear differential equation. Collisions proved more stubborn. The wooden rails scattered the puck so unpredictably that no clean analytical bounce model would fit, so the team trained a tiny neural network of just 112 parameters to predict both how the puck comes off a collision and how uncertain that result is. The simulator then samples from that uncertainty, which means the agent grows up expecting noisy, slightly unfair bounces rather than perfect ones.
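The "predict the bounce and its uncertainty, then sample" step can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `predict_bounce` is a hand-written stand-in for the learned 112-parameter network, its constants are placeholders, and the Gaussian noise model is a guess rather than something the article states.

```python
import random


def predict_bounce(speed_in: float) -> tuple[float, float]:
    """Hypothetical stand-in for the learned collision model.

    Returns (mean, std) of the rebound speed away from the rail.
    The constants are placeholders, not fitted values.
    """
    mean = 0.8 * speed_in
    std = 0.05 + 0.05 * speed_in
    return mean, std


def sample_bounce(speed_in: float, rng: random.Random) -> float:
    """Sample one rebound speed instead of returning the mean prediction."""
    mean, std = predict_bounce(speed_in)
    return rng.gauss(mean, std)  # assumed Gaussian noise model


rng = random.Random(0)
samples = [sample_bounce(2.0, rng) for _ in range(2000)]
average = sum(samples) / len(samples)
print(f"average rebound speed over {len(samples)} bounces: {average:.3f}")
```

Sampling rather than always returning the mean is what exposes the agent to a spread of outcomes for the same incoming shot.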
Sensing is handled by a single camera. The puck is covered in retroreflective tape and lit by a bright LED array sitting right beside the lens, so it shows up as a crisp dot even at a 100-microsecond exposure, which is short enough to freeze its motion. A custom calibration step deals with the large, slightly warped table and pins position error to around a millimeter across the whole surface. When the puck slips under the gantry and is partly blocked from view, a contour-based tracker keeps following it, and the same camera also locates the opponent's mallet at 120 frames per second using a hollow retroreflective marker with a recognizably different shape.
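A quick back-of-the-envelope calculation shows why the short exposure matters. The Python sketch below uses the 100-microsecond exposure and 120 frames per second from the article; the 10 m/s puck speed is an assumed figure for illustration, and applying the 120 fps rate to puck tracking is also an assumption.

```python
EXPOSURE_S = 100e-6             # exposure time quoted in the article
FPS = 120                       # frame rate quoted for mallet tracking
ASSUMED_PUCK_SPEED_M_S = 10.0   # hypothetical fast shot, not a measured value


def blur_mm(speed_m_s: float, exposure_s: float) -> float:
    """Distance the puck moves while the shutter is open, in millimeters."""
    return speed_m_s * exposure_s * 1000.0


def travel_per_frame_mm(speed_m_s: float, fps: float) -> float:
    """Distance the puck moves between consecutive frames, in millimeters."""
    return speed_m_s / fps * 1000.0


print(f"motion blur: {blur_mm(ASSUMED_PUCK_SPEED_M_S, EXPOSURE_S):.2f} mm")
print(f"travel per frame: {travel_per_frame_mm(ASSUMED_PUCK_SPEED_M_S, FPS):.1f} mm")
```

Under those assumptions the blur stays around a millimeter, in line with the calibration accuracy, while the puck still covers several centimeters between frames.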
The simulator is the piece worth lingering on. Instead of reaching for an off-the-shelf physics engine, the team wrote one from scratch around analytical solutions. Both the mallet and the puck have closed-form equations of motion, so there is no slow numerical integration anywhere in the loop. Collision detection leans on an adaptive timestep with a provable lower bound on when the next contact can happen, letting it leap forward in big steps during quiet moments without ever skipping over a hit. With the code vectorized to run thousands of matches at once, the simulation clocks in at roughly 230 times real time on an ordinary Intel i5 laptop. That throughput is exactly what makes training the agent feasible in the first place.
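The adaptive-timestep idea can be illustrated with a toy one-dimensional case in Python. This is a simplified sketch and not the team's algorithm: the puck moves at constant speed between two walls with no drag, and `v_max` is a hypothetical speed cap. Since the puck can never move faster than `v_max`, it cannot reach the wall sooner than `gap / v_max`, so stepping by that amount is always safe.

```python
def simulate(x, v, t_end, length=2.0, radius=0.05, v_max=2.0, eps=1e-9):
    """Advance a 1D puck between walls at 0 and `length` until `t_end`.

    Simplifications (assumptions of this sketch): constant speed, no drag,
    perfectly elastic walls. Returns (x, v, hits, steps).
    """
    if abs(v) > v_max:
        raise ValueError("speed must not exceed v_max for the bound to hold")
    lo, hi = radius, length - radius  # limits for the puck's center
    t, hits, steps = 0.0, 0, 0
    while t < t_end and v != 0.0:
        # Distance to the wall the puck is currently approaching.
        gap = (hi - x) if v > 0 else (x - lo)
        if gap <= eps:
            v = -v          # close enough: treat as contact and reflect
            hits += 1
            continue
        # No contact is possible before gap / v_max, so this step is safe.
        dt = min(gap / v_max, t_end - t)
        x += v * dt
        t += dt
        steps += 1
    return x, v, hits, steps


x, v, hits, steps = simulate(x=1.0, v=1.0, t_end=10.0)
print(f"final x={x:.3f}, v={v:.1f}, hits={hits}, steps={steps}")
```

The step size is large when the puck is far from a wall and shrinks only as contact approaches, so no hit is skipped and quiet stretches cost few iterations.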
The learning side uses Soft Actor-Critic with a network of about 200,000 parameters. Self-play on its own tends to make an agent good at beating only one kind of opponent, so this one also trains against a mixed cast: a dedicated defensive agent, a hand-coded blocker that imitates the way a person guards the goal, and earlier versions of itself that had specialized in either straight shots or bank shots. That spread of opponents, combined with domain randomization tuned to the noise actually measured on the hardware, keeps the policy from collapsing into one predictable habit.
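Training against a mixed cast boils down to choosing an opponent at the start of each episode. Here is a minimal Python sketch of that selection step; the opponent labels paraphrase the article, while the sampling weights are hypothetical and not taken from the reports.

```python
import random

# Hypothetical sampling weights for illustration; the real mix is not stated.
OPPONENT_WEIGHTS = {
    "current_self": 0.40,
    "defensive_agent": 0.15,
    "scripted_blocker": 0.15,
    "past_self_straight_shots": 0.15,
    "past_self_bank_shots": 0.15,
}


def pick_opponent(rng: random.Random) -> str:
    """Choose the opponent for the next training episode."""
    names = list(OPPONENT_WEIGHTS)
    weights = list(OPPONENT_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(1)
picks = [pick_opponent(rng) for _ in range(1000)]
for name in OPPONENT_WEIGHTS:
    print(f"{name}: {picks.count(name)} episodes")
```

Keeping the older specialists in the pool means the policy keeps getting tested against styles it might otherwise forget how to handle.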
All of the code, both reports, and the system diagram live on the project's GitHub page, and you can watch the robot trade shots in the video below.