This repository contains simple demonstration code for Piper RL, showing how to train a simple task, Reach Target: the center of Piper's gripper must reach a specified target position, with no requirement on the end-effector's final orientation, learned through a simple reward function. The repository provides training examples for two simulators: Mujoco and Genesis.
## Environment Dependencies

- Install RL-related dependencies:

```bash
pip install -r requirements.txt
```

- Install Genesis-related dependencies:

  - Pytorch: select the installation command matching your CUDA version on the Pytorch website. For example, for CUDA 12.9, use the following installation command:

    ```bash
    pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu129
    ```

  - Genesis World: install with the command:

    ```bash
    pip install genesis-world
    ```

- Install Mujoco:

```bash
pip install mujoco==3.3.2
```

## Genesis Examples

### Load the Piper model in Genesis

Run `genesis_demo/hello_genesis.py`:

```bash
python genesis_demo/hello_genesis.py
```

You can see that the Piper model is loaded successfully.
Run `genesis_demo/control_piper.py`:

```bash
python genesis_demo/control_piper.py
```

You can see Piper move to the commanded position.
Run `genesis_demo/multi_piper.py`:

```bash
python genesis_demo/multi_piper.py
```

Run `piper_rl_genesis.py`:

```bash
python piper_rl_genesis.py
```

You can see multiple Pipers attempting to approach the set position.

Launch tensorboard to view the reward curves of the Pipers during training:

```bash
tensorboard --logdir tensorboard/piper_reach_target/
```

- Initialize the env
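The `__init__` below defines a `workspace` dict from which random target positions are drawn. Assuming the environment's `reset()` (not shown here) samples goals uniformly inside those bounds, the sampling can be sketched with plain NumPy:

```python
import numpy as np

# Workspace bounds from PiperEnv.__init__ (x, y, z ranges in meters)
workspace = {'x': [-0.5, 1.5], 'y': [-0.8, 0.8], 'z': [0.05, 0.5]}

def sample_goal(rng: np.random.Generator) -> np.ndarray:
    """Draw a uniformly random target position inside the workspace."""
    lo = np.array([workspace[k][0] for k in ('x', 'y', 'z')])
    hi = np.array([workspace[k][1] for k in ('x', 'y', 'z')])
    return rng.uniform(lo, hi)

rng = np.random.default_rng(0)
goal = sample_goal(rng)
assert np.all(goal >= [-0.5, -0.8, 0.05]) and np.all(goal <= [1.5, 0.8, 0.5])
```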
```python
import time

import gym
import numpy as np
import torch
import genesis as gs
from gym import spaces

# For more detailed methods of inheriting gym.Env, refer to the gym examples provided by OpenAI
class PiperEnv(gym.Env):
    # __init__(): initializes the environment, robot parameters, action space and observation
    # space, so that the reinforcement learning algorithm can search for appropriate actions
    # in the given observation space
    def __init__(self, visualize: bool = False):
        super().__init__()
        self.visualize = visualize
        # Set the names of the joints to be controlled
        self.jnt_name = [
            "joint1",
            "joint2",
            "joint3",
            "joint4",
            "joint5",
            "joint6",
        ]
        # Set the sampling space for random target points, i.e. the workspace of the Piper arm
        self.workspace = {
            'x': [-0.5, 1.5],
            'y': [-0.8, 0.8],
            'z': [0.05, 0.5],
        }
        # Set the device for the environment
        self.tensor_device = "cpu"
        self.gs_device = gs.cpu
        # Set joint limits (radians)
        self.jnt_range = torch.tensor([
            [-2.61, 2.61],
            [0, 3.14],
            [-2.7, 0],
            [-1.83, 1.83],
            [-1.22, 1.22],
            [-1.57, 1.57],
        ], device=self.tensor_device)
        # PD controller gains for the Piper arm joints
        self.kp = torch.tensor([4500.0, 4500.0, 3500.0, 3500.0, 2500.0, 2500.0], device=self.tensor_device)
        self.kv = torch.tensor([450.0, 450.0, 350.0, 350.0, 250.0, 250.0], device=self.tensor_device)
        gs.init(backend=self.gs_device)
        # Create the Genesis scene
        self.scene = gs.Scene(
            # Set camera
            show_viewer=self.visualize,
            viewer_options=gs.options.ViewerOptions(
                camera_pos=(3.5, -1.0, 2.5),
                camera_lookat=(0.0, 0.0, 0.5),
                camera_fov=40,
            ),
            # Set physics engine
            rigid_options=gs.options.RigidOptions(
                dt=0.01,
            ),
        )
        # Add ground plane
        plane = self.scene.add_entity(
            gs.morphs.Plane(),
        )
        # Add the Piper robot
        self.robot = self.scene.add_entity(
            gs.morphs.MJCF(file='xml/agilex_piper/piper.xml'),
        )
        # Build the scene
        self.scene.build()
        # The initial pose is the Piper arm's default pose (all joint angles are 0)
        self.default_joint_pos = torch.tensor([0.0, 0.0, 0.0, 0.0, 0.0, 0.0], device=self.tensor_device)
        # Placeholder for the end-effector position (x, y, z); note that the actual
        # end-effector position is not (0, 0, 0) when Piper is in the initial pose
        self.default_ee_pos = torch.tensor([0.0, 0.0, 0.0], device=self.tensor_device)
        # Define the action space so that joint commands stay within -pi to pi
        self.action_space = spaces.Box(low=-3.14, high=3.14, shape=(6,), dtype=np.float32)
        # Define the observation space: 6 joint angles plus the 3D end-effector position
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(9,), dtype=np.float32)
        self.motors_dof_idx = [self.robot.get_joint(name).dof_start for name in self.jnt_name]
        self.robot.set_dofs_kp(self.kp, self.motors_dof_idx)
        self.robot.set_dofs_kv(self.kv, self.motors_dof_idx)
        self.goal = torch.zeros(3, dtype=torch.float32, device=self.tensor_device)
        self.last_action = torch.zeros(6, dtype=torch.float32, device=self.tensor_device)
        self.goal_threshold = 0.005
```

- Design the reward function
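The non-linear distance reward used in `calc_reward` below can be exercised in isolation. This is a minimal sketch with the thresholds hard-coded to match `goal_threshold = 0.005`:

```python
GOAL_THRESHOLD = 0.005  # meters, same as self.goal_threshold in the environment

def distance_reward(dist_to_goal: float) -> float:
    """Tiered reward: large bonuses near the goal, smooth shaping far away."""
    if dist_to_goal < GOAL_THRESHOLD:
        return 100.0
    elif dist_to_goal < 2 * GOAL_THRESHOLD:
        return 50.0
    elif dist_to_goal < 3 * GOAL_THRESHOLD:
        return 10.0
    return 1.0 / (1.0 + dist_to_goal)

assert distance_reward(0.001) == 100.0  # inside the success threshold
assert distance_reward(0.008) == 50.0   # within 2x the threshold
assert distance_reward(0.012) == 10.0   # within 3x the threshold
assert distance_reward(1.0) == 0.5      # smooth falloff far from the goal
```

The tiers give a strong gradient-free bonus near the goal, while the `1 / (1 + d)` term provides dense shaping everywhere else.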
```python
    def calc_reward(self, action, obs):
        # Distance between the end-effector and the target position
        # (obs[6:] holds the end-effector (x, y, z))
        dist_to_goal = torch.linalg.norm(obs[6:] - self.goal)
        # Non-linear (tiered) distance reward
        if dist_to_goal < self.goal_threshold:
            distance_reward = 100.0
        elif dist_to_goal < 2 * self.goal_threshold:
            distance_reward = 50.0
        elif dist_to_goal < 3 * self.goal_threshold:
            distance_reward = 10.0
        else:
            distance_reward = 1.0 / (1.0 + dist_to_goal)
        # Action smoothness penalty: penalize large joint changes between steps
        action_diff = action - self.last_action
        smooth_penalty = 0.1 * torch.linalg.norm(action_diff)
        # Joint angle limit penalty
        joint_penalty = 0.0
        for i in range(6):
            min_angle = self.jnt_range[i][0]
            max_angle = self.jnt_range[i][1]
            if obs[i] < min_angle:
                joint_penalty += 0.5 * (min_angle - obs[i])
            elif obs[i] > max_angle:
                joint_penalty += 0.5 * (obs[i] - max_angle)
        # Total reward
        total_reward = distance_reward - smooth_penalty - joint_penalty
        # Remember this action for the next smoothness penalty
        self.last_action = action.clone()
        return total_reward, dist_to_goal
```

- Set the step function
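The step function below rescales each action component to its joint's limits. That affine map can be checked standalone; the joint ranges here are copied from `self.jnt_range`, and the policy outputs are assumed to lie in [-1, 1]:

```python
import numpy as np

# Joint limits (radians), copied from PiperEnv.__init__
JNT_RANGE = np.array([
    [-2.61, 2.61], [0.0, 3.14], [-2.7, 0.0],
    [-1.83, 1.83], [-1.22, 1.22], [-1.57, 1.57],
])

def scale_action(action: np.ndarray) -> np.ndarray:
    """Affinely map each component of an action in [-1, 1] to its joint's [low, high]."""
    low, high = JNT_RANGE[:, 0], JNT_RANGE[:, 1]
    return low + (action + 1.0) * 0.5 * (high - low)

# -1 maps to the lower limit, +1 to the upper limit, 0 to the midpoint
assert np.allclose(scale_action(-np.ones(6)), JNT_RANGE[:, 0])
assert np.allclose(scale_action(np.ones(6)), JNT_RANGE[:, 1])
assert np.allclose(scale_action(np.zeros(6)), JNT_RANGE.mean(axis=1))
```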
```python
    def step(self, action):
        action_tensor = torch.tensor(action, device=self.tensor_device, dtype=torch.float32)
        # Action scaling: affinely map each normalized action component (assumed in [-1, 1])
        # to the corresponding joint's [min, max] range, so the policy can output values in
        # a uniform range while the commands respect the joint limits
        scaled_action = torch.zeros(6, device=self.tensor_device, dtype=torch.float32)
        for i in range(6):
            scaled_action[i] = self.jnt_range[i][0] + (action_tensor[i] + 1) * 0.5 * (self.jnt_range[i][1] - self.jnt_range[i][0])
        # The Genesis robot executes the action
        self.robot.control_dofs_position(scaled_action, self.motors_dof_idx)
        # The Genesis scene simulates one step
        self.scene.step()
        # Observation
        obs = self.get_observation()
        # Calculate reward
        reward, dist_to_goal = self.calc_reward(action_tensor, obs)
        terminated = False
        if dist_to_goal < self.goal_threshold:
            terminated = True
        if not terminated:
            # Time out after 20 seconds of wall-clock time and apply a penalty
            if time.time() - self.start_t > 20.0:
                reward -= 10.0
                print("[Timeout] Time limit exceeded, applying a reward penalty")
                terminated = True
        info = {
            'is_success': terminated and (dist_to_goal < self.goal_threshold),
            'distance_to_goal': dist_to_goal.item(),
        }
        return obs.cpu().numpy(), reward.item(), terminated, False, info
```

## Mujoco Examples

### Implement parallel training of multiple Pipers in Mujoco

Run `piper_rl_mujoco.py`:
```bash
python piper_rl_mujoco.py
```

Launch tensorboard to view the reward curves of the Pipers during training:

```bash
tensorboard --logdir tensorboard/piper_reach_target/
```

Run `piper_rl_mujoco.py`:

```bash
python piper_rl_mujoco.py
```

You can see Piper successfully reach the target position.
Reference: https://github.com/LitchiCheng/mujoco-learning