
Software Subsystem
2025 Software Team
Raina Hu, Majd Khalife, Albert Wang, Nathaniel Factor, Malak Oualid, Lauren Spee, Jenny Saad, Allison Hutchings, William Zhang, Chantal Zhang, Owen Lesann, Le Chen, Justine Lin, Rachel Ruddy, Negar Akbarpouran Badr, Andreaa-Nicole Calenciuc, Tancrede Lamort de Gail, Lara Landauro, Aditya Sharma
Software Architecture
The software stack is constructed around the Robot Operating System (ROS), an open-source framework whose publish-subscribe model allows multiple independent processes, called nodes, to communicate easily. Most of our codebase is written in Python 3; C++ is used for latency-sensitive nodes, such as the republishing of sensor data. A systems diagram showing the various nodes in our system and how they interact is shown in Figure 1. The software stack runs on NVIDIA L4T 35.4.1 (Ubuntu 20.04) with Docker: processes are executed and managed in containers on a local Docker network, each configured with Ubuntu 20.04, PyTorch, CUDA, and ROS so that behavior is consistent regardless of development or production environment.
An on-board NVIDIA Jetson AGX Orin provides the necessary computational power to run all packages concurrently within the Docker containers. Additionally, the Jetson is equipped with a 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores, which allows us to run our YOLOv8 model for object detection in real time.
The main focus this year was thoroughly testing code that had previously only been validated in simulation and ensuring reliable state estimation, controls, and propulsion systems. A key shortcoming last year was the lack of a thorough testing procedure and framework, so a significant portion of our effort went into testing, debugging, and standardizing procedures to improve pre-competition preparation.
The software stack is split into four main modules: sense, plan, act, and sim. This modular structure allows modifications to a specific subsystem to be developed in isolated branches without overarching changes to the software architecture.
Vision
The vision package interprets the environment using a front-facing stereo RGBD camera along with a downward-facing camera. The front-facing camera feeds are processed by a fine-tuned YOLOv8 model for robust object detection. The model is fine-tuned on 300 images of RoboSub objects labelled with Roboflow, which were then augmented to more than 10,000 images. For the downward-facing camera, state estimation and camera intrinsics are used to calculate direction vectors to detected objects, enabling rough positional estimates on the pool floor. By correlating the 2D bounding-box detections with the depth information from the stereo camera, we can accurately infer the 3D positions of detected objects in the robot's reference frame.
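As an illustration, the back-projection from a detection to a 3D point reduces to the pinhole camera model. The following is a minimal sketch; the intrinsics, pixel coordinates, and depth value are placeholder numbers, not our actual calibration:

```python
import numpy as np

def pixel_to_camera_frame(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with a depth reading into the
    camera frame using the pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Hypothetical intrinsics and detection; real values come from calibration.
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0
u, v = 350.0, 200.0   # bounding-box center from YOLOv8
depth = 2.4           # meters, from the stereo depth image
print(pixel_to_camera_frame(u, v, depth, fx, fy, cx, cy))
```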
On the downward-facing camera, we detect the presence of objects using a simple HSV color filter, tuned both in simulation and at pool tests. The filter identifies whether an object is in the frame, enabling the grabber to accurately retrieve objects.
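A minimal sketch of this kind of HSV detection with OpenCV is shown below; the color bounds and minimum-area threshold are illustrative values, not our tuned parameters:

```python
import cv2
import numpy as np

def object_in_frame(bgr_frame, lower_hsv, upper_hsv, min_area=500):
    """Return the centroid of the largest blob inside the HSV band,
    or None if nothing large enough is present."""
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower_hsv, upper_hsv)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if cv2.contourArea(largest) < min_area:
        return None
    m = cv2.moments(largest)
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])

# Illustrative band for a red object; real bounds are tuned in
# simulation and at pool tests.
lower = np.array([0, 120, 70])
upper = np.array([10, 255, 255])
```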
For grabber tasks requiring precise manipulation, a basic visual servoing setup has been implemented. This system leverages real-time visual feedback from the cameras to adapt the AUV's pose dynamically, guiding the grabber to its target. Since this is our first year integrating a grabber, the system will need further refinement in subsequent years. The software team collaborated closely with the mechanical team to find a placement for the grabber that keeps it within the field of view of the down camera.
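Conceptually, the servoing loop maps the pixel error between the target's centroid and the image center to a body-frame velocity command. Below is a minimal proportional sketch; the gain, image size, and axis conventions are assumptions for illustration:

```python
def servo_step(centroid, image_size=(640, 480), k=0.002):
    """One proportional visual-servoing step: drive the AUV so the
    target's centroid converges to the image center."""
    cx, cy = image_size[0] / 2.0, image_size[1] / 2.0
    ex, ey = centroid[0] - cx, centroid[1] - cy
    # For a down camera, image x roughly maps to sway and image y to
    # surge; the signs depend on how the camera is mounted.
    sway = -k * ex
    surge = -k * ey
    return surge, sway
```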
Sense
Douglas is equipped with an Inertial Measurement Unit (IMU), a Doppler Velocity Log (DVL), and a depth sensor. The data streams from these sensors are fused by the robot_localization ROS package, whose Extended Kalman Filter (EKF) node provides a comprehensive and reliable estimate of the robot's current pose (position and orientation).
The EKF combines noisy measurements with a motion-model prediction of the robot's state, correcting the prediction as new sensor readings arrive. The correction is weighted by the Kalman gain, which is computed from the noise covariances of the prediction and measurement models and acts as a confidence estimate between the two. Because the depth sensor provides an unbiased reading of depth, we use it as the sole determinant of position along the z-axis. The EKF supplies the baseline coordinate estimate for SLAM (Simultaneous Localization and Mapping).
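The measurement update at the heart of this process can be summarized in a few lines. Below is a minimal linear Kalman update in NumPy for illustration; the production filter is the EKF in robot_localization, which additionally linearizes the motion and measurement models:

```python
import numpy as np

def kf_update(x, P, z, H, R):
    """One measurement update: the Kalman gain K weighs the
    prediction covariance P against the measurement noise R."""
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ (z - H @ x)            # corrected state
    P = (np.eye(len(x)) - K @ H) @ P   # corrected covariance
    return x, P
```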
SLAM is performed using a network of landmark predictions, local odometric state estimation, and an iterative process of refining our estimates with newly obtained information. Landmark position estimates are appended to Douglas’ baseline state in an augmented EKF, producing a large state vector that mathematically represents both the localization and the mapping. Using the vision system, landmark positions are polled every frame. These relative positions become useful after converting them to preliminary global estimates using the baseline coordinate estimate calculated above. Each observation is then matched by closest-neighbour data association to the relevant terms in the state vector, and the EKF linearizes the measurement model, computes a new Kalman gain, and updates the state vector according to the confidence and noise values of these landmarks.
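For illustration, the closest-neighbour association step might look like the following sketch, where the gating threshold is an assumed value rather than our tuned parameter:

```python
import numpy as np

def associate(obs_xy, landmarks, gate=1.5):
    """Match an observed landmark position (global frame) to the
    closest landmark already in the state vector. Returns the index
    of the match, or None to initialize a new landmark. `gate` is an
    illustrative distance threshold in meters."""
    if len(landmarks) == 0:
        return None
    d = np.linalg.norm(np.asarray(landmarks) - np.asarray(obs_xy), axis=1)
    i = int(np.argmin(d))
    return i if d[i] < gate else None
```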
By repeatedly processing the landmark positions and the baseline coordinate estimations in one encompassing Kalman Filter, the two independent systems help align each other in a back-and-forth process. This multi-sensor fusion approach significantly enhances the overall accuracy, robustness, and reliability of the pose estimate, particularly in dynamic and challenging underwater environments where individual sensors might be prone to error or temporary outages.
Planning & Controls
High-level mission planning is managed by a dedicated package that employs a state machine (SMACH) to navigate through each competition task. Missions are constructed based on the competition requirements and our strategy. State transitions are determined dynamically from the outcomes of dispatched actions, allowing for adaptive and robust mission execution. Each possible competition run state is assigned a corresponding function, which can dispatch actions to the controls package and monitor mission progress. Missions are built on abstract movement and action endpoints managed by the controls package in the backend.
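The sketch below shows the general SMACH pattern; the state names, outcomes, and transitions are illustrative rather than our actual mission definitions:

```python
import rospy
import smach

class NavigateToGate(smach.State):
    """Illustrative mission state: dispatch a movement action to the
    controls package and report the outcome."""
    def __init__(self):
        smach.State.__init__(self, outcomes=["success", "failure"])

    def execute(self, userdata):
        rospy.loginfo("Navigating to gate...")
        # In the real stack this dispatches an abstract movement
        # endpoint to controls and monitors its result.
        reached = True  # placeholder outcome
        return "success" if reached else "failure"

def build_mission():
    sm = smach.StateMachine(outcomes=["mission_complete", "mission_abort"])
    with sm:
        smach.StateMachine.add(
            "NAVIGATE_GATE", NavigateToGate(),
            transitions={"success": "mission_complete",
                         "failure": "mission_abort"})
    return sm

if __name__ == "__main__":
    rospy.init_node("mission_planner")
    build_mission().execute()
```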
The controls and propulsion packages ensure precise AUV movement using Proportional-Integral-Derivative (PID) control. The controls package receives target positions and orientations, which feed multiple PID loops that generate body-frame forces and torques. The propulsion package then decomposes these forces and torques into thruster Pulse Width Modulation (PWM) speeds, which are communicated to the Electronic Speed Controllers (ESCs) via a ROS-serial node. A significant issue we faced at RoboSub '24 was mechanical drift that software struggled to compensate for. To address it, we worked with the mechanical and electrical teams to compensate for unevenly powered thrusters, electromagnetic interference affecting the IMU’s magnetometer, and an offset in the robot’s center of mass. Through multiple pool tests, we were able to move consistently with limited drift.
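A single-axis loop of the kind run per controlled degree of freedom is sketched below; the gains and anti-windup limit are illustrative, not our tuned values:

```python
class PID:
    """Single-axis PID loop; one instance runs per controlled degree
    of freedom (e.g. depth, yaw)."""
    def __init__(self, kp, ki, kd, integral_limit=5.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None
        self.integral_limit = integral_limit

    def step(self, setpoint, measurement, dt):
        error = setpoint - measurement
        # Clamp the integral term to limit windup.
        self.integral = max(-self.integral_limit,
                            min(self.integral + error * dt,
                                self.integral_limit))
        derivative = (0.0 if self.prev_error is None
                      else (error - self.prev_error) / dt)
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# e.g. one loop per axis: depth_pid = PID(kp=8.0, ki=0.4, kd=2.0)
```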
To facilitate the tuning of the PID control loops and aid in debugging, the visualization tool Foxglove was integrated into our workflow. It provides an intuitive interface for visualizing sensor data, control inputs, and AUV behavior in real time, greatly streamlining optimization of the control system during pool testing and development.
Simulation
Comprehensive testing of the Douglas AUV's software stack is facilitated by a custom Unity simulation environment that replicates the competition setting, enabling code testing, easy validation of new planner code, and simulation of full competition runs in a controlled and repeatable manner. A software endpoint is set up in the ROS stack, which uses sockets for real-time communication with the simulator over a network. Because this network is managed by Docker, communication with the containers is straightforward.
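As a rough sketch of this pattern, a pose update could be relayed to the simulator over a TCP socket as follows; the hostname, port, and JSON wire format here are assumptions for illustration:

```python
import json
import socket

def send_pose(sock, x, y, z, yaw):
    """Serialize a pose update and push it to the simulator.
    The newline-delimited JSON format is an assumed convention."""
    msg = json.dumps({"x": x, "y": y, "z": z, "yaw": yaw}) + "\n"
    sock.sendall(msg.encode("utf-8"))

if __name__ == "__main__":
    # "unity-sim" stands in for the simulator's hostname on the
    # Docker network; the port is likewise illustrative.
    with socket.create_connection(("unity-sim", 9090)) as sock:
        send_pose(sock, 1.0, 2.0, -0.5, 0.3)
```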
Furthermore, a continuous integration (CI) pipeline on GitHub has been established to perform regular automated tests using Unity. Work has been done to de-bloat the previous simulation builds. In the future, the team plans to implement simulated sensor data capabilities within this environment, which will enable thorough testing of state estimation algorithms without the need for physical pool tests, thereby accelerating development and validation cycles.