1. Background

Training a robot to perform even a simple task requires vast amounts of simulation and experimental data. Take, for example, the action of opening a door—this seemingly straightforward behavior demands experience across a wide range of handle designs, materials, and physical conditions. In real-world environments, however, replicating such diversity is difficult, and collecting real data can be time-consuming and expensive. Moreover, physical data collection is limited by sensor noise, reproducibility challenges, and safety risks, making it difficult to gather large-scale, high-fidelity training datasets.

To overcome these challenges, many robotics researchers and companies rely on simulation-based virtual environments to train their models.

One widely adopted paradigm in robotics is the Real2Sim2Real (R2S2R) workflow, in which real-world observations are brought into simulation, used for training, and then transferred back to the real world. However, one of the most critical and still unsolved challenges lies in the Real2Sim phase: converting real-world information into a usable simulation environment. In most cases, this is a time-consuming manual process in which real-world geometry or object scans are turned into simulation-ready assets using CAD software and physics engines, and it typically demands many hours of a skilled engineer's time.

Although research into scan-based reconstruction for simulation is progressing, there is still considerable room for improvement in the quality and management of the resulting data.


(Image source: Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware)

To address this challenge, we developed a pipeline that automatically converts CAD design data into simulation-ready objects and environments in which robots can be trained. In particular, our system can infer articulation structure from static CAD models and transform them into articulated objects, enabling not just visual representation but actual robot interaction and task learning.
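At a high level, the output of such a pipeline is a set of separated rigid parts plus the joints that connect them. The sketch below is a minimal, simplified illustration of that data shape; the `Part`, `Joint`, and `ArticulatedAsset` names are illustrative assumptions, not the actual pipeline's API.

```python
# Minimal conceptual sketch of a CAD -> articulated simulation asset flow.
# All names here are illustrative assumptions, not the actual pipeline API.
from dataclasses import dataclass, field


@dataclass
class Part:
    """A single rigid component extracted from the CAD assembly."""
    name: str
    mesh_file: str           # exported visual/collision geometry


@dataclass
class Joint:
    """A kinematic relation inferred between two parts."""
    name: str
    joint_type: str          # e.g. "revolute", "prismatic", "fixed"
    parent: str
    child: str
    axis: tuple = (0.0, 0.0, 1.0)


@dataclass
class ArticulatedAsset:
    """Simulation-ready object: separated parts plus joint definitions."""
    name: str
    parts: list = field(default_factory=list)
    joints: list = field(default_factory=list)


def build_asset(name: str, parts: list, joints: list) -> ArticulatedAsset:
    # In the real pipeline, parts come from CAD segmentation and joints from
    # automatic articulation inference; here they are supplied by hand.
    return ArticulatedAsset(name=name, parts=parts, joints=joints)


if __name__ == "__main__":
    door = build_asset(
        "door",
        parts=[Part("frame", "frame.obj"), Part("panel", "panel.obj")],
        joints=[Joint("hinge", "revolute", parent="frame", child="panel")],
    )
    print(f"{door.name}: {len(door.parts)} parts, {len(door.joints)} joints")
```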

This approach dramatically lowers the cost, time, and complexity of producing high-quality training data for robotics. It turns CAD models into rich, interactive assets that seamlessly integrate into robot simulators—making large-scale, diverse data generation faster, cheaper, and more flexible than ever before.

2. Limitations of Text/Image-to-3D: The Gap Between “Looking Right” and “Working Right”

Recent advances in text/image-to-3D technology have made it possible to create visually impressive 3D models with just a few lines of text or images. However, this technology faces fundamental limitations when it comes to use in simulation or robotic learning contexts.

First, there is no articulation. Most Text/Image-to-3D methods generate a single, unified mesh. While the model may look like it’s composed of multiple parts, it is structurally a solid shape—meaning joints cannot be defined, and no movement can be simulated within a physics engine. A model generated from the prompt “a foldable robot arm” may look foldable, but it cannot actually fold because it has no joints.
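To make the contrast concrete, here is a minimal sketch, using Python’s standard `xml.etree` module, of the kind of URDF joint description a physics engine expects and that a single fused mesh has no way to express. The link and joint names are illustrative.

```python
# Illustrative only: emit a minimal URDF with one revolute joint,
# the kind of articulation a single fused mesh cannot express.
import xml.etree.ElementTree as ET

robot = ET.Element("robot", name="door")

# Two separate rigid bodies (links); a fused mesh collapses these into one.
ET.SubElement(robot, "link", name="frame")
ET.SubElement(robot, "link", name="panel")

# The hinge: parent/child links, rotation axis, and motion limits.
joint = ET.SubElement(robot, "joint", name="hinge", type="revolute")
ET.SubElement(joint, "parent", link="frame")
ET.SubElement(joint, "child", link="panel")
ET.SubElement(joint, "axis", xyz="0 0 1")
ET.SubElement(joint, "limit", lower="0", upper="1.57", effort="10", velocity="1")

print(ET.tostring(robot, encoding="unicode"))
```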

Second, the outputs often suffer from surface noise and irregular mesh topology. Unlike structured CAD built on Non-Uniform Rational B-Splines (NURBS), meshes generated by Text/Image-to-3D technology do not guarantee precise shapes, and their surfaces lack the smoothness that curve continuity provides, so they require post-processing. This can lead to collision errors in simulators, unstable physical behavior, and increased computational load—ultimately hindering training performance. Consequently, such low-quality data cannot close the gap between the real world and the virtual world.
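To illustrate the kind of post-processing this forces, the sketch below implements two simple checks that are commonly run on generated meshes, flagging non-manifold edges and zero-area triangles. The function names and threshold are our own simplified choices, not a specific library’s API.

```python
# Simple sanity checks for a triangle mesh given as vertex and face lists.
# Non-manifold edges and zero-area triangles are common in generated meshes
# and frequently cause collision errors in physics engines.
from collections import Counter


def non_manifold_edges(faces):
    """Return edges not shared by exactly two triangles."""
    edge_count = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_count[tuple(sorted((u, v)))] += 1
    return [e for e, n in edge_count.items() if n != 2]


def degenerate_faces(vertices, faces, eps=1e-12):
    """Return indices of faces whose area is numerically zero."""
    bad = []
    for i, (a, b, c) in enumerate(faces):
        (ax, ay, az), (bx, by, bz), (cx, cy, cz) = vertices[a], vertices[b], vertices[c]
        ux, uy, uz = bx - ax, by - ay, bz - az
        vx, vy, vz = cx - ax, cy - ay, cz - az
        # The cross product magnitude is twice the triangle area.
        cross = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
        if sum(k * k for k in cross) ** 0.5 < eps:
            bad.append(i)
    return bad


if __name__ == "__main__":
    # A single triangle: every edge is an open boundary edge, so all are flagged.
    verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
    tris = [(0, 1, 2)]
    print(non_manifold_edges(tris))       # three open edges
    print(degenerate_faces(verts, tris))  # [] -- the triangle has area
```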

Third, the lack of part separation is a critical bottleneck. For mechanical systems to operate, individual components must be defined independently and connected through joints. Most Text/Image-to-3D outputs do not have these part-level definitions or hierarchical structure, making them nearly impossible to convert into articulated objects. Because of these issues, Text/Image-to-3D is suitable for visual prototyping, but not for physics-grounded simulation or robot learning.

That’s exactly where NdotLight’s Text/Image-to-CAD–based 3D data generation pipeline makes a difference. It understands and preserves structural relationships between parts, and automatically generates articulated objects with defined motion capabilities. This allows physical behaviors to be precisely reproduced in virtual environments—bridging the gap between visual plausibility and functional utility.

3. Text-to-CAD–Based 3D Simulation Data Generation Solution

Traditional CAD models are typically limited to defining the form of an object. They can describe what components look like, where they’re located, and how they’re shaped—but not how they move. That is, a CAD model may resemble a real-world product, but it still represents a static, non-functional structure.

NdotLight’s Trinix solution changes that. It analyzes each component and makes inferences: If these two parts are in contact, how might they relate? If a structure repeats symmetrically, could it imply a rotational axis? Much like observing the joints of a human body, the system detects possibilities for motion within mechanical structures. It picks up subtle clues—symmetry, spacing, alignment—to infer articulation. These motion predictions are then transformed into simulation-ready outputs, enabling robots to grasp, push, rotate, and interact with objects in dynamic ways. It’s the moment when static geometry becomes interactive reality—where design becomes not just visual, but actionable and learnable for machines.
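The production logic is considerably richer, but a toy sketch conveys the flavor: treat each contact between two parts as a clue, map a cylindrical contact to a revolute joint about the shared axis, and map a planar contact to a prismatic joint. The `Contact` representation and the rules below are simplified assumptions for illustration only, not the actual Trinix inference.

```python
# Toy illustration of articulation inference from part-contact clues.
# The real system reasons over full CAD geometry; this sketch reduces the
# idea to labeled contact regions and is not the production logic.
from dataclasses import dataclass


@dataclass
class Contact:
    """A simplified description of where two parts touch."""
    part_a: str
    part_b: str
    surface_type: str              # "cylindrical" or "planar"
    axis: tuple = (0.0, 0.0, 1.0)  # cylinder axis / sliding direction


def propose_joint(contact: Contact) -> dict:
    """Map a contact pattern to a candidate joint type."""
    if contact.surface_type == "cylindrical":
        # Coaxial cylindrical contact suggests rotation about the shared axis.
        joint_type = "revolute"
    elif contact.surface_type == "planar":
        # Planar contact with one free direction suggests translation.
        joint_type = "prismatic"
    else:
        joint_type = "fixed"
    return {
        "type": joint_type,
        "parent": contact.part_a,
        "child": contact.part_b,
        "axis": contact.axis,
    }


if __name__ == "__main__":
    hinge = Contact("frame", "panel", "cylindrical", axis=(0.0, 0.0, 1.0))
    print(propose_joint(hinge))  # {'type': 'revolute', ...}
```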

We’ve also architected the system to work natively with Text/Image-to-CAD input. When a user enters a phrase like “a lever with a rotating handle” or provides a similar image, our solution not only generates separated parts but also includes joint types and motion capabilities in the resulting articulated CAD model. This goes far beyond surface-level 3D generation—it's a full semantic-to-functional mapping pipeline. The system extracts functional intent from natural language and turns it into motion-aware design. That’s a foundational leap in design automation.

Thanks to this capability, a simple line of text can now produce a robot-interactive object—complete with articulation—without manual joint configuration. It’s a new paradigm that traditional CAD systems were never designed to imagine.