AI-Powered Object Detection + Collaborative Robot Arm Control
This project integrates a depth camera, YOLO object detection, hand-eye calibration, and a FAIRINO collaborative robot arm — controlled by natural language commands processed by a Gemini AI agent. A Unity 3D interface visualizes detected objects in real time and allows the user to issue commands in any language.
Source code: https://github.com/prof-lijar/vision-guided-robot-pick-system.git
graph TD
A[Orbbec DaBai DCW\nDepth Camera] -->|/camera/color/image_raw\n/camera/depth/points| B[position_3d]
B -->|/detections JSON| C[commander]
B -->|/detections JSON| D[Unity\nVisualization]
D -->|/ai_command| E[AI Agent\nGemini]
E -->|/robot_command\n/robot_speed| C
D -->|/robot_command| C
C -->|TCP/IP SDK| F[FAIRINO\nRobot Arm]
E -->|/ai_reply| D
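
The diagram above shows detections travelling as JSON on `/detections` and commands arriving on `/robot_command`. The actual message schemas are not documented here, so the following is only a minimal sketch under assumed field names (`detections`, `label`, `confidence`, `position`, and the command keys are all hypothetical). It illustrates how a consumer such as the commander might pick a target from a detections payload and build a command from it.

```python
import json

# Hypothetical payload: the real /detections schema may differ.
SAMPLE_DETECTIONS = json.dumps({
    "detections": [
        {"label": "cup", "confidence": 0.72, "position": [0.40, -0.12, 0.05]},
        {"label": "cup", "confidence": 0.91, "position": [0.42, -0.10, 0.05]},
        {"label": "bottle", "confidence": 0.88, "position": [0.30, 0.15, 0.08]},
    ]
})


def pick_target(payload, label):
    """Return the highest-confidence detection with the given label, or None."""
    data = json.loads(payload)
    matches = [d for d in data.get("detections", []) if d.get("label") == label]
    return max(matches, key=lambda d: d.get("confidence", 0.0), default=None)


def build_command(detection):
    """Serialize a pick command (assumed format) for the chosen detection."""
    x, y, z = detection["position"]
    return json.dumps(
        {"action": "pick", "target": detection["label"], "x": x, "y": y, "z": z}
    )


target = pick_target(SAMPLE_DETECTIONS, "cup")
command = build_command(target) if target is not None else None
```

In a ROS 2 node, the payload would typically arrive in a subscriber callback and the command string would be published on the command topic; that wiring is omitted to keep the sketch self-contained.
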
| Component | Spec |
|---|---|
| Robot | FAIRINO Collaborative Arm |
| Camera | Orbbec DaBai DCW (RGB-D) |
| Host PC | Ubuntu 22.04 / WSL2, ROS2 Humble |
| Visualization | Unity 3D (ROS-TCP-Connector) |
graph LR
L1[Layer 1\nCamera Check] --> L2[Layer 2\nYOLO 2D Detect]
L2 --> L3[Layer 3\n3D Position]
L3 --> L4[Layer 4\nCalibration]
L4 --> L5[Layer 5\nCalib Test]
L5 --> L6[Layer 6\nCommander]
L6 --> L7[Layer 7\nAI Agent]
| Layer | File | Role |
|---|---|---|
| 1 | layer1_camera_check.py | Verify camera stream |
| 2 | layer2_detector_2d.py | YOLO object detection |
| 3 | layer3_position_3d.py | 3D position + calibration transform |
| 4 | layer4_calibration.py | Hand-eye calibration (SVD) |
| 5 | layer5_calib_test.py | Calibration verification |
| 6 | layer6_commander.py | Robot motion execution |
| 7 | layer7_ai_agent.py | Natural language AI agent |
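
Layer 4 is listed as SVD-based hand-eye calibration, and Layer 3 applies the resulting transform. The repository's own implementation is not reproduced here. The sketch below shows one standard SVD approach (the Kabsch method) for estimating a rigid transform from paired points, assuming the same physical points have been measured in both the camera frame and the robot base frame. The function names and sample values are hypothetical.

```python
import numpy as np


def estimate_rigid_transform(camera_pts, robot_pts):
    """Return (R, t) such that robot_pt ~= R @ camera_pt + t (Kabsch via SVD)."""
    P = np.asarray(camera_pts, dtype=float)
    Q = np.asarray(robot_pts, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3 or P.shape[0] < 3:
        raise ValueError("expected two matching (N, 3) arrays with N >= 3")

    # Center both point sets on their centroids.
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)

    # Guard against a reflection (determinant of -1) in the solution.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def camera_to_robot(point_cam, R, t):
    """Map a single 3D point from the camera frame to the robot base frame."""
    return R @ np.asarray(point_cam, dtype=float) + t


# Synthetic check: build robot-frame points from a known transform.
angle = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.40, -0.10, 0.25])
cam_pts = np.array([
    [0.0, 0.0, 0.50],
    [0.1, 0.0, 0.60],
    [0.0, 0.1, 0.70],
    [0.1, 0.1, 0.55],
])
robot_pts = cam_pts @ R_true.T + t_true

R_est, t_est = estimate_rigid_transform(cam_pts, robot_pts)
```

With real measurements, the points will be noisy, so it helps to collect more than the minimum three non-collinear pairs and to check the residual between `camera_to_robot(p, R_est, t_est)` and the measured robot position, which is the kind of check a calibration verification step like Layer 5 would perform.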