Vision-Guided Robot Pick System

AI-Powered Object Detection + Collaborative Robot Arm Control

1. Project Overview

This project integrates a depth camera, YOLO object detection, hand-eye calibration, and a FAIRINO collaborative robot arm. The arm is controlled by natural language commands that a Gemini AI agent interprets. A Unity 3D interface visualizes detected objects in real time and lets the user issue commands in any language.

https://youtu.be/WER_G-ni5jg

Source code: https://github.com/prof-lijar/vision-guided-robot-pick-system.git

2. System Architecture

graph TD
    A[Orbbec DaBai DCW\nDepth Camera] -->|/camera/color/image_raw\n/camera/depth/points| B[position_3d]
    B -->|/detections JSON| C[commander]
    B -->|/detections JSON| D[Unity\nVisualization]
    D -->|/ai_command| E[AI Agent\nGemini]
    E -->|/robot_command\n/robot_speed| C
    D -->|/robot_command| C
    C -->|TCP/IP SDK| F[FAIRINO\nRobot Arm]
    E -->|/ai_reply| D
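
The diagram shows `position_3d` publishing detections as JSON on `/detections`, which `commander` then consumes. The README does not show the message schema, so the sketch below assumes a hypothetical one (`detections`, `label`, `confidence`, `position` are illustrative names, not the project's actual fields). It only illustrates how a consumer might choose a pick target from such a payload, without any ROS dependency.

```python
import json
import math

# Hypothetical /detections payload. The field names ("detections", "label",
# "confidence", "position") are assumptions for illustration only.
SAMPLE_PAYLOAD = json.dumps({
    "detections": [
        {"label": "cup", "confidence": 0.91, "position": [0.42, -0.10, 0.05]},
        {"label": "cup", "confidence": 0.88, "position": [0.30, 0.05, 0.04]},
        {"label": "bottle", "confidence": 0.35, "position": [0.25, 0.00, 0.10]},
    ]
})


def pick_target(payload, label, min_confidence=0.5):
    """Return the nearest sufficiently confident detection of `label`, or None."""
    detections = json.loads(payload).get("detections", [])
    candidates = [
        d for d in detections
        if d["label"] == label and d["confidence"] >= min_confidence
    ]
    if not candidates:
        return None
    # Distance from the origin of whatever frame the positions are expressed in.
    return min(candidates, key=lambda d: math.hypot(*d["position"]))


target = pick_target(SAMPLE_PAYLOAD, "cup")
print(target["position"])
```

With this sample data the nearer of the two cups is selected, and the low-confidence bottle is ignored unless `min_confidence` is lowered.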

3. Hardware Setup

| Component | Spec |
| --- | --- |
| Robot | FAIRINO Collaborative Arm |
| Camera | Orbbec DaBai DCW (RGB-D) |
| Host PC | Ubuntu 22.04 / WSL2, ROS2 Humble |
| Visualization | Unity 3D (ROS-TCP-Connector) |

4. Software Layers

graph LR
    L1[Layer 1\nCamera Check] --> L2[Layer 2\nYOLO 2D Detect]
    L2 --> L3[Layer 3\n3D Position]
    L3 --> L4[Layer 4\nCalibration]
    L4 --> L5[Layer 5\nCalib Test]
    L5 --> L6[Layer 6\nCommander]
    L6 --> L7[Layer 7\nAI Agent]

| Layer | File | Role |
| --- | --- | --- |
| 1 | `layer1_camera_check.py` | Verify camera stream |
| 2 | `layer2_detector_2d.py` | YOLO object detection |
| 3 | `layer3_position_3d.py` | 3D position + calibration transform |
| 4 | `layer4_calibration.py` | Hand-eye calibration (SVD) |
| 5 | `layer5_calib_test.py` | Calibration verification |
| 6 | `layer6_commander.py` | Robot motion execution |
| 7 | `layer7_ai_agent.py` | Natural language AI agent |
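
Layers 3 and 4 rely on a camera-to-robot transform estimated with SVD. One common SVD-based approach is the Kabsch method, which fits a rotation and translation to paired 3D points measured in both frames. Whether `layer4_calibration.py` works exactly this way is an assumption; the function names and sample points below are hypothetical, and the sketch requires NumPy.

```python
import math

import numpy as np


def estimate_rigid_transform(cam_pts, robot_pts):
    """Least-squares R, t such that robot ≈ R @ cam + t (Kabsch method)."""
    cam = np.asarray(cam_pts, dtype=float)
    rob = np.asarray(robot_pts, dtype=float)
    if cam.shape != rob.shape or cam.ndim != 2 or cam.shape[1] != 3 or len(cam) < 3:
        raise ValueError("need two matching (N, 3) arrays with N >= 3")

    cam_centroid = cam.mean(axis=0)
    rob_centroid = rob.mean(axis=0)

    # 3x3 cross-covariance of the centred point sets.
    H = (cam - cam_centroid).T @ (rob - rob_centroid)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if the raw solution would be a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = rob_centroid - R @ cam_centroid
    return R, t


def to_robot_frame(R, t, cam_point):
    """Map one camera-frame point into the robot base frame."""
    return R @ np.asarray(cam_point, dtype=float) + t


# Synthetic check: build robot-frame points from a known transform,
# then confirm the estimate recovers it.
theta = math.radians(30.0)
R_true = np.array([
    [math.cos(theta), -math.sin(theta), 0.0],
    [math.sin(theta), math.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.40, -0.10, 0.25])
cam_samples = np.array([
    [0.0, 0.0, 0.50],
    [0.1, 0.0, 0.60],
    [0.0, 0.1, 0.70],
    [0.1, 0.1, 0.55],
])
robot_samples = cam_samples @ R_true.T + t_true

R_est, t_est = estimate_rigid_transform(cam_samples, robot_samples)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

In practice the paired points would come from touching the robot tool to targets the camera can also see. The residual between `to_robot_frame` outputs and the measured robot positions gives a simple figure for a calibration test like the one Layer 5 describes.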