Vision-Guided Robot Pick System

AI-Powered Object Detection + Collaborative Robot Arm Control


1. Project Overview

This project integrates a depth camera, YOLO object detection, hand-eye calibration, and a FAIRINO collaborative robot arm. The arm is driven by natural-language commands that a Gemini AI agent interprets. A Unity 3D interface visualizes detected objects in real time and lets the user issue commands in any language.


Video: https://youtu.be/WER_G-ni5jg

Source code: https://github.com/prof-lijar/vision-guided-robot-pick-system.git

2. System Architecture

graph TD
    A[Orbbec DaBai DCW\nDepth Camera] -->|/camera/color/image_raw\n/camera/depth/points| B[position_3d]
    B -->|/detections JSON| C[commander]
    B -->|/detections JSON| D[Unity\nVisualization]
    D -->|/ai_command| E[AI Agent\nGemini]
    E -->|/robot_command\n/robot_speed| C
    D -->|/robot_command| C
    C -->|TCP/IP SDK| F[FAIRINO\nRobot Arm]
    E -->|/ai_reply| D
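
In the diagram, `position_3d` publishes detections as JSON on `/detections`, and `commander` consumes them. The sketch below shows one way a consumer might decode such a message and choose a pick target. The payload schema (`detections`, `label`, `confidence`, `position`) and the helper `select_target` are assumptions made for illustration; the actual message layout in the repository may differ.

```python
import json
from typing import Optional

# Hypothetical /detections payload; the real field names may differ.
# Positions are assumed to be [x, y, z] in metres in the robot base frame.
SAMPLE_PAYLOAD = json.dumps({
    "detections": [
        {"label": "cup", "confidence": 0.91, "position": [0.42, -0.10, 0.05]},
        {"label": "cup", "confidence": 0.55, "position": [0.30, 0.20, 0.05]},
        {"label": "bottle", "confidence": 0.88, "position": [0.50, 0.15, 0.12]},
    ]
})


def select_target(payload: str, label: str, min_confidence: float = 0.5) -> Optional[dict]:
    """Return the most confident detection with the given label, or None."""
    detections = json.loads(payload).get("detections", [])
    candidates = [
        d for d in detections
        if d.get("label") == label and d.get("confidence", 0.0) >= min_confidence
    ]
    # max() with default=None handles the case where nothing matched.
    return max(candidates, key=lambda d: d.get("confidence", 0.0), default=None)


target = select_target(SAMPLE_PAYLOAD, "cup")
```

In a running system the payload would arrive through a ROS 2 subscription rather than a constant, and the selected position would be passed on to the motion layer.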

3. Hardware Setup

| Component | Spec |
|---|---|
| Robot | FAIRINO Collaborative Arm |
| Camera | Orbbec DaBai DCW (RGB-D) |
| Host PC | Ubuntu 22.04 / WSL2, ROS2 Humble |
| Visualization | Unity 3D (ROS-TCP-Connector) |

4. Software Layers

graph LR
    L1[Layer 1\nCamera Check] --> L2[Layer 2\nYOLO 2D Detect]
    L2 --> L3[Layer 3\n3D Position]
    L3 --> L4[Layer 4\nCalibration]
    L4 --> L5[Layer 5\nCalib Test]
    L5 --> L6[Layer 6\nCommander]
    L6 --> L7[Layer 7\nAI Agent]

| Layer | File | Role |
|---|---|---|
| 1 | layer1_camera_check.py | Verify camera stream |
| 2 | layer2_detector_2d.py | YOLO object detection |
| 3 | layer3_position_3d.py | 3D position + calibration transform |
| 4 | layer4_calibration.py | Hand-eye calibration (SVD) |
| 5 | layer5_calib_test.py | Calibration verification |
| 6 | layer6_commander.py | Robot motion execution |
| 7 | layer7_ai_agent.py | Natural language AI agent |
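
Layer 4 is listed as SVD-based hand-eye calibration, and layer 3 applies the resulting transform to camera-frame positions. One common SVD-based approach is the Kabsch method, which fits a rigid transform to matched point pairs. Whether `layer4_calibration.py` uses exactly this formulation is an assumption. The sketch below uses NumPy with synthetic data, and the names `fit_rigid_transform` and `camera_to_robot` are hypothetical.

```python
import numpy as np


def fit_rigid_transform(cam_pts: np.ndarray, robot_pts: np.ndarray):
    """Fit R, t so that robot ≈ R @ cam + t, given matched (N, 3) points."""
    cam_centroid = cam_pts.mean(axis=0)
    robot_centroid = robot_pts.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (cam_pts - cam_centroid).T @ (robot_pts - robot_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation (det = +1),
    # not a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = robot_centroid - R @ cam_centroid
    return R, t


def camera_to_robot(R: np.ndarray, t: np.ndarray, point_cam: np.ndarray) -> np.ndarray:
    """Map one camera-frame point into the robot base frame."""
    return R @ point_cam + t


# Synthetic ground truth: 90 degree rotation about z plus a translation.
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, 0.0, 0.2])

# Four non-coplanar calibration points in the camera frame (metres).
cam_pts = np.array([[0.0, 0.0, 0.0],
                    [0.1, 0.0, 0.0],
                    [0.0, 0.1, 0.0],
                    [0.0, 0.0, 0.1]])
robot_pts = cam_pts @ R_true.T + t_true

R, t = fit_rigid_transform(cam_pts, robot_pts)
picked = camera_to_robot(R, t, np.array([0.1, 0.0, 0.0]))
```

With real data, `cam_pts` would hold positions of a calibration target as seen by the camera, and `robot_pts` the matching positions reported by the robot. The residual between `robot_pts` and the transformed `cam_pts` then gives a quick check of calibration quality, in the spirit of layer 5.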