One year later…

Last year was my first encounter with the robotic arm. All I managed was to understand what it is and how to set it up. Sounds superficial? Indeed!

This spring semester I decided to face the robotic arm again, with a bit more patience and a bit more courage. I felt confident that I could at least learn more about the software side.

Goals:

The GRAND goal was to stack Jenga blocks with closed-loop visual control rather than scripted coordinates.

To achieve this, I set myself some sub-goals:

  1. Build a clean data collection setup with the Intel RealSense D435.
    • That is, a repeatable camera setup that captures images of Jenga blocks under different lighting, angles, backgrounds, and stack arrangements
  2. Detect Jenga blocks in the image using classical OpenCV.
    • edge detection
    • contour detection
    • rectangle fitting
    • orientation estimation
  3. Estimate block position and orientation in real space
  4. Execute reliable pick actions
  5. Execute reliable place actions
  6. Add corrections and feedback
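
As a rough illustration of how sub-goals 2 and 3 connect, the sketch below takes the four corners of a fitted rectangle (for example, the points that cv2.boxPoints returns for a cv2.minAreaRect result), estimates the orientation of its long side, and converts a pixel plus a depth reading into a 3D point in the camera frame using the plain pinhole model. This is not the code from this project: the Intrinsics type, both function names, and every numeric value are hypothetical, and lens distortion is ignored. With a real D435, the intrinsics would come from the camera itself, and pyrealsense2 provides rs2_deproject_pixel_to_point for the same conversion.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Intrinsics(NamedTuple):
    """Pinhole camera parameters in pixels (values used below are made up)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def long_axis_angle_deg(corners: Sequence[Tuple[float, float]]) -> float:
    """Angle of the rectangle's longer side against the image x-axis.

    `corners` must list the four corners in order around the rectangle.
    The result is folded into the range 0 to 180 degrees. Image y points
    down, so a positive angle appears clockwise on screen.
    """
    (x0, y0), (x1, y1), (x2, y2) = corners[0], corners[1], corners[2]
    side_a = (x1 - x0, y1 - y0)  # first edge
    side_b = (x2 - x1, y2 - y1)  # adjacent edge
    dx, dy = max(side_a, side_b, key=lambda s: math.hypot(*s))
    return math.degrees(math.atan2(dy, dx)) % 180.0


def deproject_pixel(u: float, v: float, depth_m: float,
                    k: Intrinsics) -> Tuple[float, float, float]:
    """Map pixel (u, v) with depth in metres to (X, Y, Z) in the camera frame."""
    x = (u - k.cx) / k.fx * depth_m
    y = (v - k.cy) / k.fy * depth_m
    return (x, y, depth_m)


if __name__ == "__main__":
    k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)  # hypothetical
    block = [(300.0, 200.0), (375.0, 200.0), (375.0, 225.0), (300.0, 225.0)]
    centre_u = sum(p[0] for p in block) / 4
    centre_v = sum(p[1] for p in block) / 4
    print("angle:", long_axis_angle_deg(block))
    print("centre:", deproject_pixel(centre_u, centre_v, 0.5, k))
```

The result is still expressed in the camera frame. Turning it into a pick target for the arm additionally needs a camera-to-robot (hand-eye) calibration, which this sketch leaves out.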

Outreach:

Cold emailing is sometimes very hot:
Response

Setting Up:

Configuration Based on the Intel RealSense D435 Camera

Hardware Requirements

Software

Supported Python Versions

Supported Python versions: 3.8–3.11 (recommended: 3.11)

1. Clone the repository

git clone https://github.com/xArm-Developer/ufactory_vision.git
cd ufactory_vision

2. Install dependencies

pip install \
numpy==1.24.4 \
torch==2.4.1 \
opencv-contrib-python==4.10.0.84 \
opencv-python==4.10.0.84 \
scikit-image==0.21.0 \
xarm-python-sdk==1.14.7 \
pyrealsense2==2.56.5.9235
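
After installing, it can help to confirm that the environment really matches the pins above before touching the arm. The following is a small sketch using only the Python standard library. The helper names python_supported and find_mismatches are hypothetical and not part of the ufactory_vision repository; the version list simply mirrors the pip command above.

```python
import sys
from importlib import metadata

# Versions copied from the pip command above.
PINNED = {
    "numpy": "1.24.4",
    "torch": "2.4.1",
    "opencv-contrib-python": "4.10.0.84",
    "opencv-python": "4.10.0.84",
    "scikit-image": "0.21.0",
    "xarm-python-sdk": "1.14.7",
    "pyrealsense2": "2.56.5.9235",
}


def python_supported(version_info=sys.version_info):
    """True when the interpreter is within the supported 3.8 to 3.11 range."""
    return (3, 8) <= tuple(version_info[:2]) <= (3, 11)


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: installed version or None} for every pin not met."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    if not python_supported():
        print("Python", sys.version.split()[0], "is outside 3.8 to 3.11")
    for name, found in find_mismatches(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {found or 'not installed'}")
```

Running the script prints nothing when the interpreter and all seven packages match.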

Resources

  • UFACTORY Documentation — Official UFACTORY docs for manuals, APIs, support articles, and release notes.
  • xArm Python SDK — Official Python SDK for controlling xArm robots.
  • UFACTORY Vision — UFACTORY’s vision-related repository, useful for camera integration and vision-based robotic workflows.
  • GGCNN Kinova Grasping — Reference project for vision-guided robotic grasping using GG-CNN.

OpenCV resources