ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Yilei Jiang^1*, Yaozhi Zheng^1*, Yuxuan Wan^2*,
Jiaming Han¹, Qunzhong Wang¹,
Michael R. Lyu², Xiangyu Yue^1✉

¹CUHK MMLab, ²CUHK ARISE Lab

^*Equal contribution ^✉Corresponding author

Demo Videos

A showcase of how ScreenCoder transforms UI screenshots into structured, editable HTML/CSS code using a modular multi-agent framework.

Project Structure

main.py: The main script to generate final HTML code for a single screenshot.
UIED/: Contains the UIED (UI Element Detection) engine for analyzing screenshots and detecting components.
- run_single.py: Python script to run UI component detection on a single image.
html_generator.py: Takes the detected component data and generates a complete HTML layout with generated code for each module.
image_replacer.py: A script to replace placeholder divs in the final HTML with actual cropped images.
mapping.py: Maps the detected UIED components to logical page regions.
requirements.txt: Lists all the necessary Python dependencies for the project.
doubao_api.txt: API key file for the Doubao model (should be kept private and is included in .gitignore).

Setup and Installation

Clone the repository:

git clone https://github.com/JimmyZhengyz/screencoder.git
cd screencoder

Create a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install dependencies:
```
pip install -r requirements.txt
```
Set up API Key:
- Create a file named doubao_api.txt in the root directory.
- Paste your Doubao API key into this file.

Usage

The typical workflow is a multi-step process as follows:

Initial Generation with Placeholders: Run the Python script to generate the initial HTML code for a given screenshot.
- Block Detection:
```
python block_parsor.py
```
- Generation with Placeholders (Gray Images Blocks):
```
python html_generator.py
```
Final HTML Code: Run the python script to generate final HTML code with copped images from the original screenshot.
- Placeholder Detection:
```
python image_box_detection.py
```
- UI Element Detection:
```
python UIED/run_single.py
```
- Mapping Alignment Between Placeholders and UI Elements:
```
python mapping.py
```
- Placeholder Replacement:
```
python image_replacer.py
```
Simple Run: Run the python script to generate the final HTML code:
```
python main.py
```

3.3 KiB Raw Blame History Unescape Escape

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Demo Videos

Youtube Page

Instagram Page

Design Draft（allow customized modifications!）

Project Structure

Setup and Installation

Usage

3.3 KiB

Raw Blame History