ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
Yilei Jiang1*, Yaozhi Zheng1*, Yuxuan Wan2*,
Jiaming Han1, Qunzhong Wang1,
Michael R. Lyu2, Xiangyu Yue1✉
1CUHK MMLab, 2CUHK ARISE Lab
*Equal contribution ✉Corresponding author
Demo Videos
A showcase of how ScreenCoder transforms UI screenshots into structured, editable HTML/CSS code using a modular multi-agent framework.
YouTube Page
https://github.com/user-attachments/assets/5d4c0808-76b8-4eb3-b333-79d0ac690189
Instagram Page
https://github.com/user-attachments/assets/9819d559-863e-4126-8506-1eccaa806df0
Design Draft (allows customized modifications!)
https://github.com/user-attachments/assets/d2f26583-4649-4b6d-8072-b11cd1025f4b
Project Structure
- main.py: The main script to generate final HTML code for a single screenshot.
- UIED/: Contains the UIED (UI Element Detection) engine for analyzing screenshots and detecting components.
  - run_single.py: Python script to run UI component detection on a single image.
- html_generator.py: Takes the detected component data and generates a complete HTML layout with generated code for each module.
- image_replacer.py: A script to replace placeholder divs in the final HTML with actual cropped images.
- mapping.py: Maps the detected UIED components to logical page regions.
- requirements.txt: Lists all the necessary Python dependencies for the project.
- doubao_api.txt: API key file for the Doubao model (should be kept private and is included in .gitignore).
Setup and Installation
- Clone the repository:
  git clone https://github.com/JimmyZhengyz/screencoder.git
  cd screencoder
- Create a virtual environment:
  python3 -m venv .venv
  source .venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
- Set up API Key:
  - Create a file named doubao_api.txt in the root directory.
  - Paste your Doubao API key into this file.
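The API key step above can be checked before running anything else. The following is a minimal sketch, assuming the key is stored as a single line of plain text in doubao_api.txt; the helper name load_api_key is hypothetical and is not part of the repository.

```python
from pathlib import Path


def load_api_key(path="doubao_api.txt"):
    """Return the API key stored in `path`, with surrounding whitespace removed.

    Hypothetical helper: the file name comes from the setup steps, but the
    single-line plain-text layout of the file is an assumption.
    """
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"API key file not found: {key_file}")
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file is empty: {key_file}")
    return key
```

Calling load_api_key() from the repository root fails early with a clear error if the file is missing or empty, instead of failing later during generation.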
Usage
The typical workflow is a multi-step process as follows:
- Initial Generation with Placeholders: Run the Python scripts to generate the initial HTML code for a given screenshot.
  - Block Detection:
    python block_parsor.py
  - Generation with Placeholders (Gray Image Blocks):
    python html_generator.py
- Final HTML Code: Run the Python scripts to generate the final HTML code with cropped images from the original screenshot.
  - Placeholder Detection:
    python image_box_detection.py
  - UI Element Detection:
    python UIED/run_single.py
  - Mapping Alignment Between Placeholders and UI Elements:
    python mapping.py
  - Placeholder Replacement:
    python image_replacer.py
- Simple Run: Run the Python script to generate the final HTML code:
  python main.py
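The step-by-step workflow can also be scripted. The following is a minimal sketch that runs the six scripts in the order listed above and stops at the first failure. It assumes each script runs without arguments from the repository root, as the commands above suggest. The names PIPELINE_STEPS, build_commands, and run_pipeline are hypothetical, and this is not a description of what main.py does internally.

```python
import subprocess
import sys

# Step order and script names copied from the Usage section.
# Assumption: every script is run without arguments from the repository root.
PIPELINE_STEPS = [
    ("Block Detection", "block_parsor.py"),
    ("Generation with Placeholders", "html_generator.py"),
    ("Placeholder Detection", "image_box_detection.py"),
    ("UI Element Detection", "UIED/run_single.py"),
    ("Mapping Alignment", "mapping.py"),
    ("Placeholder Replacement", "image_replacer.py"),
]


def build_commands(steps=PIPELINE_STEPS, python=sys.executable):
    """Turn (name, script) pairs into (name, argv) pairs."""
    return [(name, [python, script]) for name, script in steps]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order and return the names of the completed steps.

    With the default runner, check=True raises subprocess.CalledProcessError
    on a non-zero exit code, so later steps never run on stale output.
    """
    completed = []
    for name, argv in commands:
        print(f"[{name}] {' '.join(argv)}")
        runner(argv, check=True)
        completed.append(name)
    return completed
```

Calling run_pipeline(build_commands()) inside the activated virtual environment executes the steps with the same interpreter that runs the sketch. The runner parameter exists only so the order can be checked without launching the real scripts.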
