ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
Yilei Jiang1*, Yaozhi Zheng1*, Yuxuan Wan2*, Jiaming Han1, Qunzhong Wang1,
Michael R. Lyu2, Xiangyu Yue1✉
1CUHK MMLab, 2CUHK ARISE Lab
*Equal contribution ✉Corresponding author
Demo Videos
A showcase of how ScreenCoder transforms UI screenshots into structured, editable HTML/CSS code using a modular multi-agent framework.
YouTube Page
https://github.com/user-attachments/assets/5d4c0808-76b8-4eb3-b333-79d0ac690189
Instagram Page
https://github.com/user-attachments/assets/9819d559-863e-4126-8506-1eccaa806df0
Design Draft (allows customized modifications!)
https://github.com/user-attachments/assets/d2f26583-4649-4b6d-8072-b11cd1025f4b
Qualitative Comparisons
We present qualitative examples to illustrate the improvements achieved by our method over existing approaches. The examples below compare the output of a baseline method with ours on the same input.
[Figure: side-by-side comparison. Left: Baseline or Other Method. Right: Our Method.]
As shown above, our method produces results that are more accurate, visually aligned, and semantically faithful to the original design.
Project Structure
- main.py: The main script to generate final HTML code for a single screenshot.
- UIED/: Contains the UIED (UI Element Detection) engine for analyzing screenshots and detecting components.
  - run_single.py: Python script to run UI component detection on a single image.
- html_generator.py: Takes the detected component data and generates a complete HTML layout with generated code for each module.
- image_replacer.py: A script to replace placeholder divs in the final HTML with actual cropped images.
- mapping.py: Maps the detected UIED components to logical page regions.
- requirements.txt: Lists all the necessary Python dependencies for the project.
- doubao_api.txt: API key file for the Doubao model (should be kept private and is included in .gitignore).
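To make the role of mapping.py more concrete, here is a minimal sketch of one way placeholder boxes could be aligned with detected UI elements. It is an illustration only: the greedy intersection-over-union (IoU) matching, the `(x_min, y_min, x_max, y_max)` box format, and the names `iou` and `match_placeholders` are all assumptions, not the repository's actual algorithm or API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_placeholders(
    placeholders: Dict[str, Box],
    elements: Dict[str, Box],
    min_iou: float = 0.5,
) -> Dict[str, str]:
    """Greedily pair each placeholder with its best-overlapping unused element.

    Assumed strategy for illustration; placeholders with no element above
    `min_iou` are left unmatched.
    """
    matches: Dict[str, str] = {}
    used = set()
    for ph_id, ph_box in placeholders.items():
        best_id: Optional[str] = None
        best_score = min_iou
        for el_id, el_box in elements.items():
            if el_id in used:
                continue
            score = iou(ph_box, el_box)
            if score >= best_score:
                best_id, best_score = el_id, score
        if best_id is not None:
            matches[ph_id] = best_id
            used.add(best_id)
    return matches
```

A matched pair would then tell a replacement step which region of the original screenshot to crop for each placeholder. A greedy pass is simple but order-dependent; a real implementation might prefer a global assignment.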
Setup and Installation
- Clone the repository:
    git clone https://github.com/JimmyZhengyz/screencoder.git
    cd screencoder
- Create a virtual environment:
    python3 -m venv .venv
    source .venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Set up API Key:
  - Create a file named doubao_api.txt in the root directory.
  - Paste your Doubao API key into this file.
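As a quick sanity check that the key file is in place, a small helper along these lines can read it before running anything else. This is a hedged sketch: `load_api_key` is a hypothetical name, and the project's own scripts may load the key differently.

```python
from pathlib import Path


def load_api_key(path: str = "doubao_api.txt") -> str:
    """Read the API key from a text file, stripping surrounding whitespace.

    Hypothetical helper for illustration; not part of the repository.
    """
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(
            f"{path} not found; create it in the repository root."
        )
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty; paste your Doubao API key into it.")
    return key
```

Calling `load_api_key()` from the repository root either returns the key or fails early with a readable message, which is easier to diagnose than an authentication error later in the pipeline.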
Usage
The typical workflow is a multi-step process as follows:
- Initial Generation with Placeholders: Run the Python scripts to generate the initial HTML code for a given screenshot.
  - Block Detection:
      python block_parsor.py
  - Generation with Placeholders (Gray Image Blocks):
      python html_generator.py
- Final HTML Code: Run the Python scripts to generate the final HTML code with cropped images from the original screenshot.
  - Placeholder Detection:
      python image_box_detection.py
  - UI Element Detection:
      python UIED/run_single.py
  - Mapping Alignment Between Placeholders and UI Elements:
      python mapping.py
  - Placeholder Replacement:
      python image_replacer.py
- Simple Run: Run the Python script to generate the final HTML code:
    python main.py
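The step-by-step workflow above can also be chained from a single script. The sketch below assumes each script takes no arguments and is run from the repository root, as the commands in this section suggest; `PIPELINE_STEPS` and `run_pipeline` are hypothetical names, and this is not a description of what main.py does internally.

```python
import subprocess
import sys
from typing import List, Sequence

# Script order taken from the Usage steps in this README.
PIPELINE_STEPS = [
    "block_parsor.py",
    "html_generator.py",
    "image_box_detection.py",
    "UIED/run_single.py",
    "mapping.py",
    "image_replacer.py",
]


def run_pipeline(
    steps: Sequence[str] = PIPELINE_STEPS, dry_run: bool = False
) -> List[List[str]]:
    """Run each step with the current interpreter, stopping on the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = [[sys.executable, step] for step in steps]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_pipeline(dry_run=True)` first is a cheap way to confirm the order before executing anything. Using `sys.executable` keeps every step inside the active virtual environment.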
Acknowledgements
This project builds upon several outstanding open-source efforts. We would like to thank the authors and contributors of the following projects: UIED, DCGen, and Design2Code.


