
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Yilei Jiang1*, Yaozhi Zheng1*, Yuxuan Wan2*, Jiaming Han1, Qunzhong Wang1,
Michael R. Lyu2, Xiangyu Yue1✉

1CUHK MMLab, 2CUHK ARISE Lab

*Equal contribution, ✉Corresponding author

Demo Videos

A showcase of how ScreenCoder transforms UI screenshots into structured, editable HTML/CSS code using a modular multi-agent framework.

YouTube Page

https://github.com/user-attachments/assets/5d4c0808-76b8-4eb3-b333-79d0ac690189

Instagram Page

https://github.com/user-attachments/assets/9819d559-863e-4126-8506-1eccaa806df0

Design Draft (allows customized modifications!)

https://github.com/user-attachments/assets/d2f26583-4649-4b6d-8072-b11cd1025f4b

Project Structure

  • main.py: The main script to generate final HTML code for a single screenshot.
  • UIED/: Contains the UIED (UI Element Detection) engine for analyzing screenshots and detecting components.
    • run_single.py: Python script to run UI component detection on a single image.
  • html_generator.py: Takes the detected component data and generates a complete HTML layout with generated code for each module.
  • image_replacer.py: A script to replace placeholder divs in the final HTML with actual cropped images.
  • mapping.py: Maps the detected UIED components to logical page regions.
  • requirements.txt: Lists all the necessary Python dependencies for the project.
  • doubao_api.txt: API key file for the Doubao model (should be kept private and is included in .gitignore).
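To make the role of `image_replacer.py` more concrete, here is a minimal sketch of placeholder replacement. It assumes each placeholder is an empty `<div>` carrying a hypothetical `data-placeholder-id` attribute, and that a mapping from placeholder IDs to cropped image paths is already available. The markup, attribute name, and `replace_placeholders` helper are illustrative assumptions, not the project's actual format.

```python
from __future__ import annotations

import html
import re

# Hypothetical placeholder format: <div ... data-placeholder-id="ID" ...></div>
# (an assumption for illustration, not necessarily what html_generator.py emits).
PLACEHOLDER_RE = re.compile(
    r'<div\b[^>]*\bdata-placeholder-id="([^"]+)"[^>]*>\s*</div>'
)


def replace_placeholders(page_html: str, crops: dict[str, str]) -> str:
    """Swap each placeholder div for an <img> pointing at its cropped image.

    Placeholders with no entry in `crops` are left untouched, so the gray
    block stays visible instead of producing a broken image.
    """
    def _swap(match: re.Match) -> str:
        pid = match.group(1)
        src = crops.get(pid)
        if src is None:
            return match.group(0)  # keep the original placeholder markup
        return (
            f'<img src="{html.escape(src, quote=True)}" '
            f'alt="{html.escape(pid, quote=True)}">'
        )

    return PLACEHOLDER_RE.sub(_swap, page_html)


if __name__ == "__main__":
    page = '<section><div class="ph" data-placeholder-id="hero"></div></section>'
    print(replace_placeholders(page, {"hero": "crops/hero.png"}))
```

A regex is adequate for markup the generator controls; for arbitrary HTML, a real parser would be the safer choice.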

Setup and Installation

  1. Clone the repository:

    git clone https://github.com/JimmyZhengyz/screencoder.git
    cd screencoder
    
  2. Create a virtual environment:

    python3 -m venv .venv
    source .venv/bin/activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Set up the API key:

    • Create a file named doubao_api.txt in the root directory.
    • Paste your Doubao API key into this file.
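A minimal sketch of how a script might read the key from `doubao_api.txt`, assuming the file contains only the key (surrounding whitespace and a trailing newline are tolerated). The `load_api_key` helper is hypothetical; check the project's scripts for how the key is actually loaded.

```python
from __future__ import annotations

from pathlib import Path


def load_api_key(path: str | Path = "doubao_api.txt") -> str:
    """Return the API key stored in `path`, stripped of surrounding whitespace.

    Hypothetical helper: raises FileNotFoundError if the file is missing and
    ValueError if it is empty, so a misconfigured setup fails early.
    """
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty; paste your Doubao API key into it")
    return key
```

Keeping the key in a git-ignored file, as the project does, avoids committing it; reading it once at startup keeps failures close to the cause.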

Usage

The typical workflow is a multi-step process as follows:

  1. Initial Generation with Placeholders: Run the Python scripts below to generate the initial HTML code for a given screenshot.

    • Block Detection:
      python block_parsor.py
      
    • Generation with Placeholders (Gray Image Blocks):
      python html_generator.py
      
  2. Final HTML Code: Run the Python scripts below to generate the final HTML code with cropped images from the original screenshot.

    • Placeholder Detection:
      python image_box_detection.py
      
    • UI Element Detection:
      python UIED/run_single.py
      
    • Mapping Alignment Between Placeholders and UI Elements:
      python mapping.py
      
    • Placeholder Replacement:
      python image_replacer.py
      
  3. Simple Run: Alternatively, run the main script to generate the final HTML code directly:

    python main.py
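The manual workflow runs six scripts in a fixed order. The following is a hedged sketch of a driver that chains them, calling each script with no arguments exactly as the documented commands do and stopping at the first failure. It illustrates the ordering only; it is not a description of how `main.py` is implemented, and `build_commands`/`run_pipeline` are hypothetical names.

```python
from __future__ import annotations

import subprocess
import sys

# Stage order taken from the Usage steps. Each script is invoked with no
# arguments, mirroring the documented commands; this driver is a hypothetical
# illustration, not the project's main.py.
PIPELINE = [
    ("Block Detection", "block_parsor.py"),
    ("Generation with Placeholders", "html_generator.py"),
    ("Placeholder Detection", "image_box_detection.py"),
    ("UI Element Detection", "UIED/run_single.py"),
    ("Mapping Alignment", "mapping.py"),
    ("Placeholder Replacement", "image_replacer.py"),
]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Return one argv list per stage, in execution order."""
    return [[python, script] for _, script in PIPELINE]


def run_pipeline(dry_run: bool = True) -> None:
    """Print each stage; unless dry_run, execute it and stop on the first failure."""
    for (name, _), cmd in zip(PIPELINE, build_commands()):
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit


if __name__ == "__main__":
    # Only execute the scripts when explicitly asked to with --run.
    run_pipeline(dry_run="--run" not in sys.argv)
```

Run it from the repository root with `--run` to execute the stages; without the flag it only prints the commands it would run.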
    

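The mapping step pairs each placeholder box with a detected UI element so that the right region of the screenshot can be cropped. One common way to do this, shown here as an assumption rather than what `mapping.py` necessarily implements, is greedy matching by intersection-over-union (IoU). Boxes are `(x1, y1, x2, y2)` pixel tuples, and the `0.3` threshold is an arbitrary illustrative value.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), pixel coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_boxes(
    placeholders: Dict[str, Box],
    elements: Dict[str, Box],
    threshold: float = 0.3,  # illustrative value, not taken from the project
) -> Dict[str, str]:
    """Greedily pair placeholders with UI elements, highest IoU first.

    Each placeholder and each element is used at most once; candidate pairs
    scoring below `threshold` are discarded.
    """
    candidates: List[Tuple[float, str, str]] = [
        (iou(pb, eb), pid, eid)
        for pid, pb in placeholders.items()
        for eid, eb in elements.items()
    ]
    candidates.sort(reverse=True)  # best-overlapping pairs first
    matched: Dict[str, str] = {}
    used = set()
    for score, pid, eid in candidates:
        if score < threshold:
            break  # remaining pairs overlap even less
        if pid in matched or eid in used:
            continue
        matched[pid] = eid
        used.add(eid)
    return matched
```

Greedy matching is simple and works well when boxes overlap cleanly; an optimal one-to-one assignment could instead use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` on a negated IoU matrix).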
Acknowledgements

This project builds upon several outstanding open-source efforts. We would like to thank the authors and contributors of the following projects: UIED, DCGen, and Design2Code.

License: Apache-2.0