What is Step1X-Edit?
Step1X-Edit is a multimodal image editing model that combines a multimodal large language model with a diffusion decoder to edit images according to user instructions. It aims to close the gap with leading closed-source models such as GPT-4o and Gemini 2 Flash.

Overview of Step1X-Edit
Feature | Description |
---|---|
AI Tool | Step1X-Edit |
Category | Multimodal Image Editing Model |
Main Function | General-purpose image editing with natural language instructions |
Model Type | Large Language Model with Diffusion Decoder |
Open Source | Yes (see github.com/stepfun-ai/Step1X-Edit) |
Research Paper | arxiv.org/pdf/2504.17761 |
Official Model Demo | step1x-edit.github.io |
Key Features of Step1X-Edit
Unified Image Editing Model
Step1X-Edit is a single model that can handle many types of image editing tasks, making it easy to use for a wide range of real-world needs.
Works with User Instructions
The model is trained to follow real-world user instructions, so you can edit an image simply by describing the change you want.
Multimodal Large Language Model (LLM)
Step1X-Edit uses a multimodal LLM that processes the source image and the text instruction together, giving the diffusion decoder a precise editing signal and allowing for more accurate and flexible edits (a toy sketch of this flow appears at the end of this section).
Efficient and Fast Processing
The model is optimized for speed and memory usage, supports different hardware setups, and offers memory-saving options such as an FP8 variant of the weights.
High-Quality Results
Step1X-Edit is trained on a large, high-quality dataset and evaluated on GEdit-Bench, a benchmark built from real user editing requests, ensuring reliable and impressive results.
Open and Accessible
The model and its code are available on GitHub and ModelScope, making it easy for anyone to try, use, or build upon.
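To make the architecture concrete, the sketch below shows the general flow: a multimodal encoder fuses the source image with the text instruction, and that fused signal conditions a diffusion decoder that renders the edited image. Everything here is a toy stand-in written for illustration (made-up class names, dimensions, and a single linear "decoder"); it is not the actual Step1X-Edit code, which lives in the GitHub repository.

```python
# Toy illustration of an instruction-driven editing pipeline.
# All names and shapes are invented for this sketch; see the official
# repository for the real Step1X-Edit implementation.
import torch
import torch.nn as nn

class ToyMultimodalEncoder(nn.Module):
    """Stand-in for the multimodal LLM that reads the image and instruction."""
    def __init__(self, dim=64):
        super().__init__()
        self.image_proj = nn.Linear(3 * 32 * 32, dim)  # toy image embedding
        self.text_embed = nn.Embedding(1000, dim)      # toy token embedding

    def forward(self, image, token_ids):
        img = self.image_proj(image.flatten(1))        # (batch, dim)
        txt = self.text_embed(token_ids).mean(dim=1)   # (batch, dim)
        return img + txt                               # fused editing condition

class ToyDiffusionDecoder(nn.Module):
    """Stand-in for the diffusion decoder that renders the edited image."""
    def __init__(self, dim=64):
        super().__init__()
        self.to_pixels = nn.Linear(dim, 3 * 32 * 32)

    def forward(self, condition):
        # A real diffusion decoder runs many denoising steps; this is one linear map.
        return self.to_pixels(condition).view(-1, 3, 32, 32)

encoder, decoder = ToyMultimodalEncoder(), ToyDiffusionDecoder()
image = torch.rand(1, 3, 32, 32)               # source image
instruction = torch.randint(0, 1000, (1, 8))   # tokenized edit instruction
edited = decoder(encoder(image, instruction))
print(edited.shape)  # torch.Size([1, 3, 32, 32])
```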
Examples of Step1X-Edit in Action
1. Add a Necklace to a Person
Automatically add a realistic necklace to a person in the image using a simple text prompt.
2. Add an Object in the Image
Insert new objects into an image seamlessly, such as adding a cup to a table or a car to a street scene.
3. Add Wings to a Character
Transform a person or animal by adding wings, demonstrating creative and precise compositional editing.
4. Beautify the Man
Enhance facial features and overall appearance with a single instruction, showing Step1X-Edit's ability to perform subtle, high-quality retouching.
5. Change Text in an Image
Replace or edit text within an image, such as changing a sign or label, while preserving the original style and context.
6. Make the Colors Brighter
Enhance the vibrancy and brightness of colors in an image, perfect for making photos more eye-catching.
7. Remove Text from an Image
Erase unwanted text from images, such as watermarks or captions, with seamless background restoration.
8. Remove Wings from a Character
Remove added elements, such as wings, to restore the original appearance of a subject in the image.
How to Use Step1X-Edit AI on Hugging Face?
Step 1: Open the Hugging Face Space
Go to the Step1X-Edit Space on Hugging Face and wait for the interface to finish loading.
Step 2: Upload Your Image
Click the upload area to select and upload the image you want to edit.
Step 3: Enter Your Edit Prompt
In the prompt box, type a description of the edit you want to make (for example, "remove the background" or "change the sky to sunset").
Step 4: Click Run
Press the Run button to start the editing process. The model will process your image and apply the requested changes.
Step 5: Download the Result
Once the edit is complete, preview and download your edited image directly from the interface.
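If you prefer to call the Space from a script rather than the web interface, the gradio_client library can drive it programmatically. The Space id and the argument order below are assumptions; run client.view_api() first and adjust the call to the endpoints it actually reports.

```python
# Sketch: driving a Gradio Space programmatically (pip install gradio_client).
# The Space id and the predict() arguments are assumptions; check view_api().
from gradio_client import Client, handle_file

client = Client("stepfun-ai/Step1X-Edit")  # assumed Space id
client.view_api()                          # prints the real endpoints and parameters

# Hypothetical call; match api_name and arguments to what view_api() shows.
result = client.predict(
    handle_file("input.jpg"),      # image to edit
    "make the colors brighter",    # edit instruction
    api_name="/predict",
)
print(result)  # path(s) of the edited image returned by the Space
```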
Pros and Cons
Pros
- Unified editing model
- Multimodal instruction support
- Open-source available
- High-quality results
- Flexible deployment options
Cons
- High GPU memory requirements
- Complex installation steps
How to Install Step1X-Edit Locally?
Step 1: Clone the Repository
git clone https://github.com/stepfun-ai/Step1X-Edit.git
cd Step1X-Edit
Step 2: Set Up Python Environment
Make sure you’re using Python 3.10 or higher. Creating a virtual environment is recommended:
python3.10 -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
Step 3: Install PyTorch and TorchVision
Install PyTorch with CUDA (recommended: CUDA 12.1) according to your system and GPU. Example for CUDA 12.1:
pip install torch==2.3.1 torchvision --index-url https://download.pytorch.org/whl/cu121
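After installing, a quick check confirms that PyTorch was built with CUDA support and can see your GPU:

```python
# Sanity check for the PyTorch installation.
import torch

print(torch.__version__)          # expect 2.3.1 (with a +cu121 build tag)
print(torch.version.cuda)         # CUDA version PyTorch was compiled against
print(torch.cuda.is_available())  # should print True on a working GPU setup
```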
Step 4: Install Required Python Packages
pip install -r requirements.txt
Step 5: Install FlashAttention (Required for Speed-Up)
Run the provided helper script to find the correct FlashAttention wheel:
python scripts/get_flash_attn.py
The script prints the filename of a compatible .whl for your system (e.g., flash_attn-2.7.2...whl). Download that wheel from the FlashAttention Release Page, then install it:
pip install /path/to/flash_attn-*.whl
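You can then confirm the wheel installed correctly by importing it:

```python
# Verify that FlashAttention is importable after installing the wheel.
import flash_attn

print(flash_attn.__version__)  # should match the version of the wheel you installed
```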
Step 6: Download Model Weights
Download the weights for Step1X-Edit or Step1X-Edit-FP8 from ModelScope (Step1X-Edit on ModelScope) and place them in the folder indicated in the repository instructions.
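One convenient way to fetch them is the ModelScope Python SDK; the model id in the sketch below is an assumption, so copy the exact id from the ModelScope model page.

```python
# Sketch: downloading the weights with ModelScope (pip install modelscope).
# The model id is an assumption; copy the exact id from the ModelScope page.
from modelscope import snapshot_download

local_dir = snapshot_download("stepfun-ai/Step1X-Edit")
print(local_dir)  # local path of the weights; place or link them where the repo expects
```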
Step 7: Run Example Inference
Use the default script to run image editing examples:
bash scripts/run_examples.sh