What is Step1X-Edit?

Step1X-Edit is a multimodal image editing model that combines a multimodal large language model with a diffusion decoder to edit images based on user instructions. It aims to match the performance of advanced closed-source models like GPT-4o and Gemini 2 Flash.

Source: https://step1x-edit.github.io/

Overview of Step1X-Edit

Feature               Description
AI Tool               Step1X-Edit
Category              Multimodal Image Editing Model
Main Function         General-purpose image editing with natural language instructions
Model Type            Large Language Model with Diffusion Decoder
Open Source           Yes (see github.com/stepfun-ai/Step1X-Edit)
Research Paper        arxiv.org/pdf/2504.17761
Official Model Demo   step1x-edit.github.io

Key Features of Step1X-Edit

  • Unified Image Editing Model

    Step1X-Edit is a single model that can handle many types of image editing tasks, making it easy to use for a wide range of real-world needs.

  • Works with User Instructions

    The model understands and follows genuine user instructions, so you can edit images just by describing what you want to change.

  • Multimodal Large Language Model (LLM)

    Step1X-Edit uses a multimodal LLM to process the image and the text instruction together, allowing for more accurate and flexible edits.

  • Efficient and Fast Processing

    The model is optimized for speed and memory usage, supporting different hardware setups and offering options to save GPU memory.

  • High-Quality Results

    Step1X-Edit is trained on a large, high-quality dataset and evaluated on GEdit-Bench, a benchmark built from real user editing requests.

  • Open and Accessible

    The model and its code are available on GitHub and ModelScope, making it easy for anyone to try, use, or build upon.

Examples of Step1X-Edit in Action

1. Add a Necklace to a Person

Automatically add a realistic necklace to a person in the image using a simple text prompt.

Source: https://step1x-edit.github.io/

2. Add an Object in the Image

Insert new objects into an image seamlessly, such as adding a cup to a table or a car to a street scene.

Source: https://step1x-edit.github.io/

3. Add Wings to a Character

Transform a person or animal by adding wings, demonstrating creative and precise compositional editing.

Source: https://step1x-edit.github.io/

4. Beautify the Man

Enhance facial features and overall appearance with a single instruction, showing Step1X-Edit's ability to perform subtle, high-quality retouching.

Source: https://step1x-edit.github.io/

5. Change Text in an Image

Replace or edit text within an image, such as changing a sign or label, while preserving the original style and context.

Source: https://step1x-edit.github.io/

6. Make the Colors Brighter

Enhance the vibrancy and brightness of colors in an image, perfect for making photos more eye-catching.

Source: https://step1x-edit.github.io/

7. Remove Text from an Image

Erase unwanted text from images, such as watermarks or captions, with seamless background restoration.

Source: https://step1x-edit.github.io/

8. Remove Wings from a Character

Remove added elements, such as wings, to restore the original appearance of a subject in the image.

Source: https://step1x-edit.github.io/

How to Use Step1X-Edit AI on Hugging Face?

Step 1: Open the Hugging Face Space

Open the Step1X-Edit Space in your browser and wait for the interface to finish loading.

Step 2: Upload Your Image

Click the upload area to select and upload the image you want to edit.

Step 3: Enter Your Edit Prompt

In the prompt box, type a description of the edit you want to make (for example, "remove the background" or "change the sky to sunset").

Step 4: Click Run

Press the Run button to start the editing process. The model will process your image and apply the requested changes.

Step 5: Download the Result

Once the edit is complete, preview and download your edited image directly from the interface.

Pros and Cons

Pros

  • Unified editing model
  • Multimodal instruction support
  • Open-source code and weights
  • High-quality results
  • Flexible deployment options

Cons

  • High GPU requirement
  • Complex installation steps

How to Install Step1X-Edit Locally?

  1. Clone the Repository

    git clone https://github.com/stepfun-ai/Step1X-Edit.git
    cd Step1X-Edit

  2. Set Up a Python Environment

    Make sure you’re using Python 3.10 or higher. Creating a virtual environment is recommended:

    python3.10 -m venv venv
    source venv/bin/activate  # On Windows use: venv\Scripts\activate

  3. Install PyTorch and TorchVision

    Install a CUDA build of PyTorch that matches your system and GPU (CUDA 12.1 is recommended). Example for CUDA 12.1:

    pip install torch==2.3.1 torchvision --index-url https://download.pytorch.org/whl/cu121

  4. Install Required Python Packages

    pip install -r requirements.txt

  5. Install FlashAttention (Required for Speed-Up)

    Run the provided helper script to find the correct FlashAttention wheel:

    python scripts/get_flash_attn.py

    The script prints the name of a .whl file that is compatible with your system (e.g., flash_attn-2.7.2...whl). Download that file from the FlashAttention release page, then install it:

    pip install /path/to/flash_attn-*.whl

  6. Download Model Weights

    Download the weights for Step1X-Edit or Step1X-Edit-FP8 from ModelScope.
    Place them in the appropriate folder as indicated in the repository instructions.

  7. Run Example Inference

    Use the default script to run image editing examples:

    bash scripts/run_examples.sh
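
Before running the example script, it can help to confirm that the environment matches what the steps above expect. The following is a minimal sketch, not part of the Step1X-Edit repository: the function name check_environment and the report keys are made up for illustration, and the "cp310"-style tag assumes a CPython interpreter. It checks the Python version, whether a virtual environment is active, and whether the torch and flash_attn packages can be found. The torch details are only queried if torch is actually installed.

```python
import importlib.util
import platform
import sys


def check_environment(min_python=(3, 10)):
    """Collect basic facts about the current Python environment.

    Hypothetical helper for illustration; it is not shipped with Step1X-Edit.
    """
    report = {
        "python_version": platform.python_version(),
        # The guide asks for Python 3.10 or higher.
        "python_ok": sys.version_info[:2] >= min_python,
        # True when the interpreter runs inside a venv-style virtual environment.
        "in_venv": sys.prefix != sys.base_prefix,
        # Interpreter tag as it appears in wheel filenames (assumes CPython).
        "python_tag": f"cp{sys.version_info.major}{sys.version_info.minor}",
        "platform": f"{platform.system().lower()}_{platform.machine()}",
        # find_spec returns None when a top-level package is not installed.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also works without torch

        report["torch_version"] = torch.__version__
        report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If python_ok is False or torch_installed is False, revisit steps 2 and 3. The python_tag, platform, torch_version, and cuda_build values are the kind of details a prebuilt FlashAttention wheel has to match, so they are useful to compare against the filename printed by scripts/get_flash_attn.py.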

Step1X-Edit AI FAQs