Yann LeCun gives a thumbs-up: "Tencent's portrait photo generation is now open for anyone to play with."

巴比特
1 year ago

Source: Synced

Image Source: Generated by Wujie AI

AI helps you become a versatile star.

This time, Yann LeCun makes his debut as a "versatile celebrity": wearing Iron Man's suit and cool sunglasses and staring at you expressionlessly, or posing in ancient costume in front of the Forbidden City…

LeCun himself reposted the images and commented, "The painting from the Renaissance period in the lower left corner is my favorite."

Black Widow, dressed in a purple wizard's robe, gazes into the distance, or puts on a Christmas hat and looks straight at you:

Ultraman in a spacesuit looks cute, and even red hair does not look out of place.

This research comes from Nankai University, Tencent, and other institutions, which propose PhotoMaker, an efficient method for personalized text-to-image generation. The paper, "PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding," was released last December, and the project has just been open-sourced. In less than a day, it received over 650 stars.

Project link: https://github.com/TencentARC/PhotoMaker?continueFlag=98363d6ac1beafe515190e50d2c40427

In addition to generating realistic portraits, PhotoMaker can also generate other styles, such as sketches, cartoons, and animations.

Different identities can also be mixed to create a completely new character. A blend of Hepburn and Princess Elsa keeps the characteristics of both:

Changing the age and gender of the photo subject is also possible. One wonders how LeCun feels about this cross-dressing.

Image source: https://twitter.com/xiaohuggg/status/1746861416743928103

Anyone can try the demo, and using it is simple. It takes four steps:

  • The first step is to upload a picture. One is enough, but the results are better with several. The face should occupy most of each uploaded image.
  • The second step is to enter a text prompt, making sure it contains a trigger word such as "man img," "woman img," or "girl img."
  • The third step is to select a favorite style template (there are more than ten built-in).
  • The final step is to click the Submit button and wait for the generation.

If you make a mistake at any step, PhotoMaker displays a hint, so there's no need to worry about getting it wrong.
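As a rough illustration of the kind of input check described above, the following Python sketch tests whether a prompt contains a class word followed by the trigger word, and whether at least one image was supplied. The helper names (`find_trigger_class`, `check_inputs`) and the list of class words are assumptions made for illustration, taken only from the examples in this article. This is not PhotoMaker's actual validation code.

```python
import re
from typing import List, Optional

# Class words copied from the article's examples; the real demo may accept others.
CLASS_WORDS = ("man", "woman", "girl")
TRIGGER = "img"


def find_trigger_class(prompt: str) -> Optional[str]:
    """Return the class word that directly precedes the trigger word, or None."""
    pattern = r"\b(" + "|".join(CLASS_WORDS) + r")\s+" + TRIGGER + r"\b"
    match = re.search(pattern, prompt, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None


def check_inputs(prompt: str, num_images: int) -> List[str]:
    """Collect human-readable problems; an empty list means the inputs look fine."""
    problems = []
    if num_images < 1:
        problems.append("upload at least one ID image")
    if find_trigger_class(prompt) is None:
        problems.append('prompt must contain a trigger such as "man img"')
    return problems


print(check_inputs("A man img wearing a spacesuit", 1))  # []
print(check_inputs("A man wearing a spacesuit", 1))
```

The first call passes because "man img" appears in the prompt. The second reports the missing trigger word.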

In our own trial, we uploaded a photo of Musk, entered the prompt "A man img wearing a spacesuit," and chose a cartoon style. After a few seconds of waiting, the result looked pretty good.

Try it out here: https://huggingface.co/spaces/TencentARC/PhotoMaker?continueFlag=98363d6ac1beafe515190e50d2c40427

Let's continue to explore the technologies behind this research.

Research Introduction

Paper link: https://arxiv.org/pdf/2312.04461.pdf

PhotoMaker is an efficient method for personalized text-to-image generation. Its core idea is to encode any number of input ID images into a stacked ID embedding that preserves ID information. This embedding serves as a unified ID representation: it comprehensively captures the features of a single input ID, and it can also accommodate the features of different IDs so that they can be merged later. This paves the way for more interesting and practical applications.

As shown in Figure 1, PhotoMaker can not only recontextualize the input portrait in the usual way, but also change its attributes (e.g., accessories and expressions), generate photos of the person from different viewpoints, and even modify the gender and age of the input ID.

PhotoMaker also provides many possibilities for generating customized portraits for users. Although the images used to build the stacked ID embedding during training come from the same ID, different ID images can be used to form the stacked ID embedding during inference to merge and create new customized IDs. The merged new ID can retain the features of different input IDs. For example, PhotoMaker can generate a Scarlett that looks like Musk, or create a customized ID that blends someone with a well-known IP character, as shown in Figure 1(c).
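To make the idea of a stacked ID embedding more concrete, here is a minimal Python sketch. It assumes each ID image is turned into a fixed-length vector and that the vectors are simply stacked into an N x D matrix, where N is the number of input images. The encoder `toy_encode` is a hash-based stand-in and not the image encoder used in the paper, and the names `stack_id_embeddings` and `EMBED_DIM` are hypothetical. The sketch only shows that the same stacking step works whether the images come from one person or from several.

```python
import hashlib
from typing import List

EMBED_DIM = 8  # illustrative size, not the model's real embedding width


def toy_encode(image_bytes: bytes, dim: int = EMBED_DIM) -> List[float]:
    """Stand-in for a real image encoder: hash the bytes into `dim` floats in [0, 1]."""
    digest = hashlib.sha256(image_bytes).digest()  # 32 bytes, so dim must be <= 32
    return [byte / 255.0 for byte in digest[:dim]]


def stack_id_embeddings(id_images: List[bytes]) -> List[List[float]]:
    """Encode each ID image and stack the results into an N x dim matrix."""
    if not id_images:
        raise ValueError("at least one ID image is required")
    return [toy_encode(image) for image in id_images]


# Same-ID case: several photos of one person.
same_id = stack_id_embeddings([b"person_a_photo_1", b"person_a_photo_2"])

# Mixed-ID case: photos of two different people go into one stack.
mixed_id = stack_id_embeddings([b"person_a_photo_1", b"person_b_photo_1"])

print(len(same_id), len(same_id[0]))    # 2 8
print(len(mixed_id), len(mixed_id[0]))  # 2 8
```

Both stacks have the same shape, so a downstream model that consumes the stack does not need to know whether the rows describe one identity or a blend of two.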

To train PhotoMaker, the researchers proposed an ID-oriented data construction pipeline for assembling the training data. Trained on the dataset built with this pipeline, PhotoMaker preserves ID better than baseline methods that rely on test-time fine-tuning, while also offering significant speed improvements, high-quality results, strong generalization, and a wide range of applications. Figure 2 (a) provides an overview of PhotoMaker, and Figure 2 (b) shows the data construction pipeline.

As shown in Figure 3 and Table 1, in both qualitative and quantitative experiments PhotoMaker generates high-quality images while maintaining high fidelity to the ID.

PhotoMaker can also bring people from the last century or even ancient times into the present day and "take photos" of them, as shown in Figure 4 (a). By comparison, both DreamBooth and SDXL struggle to generate realistic images of people who have never appeared in real photos. In addition, because DreamBooth relies heavily on the quality and resolution of the custom images, it has difficulty producing high-quality results when old photos are used for customization.

If the user inputs images of different IDs, PhotoMaker can effectively integrate the features of different IDs to form a new ID. As seen in Figure 5, DreamBooth and SDXL cannot achieve identity blending. In contrast, regardless of whether the input is an anime IP or a real person, and regardless of gender, PhotoMaker can effectively preserve the features of different IDs in the generated new ID.

PhotoMaker also performs well at stylization. As shown in Figure 6, it not only maintains good ID fidelity but also follows the style specified in the prompt.

For more detailed technical content, please refer to the original paper.
