It’s all real with low-to-no-code tools

7 steps for AI-generated influencer creation — a complete hands-on tutorial

All together: Face, Video, Text, Voice

Alex Honchar
7 min read · Aug 16, 2024


Illustration by author

This will be a very short intro. When I saw Pieter Levels' demo of an AI YouTuber, I said to myself: I want to create one too! A couple of days later I had my results, and I want to share the whole process with you👇

Step-by-step tutorial

To see the results of this tutorial, you can check out the Instagram page of Jen or her TikTok. Jen is a virtual influencer teaching how to create virtual influencers with AI :) My personal bet is that the technology, and the results with it, will only get better.

Jen can talk, walk, and even dance while explaining hands-on AI tools. Can your influencer team do the same and how fast? How much does it cost?

1. Character and situation

We will start with FLUX, one of the newest image generation models, similar to the better-known Midjourney or Stable Diffusion from Stability AI. The overall experience is very straightforward. Write what you want to generate and play with the results:

There are many guides on how to write a perfect prompt, but the best thing you can do is to provide a reference image and ask AI to generate a similar image:

I used public videos of the Spanish TikTok influencer https://www.tiktok.com/@arianehoyos

2. Character consistency

As you can see, the faces in all the photos are different, but what I want is Jen everywhere! There are two ways to deal with this issue. The first and the easiest is to apply a face swap.

I used Remaker’s face swap feature, which has a decent amount of free credits

The second is more complex but could provide superior results: model fine-tuning. There is a service called Photo AI where all you need to do is upload 20–30 of your photos, and it will create a model that can generate your person in all possible situations. In our case, however, we don’t have 20 photos to start with, so face swap is the way to go! Runway has a similar feature, but the results are less realistic.

3. Video generation

The two major video generation products today are Kling AI and Runway. We will focus on the first, and you will find some analysis of the second below. Again, the experience is very straightforward:

  1. Select the “Image to Video” option and upload your image
  2. Write a prompt (nothing crazy, in my case, it’s as simple as “a woman influencer slowly and calmly walks on the city street and tells a story”)
  3. Select professional mode and 10s generation length
  4. Press “Generate”! In most cases, the result will actually look good enough without tweaking
Kling AI interface and the inputs
What the video results look like

4. Text generation

This is the easiest part: you either just write it yourself or ask your favorite chatbot for help. For more advanced scenarios, please check out my article about AI agents. In the case of Jen, I wrote the text pieces myself.

5. Audio generation

I’d say this is the second-easiest part. You just need to copy your text and paste it into the ElevenLabs window as shown below, then select a voice from the dropdown menu, press “Generate speech”, and download the voice sample. We will use two speech clips with the “Laura” voice to complete the influencer videos generated above (the vertical ones). There is also an option for custom voice creation or even voice cloning.

ElevenLabs interface — extremely straightforward
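Since the clips from step 3 are only 10 seconds long, it helps to check whether a script will roughly fit before generating the audio. Below is a minimal sketch of such a check. The speaking rate of 150 words per minute is my own ballpark assumption, not a measured value for the “Laura” voice or any other ElevenLabs voice, and the function names and sample script are hypothetical.

```python
# Rough check: will a script fit into one generated clip?
# ASSUMPTION: ~150 words per minute is a ballpark speaking rate,
# not a measured value for any ElevenLabs voice.
ASSUMED_WORDS_PER_MINUTE = 150
CLIP_SECONDS = 10  # generation length selected in step 3


def estimate_speech_seconds(text, words_per_minute=ASSUMED_WORDS_PER_MINUTE):
    """Estimate how long it takes to speak `text` at the given rate."""
    word_count = len(text.split())
    return word_count * 60.0 / words_per_minute


def fits_clip(text, clip_seconds=CLIP_SECONDS):
    """Return True if the estimated speech is no longer than the clip."""
    return estimate_speech_seconds(text) <= clip_seconds


if __name__ == "__main__":
    # Hypothetical script, not one of Jen's actual texts.
    script = (
        "Hi, I am Jen, and today I will show you how to create "
        "a virtual influencer with AI tools."
    )
    seconds = estimate_speech_seconds(script)
    print(f"Estimated duration: {seconds:.1f} s, fits: {fits_clip(script)}")
```

If the estimate comes out above the clip length, shorten the script or split it across two clips. The real duration depends on the voice and its settings, so treat the number as a first filter only.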

6. Lip synchronization

It doesn’t come as a surprise that there is a tool to match the face with the speech. An easy and fast way to do it is with SyncLabs. Just upload your video and audio, and everything else will be done automatically! I left the specific lip sync model as default. The quality needs improvement, but I will leave that for the next article and for the work with our clients.

SyncLabs is as straightforward as the other tools, so I believe you won’t have any problems using it yourself

7. Editing, subtitles, etc

You can use any video editor of your choice, but I see that recently Captions and CapCut are very popular among bloggers, so I gave the first a try too. Nothing fancy, it automatically generates subtitles (which is a must for short talking videos) and allows some primitive editing as well.

You can find full videos with the final results on Jen’s Instagram account
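To keep track of what feeds into what, the seven steps above can be written down as a simple data flow. This is only a sketch for bookkeeping: the artifact names (`still_image`, `raw_video`, and so on) are labels I made up for illustration, not terms used by any of the tools, and nothing here calls the tools themselves.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stage:
    step: int
    name: str
    tool: str
    inputs: Tuple[str, ...]
    output: str


# Artifact names are illustrative labels, not terms used by the tools.
PIPELINE = [
    Stage(1, "Character and situation", "FLUX",
          ("prompt", "reference_image"), "still_image"),
    Stage(2, "Character consistency", "Remaker face swap",
          ("still_image", "face_photo"), "consistent_image"),
    Stage(3, "Video generation", "Kling AI",
          ("consistent_image", "prompt"), "raw_video"),
    Stage(4, "Text generation", "manual or chatbot",
          ("prompt",), "script"),
    Stage(5, "Audio generation", "ElevenLabs",
          ("script",), "voice_audio"),
    Stage(6, "Lip synchronization", "SyncLabs",
          ("raw_video", "voice_audio"), "synced_video"),
    Stage(7, "Editing and subtitles", "Captions",
          ("synced_video",), "final_video"),
]

# Things you bring yourself rather than generate in an earlier step.
EXTERNAL_INPUTS = {"prompt", "reference_image", "face_photo"}


def missing_inputs(pipeline, external=EXTERNAL_INPUTS):
    """Return (step, artifact) pairs for inputs no earlier stage produced."""
    available = set(external)
    missing = []
    for stage in pipeline:
        for item in stage.inputs:
            if item not in available:
                missing.append((stage.step, item))
        available.add(stage.output)
    return missing


if __name__ == "__main__":
    for stage in PIPELINE:
        print(f"{stage.step}. {stage.name} ({stage.tool}) -> {stage.output}")
    print("Missing inputs:", missing_inputs(PIPELINE))
```

Calling `missing_inputs` on a partial list of stages shows which files you still need to prepare by hand, which is handy while every hand-off between tools is manual.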

Business case

All this geek stuff is cool, but where is the money?

Macro-analysis

Illustration from https://communicateonline.me/category/industry-insights/post-details/the-rise-of-virtual-influencers-to-disrupt-the-influencer-marketing-industry

AI influencers are not a new thing. For several years they have been amassing millions of followers, earning tens of thousands of USD monthly, and partnering with brands like Prada, Dior, Calvin Klein, BMW, etc. Industry reports also share crazy growth estimations: a CAGR of 38.9% from 2023 to 2030 and a revenue forecast of USD 45.82 billion in 2030. It is also curious to see more interest in virtual influencers in Asia Pacific and Latin America than in North America and Europe.
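To get a feeling for what these two numbers imply together, here is a small sketch that inverts the compound growth formula. It assumes the 38.9% rate compounds over seven yearly periods from a 2023 base. The reports may define the base year differently, so the result is a sanity check and not a figure taken from them.

```python
def implied_base_value(final_value, cagr, years):
    """Invert final = base * (1 + cagr) ** years to recover the base."""
    return final_value / (1.0 + cagr) ** years


forecast_2030_busd = 45.82  # revenue forecast for 2030, billion USD
cagr = 0.389                # 38.9% per year
years = 2030 - 2023         # ASSUMPTION: seven compounding periods

base_2023_busd = implied_base_value(forecast_2030_busd, cagr, years)
print(f"Implied 2023 market size: about {base_2023_busd:.1f} billion USD")
```

Under that assumption, the forecast means the market grows roughly tenfold in seven years.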

Micro-analysis

I want to refer here to Pieter’s tweets [1, 2] that provided good but rather cherry-picked examples:

AI Influencer cost estimations by Pieter Levels

My two cents here will be about:

  • Actual cost: it includes additional tools (FLUX, ElevenLabs, SyncLabs), which are all cheap, but don’t forget the manual work needed to switch from tool to tool. There is no automation here yet! Nevertheless, there is an opportunity to create your own technology (something that we help with at Neurons Lab), turn OPEX into CAPEX, and eventually make it cheaper for you. We can also bet that this technology will only become cheaper.
  • Human labor baseline: I wouldn’t expect most of the world, especially fast-growing Asian countries, to pay $150 per short influencer video as in Pieter’s example. But even if we make human labor 10x cheaper (aggressive, yes), which means $15 per video, and keep the AI price at $5 per video as per Pieter’s estimation, the ROI is still 3x on price alone.
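The arithmetic behind the second point fits in a few lines. The $150 and $5 figures are from Pieter’s example. The function name is mine, and the calculation deliberately ignores the manual tool-switching work mentioned in the first point.

```python
def cost_ratio(human_cost_per_video, ai_cost_per_video):
    """How many times more a human-made video costs than an AI-made one."""
    return human_cost_per_video / ai_cost_per_video


pieter_human_cost = 150.0  # USD per short video, from Pieter's example
ai_cost = 5.0              # USD per video, from Pieter's estimation
cheaper_human_cost = pieter_human_cost / 10  # the "10x cheaper" scenario

print(cost_ratio(pieter_human_cost, ai_cost))   # 30.0
print(cost_ratio(cheaper_human_cost, ai_cost))  # 3.0
```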

Also, once you take into account the speed of AI content generation and the scale that AI influencers unlock, the differences in ballpark estimations become less and less relevant.

What’s next?

If you followed the tutorial or at least checked Jen’s Insta or her TikTok, you should’ve spotted the issues:

  • Consistency of the face swapping. The potential solution is model fine-tuning, but you need more photos.
  • Inability to put Jen in a real physical space. Simple experiments with background removal, pasting Jen into a photo of a bar, and running Kling didn’t work.
  • Monotonous, robotic speech and the absence of natural sounds. Potentially, this can be solved with deeper voice design in ElevenLabs and better video editing.
  • Lip sync is not great, and sync with facial expressions is absent altogether. We need to use better technology here.
  • Automation. Currently, the whole process is manual, and the absence of a Kling API will be the main blocker for automating it.
  • What about safety and GenAI regulations? Open question :)

The work continues! Reach out to me if you want to experiment with AI influencer creation for your business as a service or if you’re interested in developing and owning GenAI technology as a core asset of your business — we will help!
