AI Brings Still Images to Life: Trump and the Mona Lisa Sing "Unravel" Together

2021-01-10 07:35

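Like first, then read, and make it a habit. Search for [JackCui-AI] on WeChat Official Accounts to follow a programmer who loves sharing practical technical content. This article is included in the GitHub repository https://github.com/Jack-Cherish/PythonPark, which also holds a complete list of interview topics from top-tier companies, study materials, and my article series.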

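1. Preface

How do you make a still image move?

The first-order motion model, a DeepFake-style technique, can animate almost anything.

What happens when you use it to make pictures of Trump and the Mona Lisa sing "Unravel" together?

Here it is!

[Video: Images brought to life: Trump and the Mona Lisa sing "Unravel" together]

Today's post is another hands-on, step-by-step tutorial.

Algorithm principles, environment setup, and implementation are all covered below.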

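The code, weight files, and video and image assets mentioned below are all packaged, so you can also use them as-is.

Download link (password: tl0h): https://pan.baidu.com/s/1OEfsXWAN4RPO9vwbCTXIMA

More fun algorithms, with plenty of practical material, are on GitHub:

https://github.com/Jack-Cherish/PythonPark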

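2. Algorithm Principles

First Order Motion, that is, the first-order motion model, comes from a NeurIPS 2019 paper:

"First Order Motion Model for Image Animation"

The paper's original goal is to make a static image move. As the original figure shows: "you move, and it moves too."

The model can easily make characters from "Game of Thrones" mimic Trump giving a speech, make a static horse run, and so on.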

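The idea behind the first-order motion model is to describe complex motion with a set of self-learned keypoints and local affine transformations.

The model consists of two main parts: a motion estimation module and an image generation module.

It first detects keypoints, then estimates motion from those keypoints, and finally uses the image generation module to produce the result.

In the motion estimation module, the model uses self-supervised learning to separate the target object's appearance from its motion and to build a feature representation of each.

In the image generation module, the model accounts for occlusions that arise while the target moves, then extracts appearance information from the given image and combines it with the previously obtained feature representation to generate the output image.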
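To make the "keypoints plus local affine transformations" idea more concrete, here is a minimal NumPy sketch of how, as I understand the paper, motion near a single keypoint is approximated to first order. The function name `local_affine_motion` and its parameter names are illustrative assumptions, not identifiers from the paper or from any repository. The real model predicts the keypoints and Jacobians with a neural network and blends several such local motions into a dense motion field.

```python
import numpy as np


def local_affine_motion(z, kp_source, kp_driving, jac_source, jac_driving):
    """Map a point z in the driving frame to the source frame near one keypoint.

    First-order (affine) approximation around the keypoint:
        T(z) ~= kp_source + J @ (z - kp_driving),  J = jac_source @ inv(jac_driving)
    All names here are hypothetical and chosen for illustration only.
    """
    z = np.asarray(z, dtype=float)
    kp_source = np.asarray(kp_source, dtype=float)
    kp_driving = np.asarray(kp_driving, dtype=float)
    jac = np.asarray(jac_source, dtype=float) @ np.linalg.inv(
        np.asarray(jac_driving, dtype=float))
    return kp_source + jac @ (z - kp_driving)


# With identity Jacobians the mapping reduces to a pure translation.
identity = np.eye(2)
moved = local_affine_motion([0.3, 0.1], kp_source=[0.0, 0.0], kp_driving=[0.2, 0.1],
                            jac_source=identity, jac_driving=identity)
print(moved)
```

A non-identity Jacobian adds local rotation, scaling, or shear around the keypoint, which is what lets a handful of keypoints describe fairly complex motion.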

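The authors trained and tested the algorithm on four datasets:

the VoxCeleb dataset, the UvA-Nemo dataset, the BAIR robot pushing dataset, and a dataset the authors collected themselves.

Of these, VoxCeleb is a large-scale speaker recognition dataset.

It contains about 100,000 utterances from 1,251 celebrities, taken from YouTube videos. The data is roughly gender-balanced (55% male), and the celebrities span a range of accents, professions, and ages.

First Order Motion uses the video frames of this dataset to train its model.

We can use this pretrained face motion model to accomplish today's task:

"A heartfelt duet by Trump and the Mona Lisa."

Besides the first-order motion model, we also need OpenCV and ffmpeg to handle the video, audio, and images.

The details are explained in the "Implementation" section below.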

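3. Environment Setup

For the implementation, we can use an existing project to get the functionality we want:

"Real Time Image Animation"

Project URL: https://github.com/anandpawara/Real_Time_Image_Animation

This is exactly why Python is so popular.

There are many open-source projects that let us quickly build what we want, which greatly reduces development cost.

You really have to try it to appreciate it.

For the environment, I still recommend Anaconda. To install the necessary third-party libraries, you can refer to the environment setup part of this article:

"PyTorch Deep Learning in Practice (1): Semantic Segmentation Basics and Environment Setup"

The third-party libraries this project needs are all listed in its requirements file:

https://github.com/anandpawara/Real_Time_Image_Animation/blob/master/requirements.txt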

Install them directly with pip:

python -m pip install -r requirements.txt

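In addition, to process audio and video, you need to set up ffmpeg.

Install ffmpeg and add it to your PATH environment variable.

ffmpeg download: https://ffmpeg.zeranoe.com/builds/ (the Zeranoe builds site has since shut down; current builds are linked from https://ffmpeg.org/download.html)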
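A quick way to confirm the setup is to check that the Python packages import and that ffmpeg is reachable on PATH. The sketch below uses only the standard library. The helper name `check_environment` is my own, and the default module list is an assumption based on the imports used later in this article, not the full contents of requirements.txt.

```python
import importlib.util
import shutil


def check_environment(modules=("cv2", "torch", "imageio", "skimage"), tools=("ffmpeg",)):
    """Return the Python modules and command-line tools that cannot be found."""
    # find_spec returns None for a top-level module that is not installed
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when the executable is not on PATH
    missing_tools = [t for t in tools if shutil.which(t) is None]
    return missing_modules, missing_tools


missing_modules, missing_tools = check_environment()
print("missing modules:", missing_modules)
print("missing tools:", missing_tools)
```

Two empty lists mean the basics are in place; anything listed still needs to be installed or added to PATH.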

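4. Implementation

The implementation is also very simple.

First, let's lay out the approach:

"Real Time Image Animation" uses the first-order motion model to animate a static image, driven by an existing video.

In the demo, the left panel is the original image, the middle is the generated result, and the right is the original (driving) video.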
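The three-panel layout described above (source image, generated result, driving frame) corresponds to a horizontal concatenation of three equally sized frames. The sketch below uses placeholder arrays instead of real images, and the helper name `make_comparison_frame` is my own for illustration.

```python
import numpy as np


def make_comparison_frame(source, generated, driving):
    """Stack three HxWx3 frames side by side: source | generated | driving."""
    frames = [np.asarray(f) for f in (source, generated, driving)]
    if len({f.shape for f in frames}) != 1:
        raise ValueError("all three frames must have the same shape")
    # axis=1 is the width axis, so the frames end up next to each other
    return np.concatenate(frames, axis=1)


# Placeholder 256x256 RGB frames stand in for real images here.
blank = np.zeros((256, 256, 3), dtype=np.float32)
panel = make_comparison_frame(blank, blank + 0.5, blank + 1.0)
print(panel.shape)
```

Writing such a panel to a video file requires a writer that is three frames wide, since the writer's frame size has to match the frames it receives.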

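However, this project only processes the image frames and does not keep the audio.

So we first save the audio, then merge the processed video with that audio.

We do this with the ffmpeg we just installed.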

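Write the following code:

import os
import subprocess


def video2mp3(file_name):
    """
    Extract the audio track of a video into an MP3 file.
    :param file_name: path to the input video file
    :return: None; the audio is written next to the video with a .mp3 extension
    """
    # splitext strips only the final extension, so paths like ./clips/1.mp4 work
    outfile_name = os.path.splitext(file_name)[0] + '.mp3'
    # An argument list (no shell) keeps paths that contain spaces intact
    cmd = ['ffmpeg', '-i', file_name, '-f', 'mp3', outfile_name]
    subprocess.call(cmd)


def video_add_mp3(file_name, mp3_file):
    """
    Add an audio track to a video.
    :param file_name: path to the input video file
    :param mp3_file: path to the input audio file
    :return: None; the result is written as <video name>-f.mp4
    """
    outfile_name = os.path.splitext(file_name)[0] + '-f.mp4'
    subprocess.call(['ffmpeg', '-i', file_name, '-i', mp3_file,
                     '-strict', '-2', '-f', 'mp4', outfile_name])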

Done. Extracting the audio from the video and merging the audio back in are both taken care of.
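As an alternative sketch of the merge step, the ffmpeg command can be assembled as an argument list, which avoids quoting problems when paths contain spaces. The version below also maps the streams explicitly and copies the video stream without re-encoding. `-map`, `-c:v copy`, `-c:a aac`, and `-shortest` are standard ffmpeg options, but the function name `build_mux_command` and the choice of these options are my assumptions, not part of the original project.

```python
import subprocess


def build_mux_command(video_file, audio_file, out_file):
    """Build an ffmpeg command that combines a silent video with an audio file."""
    return [
        "ffmpeg", "-y",            # overwrite the output file without prompting
        "-i", video_file,          # input 0: the generated (silent) video
        "-i", audio_file,          # input 1: the audio extracted earlier
        "-map", "0:v:0",           # take the video stream from input 0
        "-map", "1:a:0",           # take the audio stream from input 1
        "-c:v", "copy",            # keep the video stream as-is
        "-c:a", "aac",             # encode the audio as AAC for the MP4 container
        "-shortest",               # stop at the end of the shorter stream
        out_file,
    ]


cmd = build_mux_command("output/test.mp4", "1.mp3", "output/test-f.mp4")
print(" ".join(cmd))
# To actually run it (requires ffmpeg on PATH): subprocess.call(cmd)
```

Building the command separately from running it also makes it easy to print and inspect the exact invocation when something goes wrong.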

Next, we need to modify the "Real Time Image Animation" project, specifically the image_animation.py file:

import imageio
import torch
from tqdm import tqdm
from animate import normalize_kp
from demo import load_checkpoints
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from skimage import img_as_ubyte
from skimage.transform import resize
import cv2
import os
import argparse
import subprocess


def video2mp3(file_name):
    """
    Extract the audio track of a video into an MP3 file.
    :param file_name: path to the input video file
    :return: None; the audio is written next to the video with a .mp3 extension
    """
    # splitext strips only the final extension, so paths like ./clips/1.mp4 work
    outfile_name = os.path.splitext(file_name)[0] + '.mp3'
    # An argument list (no shell) keeps paths that contain spaces intact
    cmd = ['ffmpeg', '-i', file_name, '-f', 'mp3', outfile_name]
    print(' '.join(cmd))
    subprocess.call(cmd)


def video_add_mp3(file_name, mp3_file):
    """
    Add an audio track to a video.
    :param file_name: path to the input video file
    :param mp3_file: path to the input audio file
    :return: None; the result is written as <video name>-f.mp4
    """
    outfile_name = os.path.splitext(file_name)[0] + '-f.mp4'
    subprocess.call(['ffmpeg', '-i', file_name, '-i', mp3_file,
                     '-strict', '-2', '-f', 'mp4', outfile_name])


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input_image", required=True, help="Path to image to animate")
ap.add_argument("-c", "--checkpoint", required=True, help="Path to checkpoint")
ap.add_argument("-v", "--input_video", required=False, help="Path to video input")

args = vars(ap.parse_args())

print("[INFO] loading source image and checkpoint...")
source_path = args['input_image']
checkpoint_path = args['checkpoint']
if args['input_video']:
    video_path = args['input_video']
else:
    video_path = None
source_image = imageio.imread(source_path)
# The model works on 256x256 RGB images; drop any alpha channel
source_image = resize(source_image, (256, 256))[..., :3]

generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml', checkpoint_path=checkpoint_path)

if not os.path.exists('output'):
    os.mkdir('output')

relative = True
adapt_movement_scale = True
cpu = False

if video_path:
    cap = cv2.VideoCapture(video_path)
    print("[INFO] Loading video from the given path")
else:
    cap = cv2.VideoCapture(0)
    print("[INFO] Initializing front camera...")

fps = cap.get(cv2.CAP_PROP_FPS)

# Save the driving video's audio first; a camera stream has no audio file
if video_path:
    video2mp3(file_name=video_path)

fourcc = cv2.VideoWriter_fourcc('M', 'P', 'E', 'G')
# out1 = cv2.VideoWriter('output/test.avi', fourcc, fps, (256*3 , 256), True)
# The generator outputs 256x256 frames, so the writer must use the same size;
# frames whose size differs from the writer's are not written correctly.
out1 = cv2.VideoWriter('output/test.mp4', fourcc, fps, (256, 256), True)

cv2_source = cv2.cvtColor(source_image.astype('float32'), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    predictions = []
    # HWC image -> NCHW tensor with a batch dimension of 1
    source = torch.tensor(source_image[np.newaxis].astype(np.float32)).permute(0, 3, 1, 2)
    if not cpu:
        source = source.cuda()
    kp_source = kp_detector(source)
    count = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            # End of the video (or a camera failure): frame is None here
            break
        frame = cv2.flip(frame, 1)

        if not video_path:
            # Crop a fixed square region from the camera image
            x = 143
            y = 87
            w = 322
            h = 322
            frame = frame[y:y+h, x:x+w]
        frame1 = resize(frame, (256, 256))[..., :3]

        if count == 0:
            # Keypoints of the first driving frame are the reference for relative motion
            source_image1 = frame1
            source1 = torch.tensor(source_image1[np.newaxis].astype(np.float32)).permute(0, 3, 1, 2)
            if not cpu:
                source1 = source1.cuda()
            kp_driving_initial = kp_detector(source1)

        frame_test = torch.tensor(frame1[np.newaxis].astype(np.float32)).permute(0, 3, 1, 2)

        driving_frame = frame_test
        if not cpu:
            driving_frame = driving_frame.cuda()
        kp_driving = kp_detector(driving_frame)
        kp_norm = normalize_kp(kp_source=kp_source,
                               kp_driving=kp_driving,
                               kp_driving_initial=kp_driving_initial,
                               use_relative_movement=relative,
                               use_relative_jacobian=relative,
                               adapt_movement_scale=adapt_movement_scale)
        out = generator(source, kp_source=kp_source, kp_driving=kp_norm)
        predictions.append(np.transpose(out['prediction'].data.cpu().numpy(), [0, 2, 3, 1])[0])
        im = np.transpose(out['prediction'].data.cpu().numpy(), [0, 2, 3, 1])[0]
        # OpenCV expects BGR channel order when writing frames
        im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
        # joinedFrame = np.concatenate((cv2_source,im,frame1),axis=1)

        # cv2.imshow('Test',joinedFrame)
        # out1.write(img_as_ubyte(joinedFrame))
        out1.write(img_as_ubyte(im))
        count += 1
        # if cv2.waitKey(20) & 0xFF == ord('q'):
        #     break

    cap.release()
    out1.release()
    cv2.destroyAllWindows()

# Merge the saved audio back into the generated video (video input only)
if video_path:
    video_add_mp3(file_name='output/test.mp4', mp3_file=os.path.splitext(video_path)[0] + '.mp3')
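The script passes `use_relative_movement=True` and `adapt_movement_scale=True` to `normalize_kp`. As I understand it, the idea is to transfer the driving keypoints' displacement relative to the first driving frame onto the source keypoints, rather than copying absolute positions. The NumPy sketch below illustrates only that idea. `transfer_relative_motion` and the fixed `scale` argument are hypothetical simplifications, not the project's actual implementation.

```python
import numpy as np


def transfer_relative_motion(kp_source, kp_driving, kp_driving_initial, scale=1.0):
    """Move source keypoints by the driving keypoints' displacement since frame 0.

    Each argument is an array of shape (num_keypoints, 2). `scale` stands in for
    the movement-scale adaptation between differently sized faces (assumed given).
    """
    kp_source = np.asarray(kp_source, dtype=float)
    displacement = (np.asarray(kp_driving, dtype=float)
                    - np.asarray(kp_driving_initial, dtype=float))
    return kp_source + scale * displacement


kp_source = np.array([[0.0, 0.0], [0.5, 0.5]])
kp_initial = np.array([[0.1, 0.1], [0.4, 0.4]])
kp_now = np.array([[0.2, 0.1], [0.4, 0.5]])
print(transfer_relative_motion(kp_source, kp_now, kp_initial))
```

Because only displacements are transferred, the source face keeps its own geometry, which is why the first driving frame should roughly match the pose of the source image.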

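Then download the weight file the algorithm needs, along with the video and image assets.

The modified code, weight files, and video and image assets are all packaged, so you can also use them as-is.

Download link (password: tl0h): https://pan.baidu.com/s/1OEfsXWAN4RPO9vwbCTXIMA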

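Run the command:

python image_animation.py -i path_to_input_file -c path_to_checkpoint -v path_to_video_file

path_to_input_file is the input template image (the picture to animate).

path_to_checkpoint is the path to the weight file.

path_to_video_file is the input (driving) video file.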
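Before launching a run, it can help to confirm that these paths actually exist. The sketch below is a hypothetical pre-flight helper (`find_missing_inputs` is not part of the project) that uses only the standard library. The example paths are the ones from the packaged command in this article.

```python
from pathlib import Path


def find_missing_inputs(image_path, checkpoint_path, video_path=None):
    """Return the subset of the given paths that do not point to existing files."""
    candidates = [image_path, checkpoint_path]
    if video_path is not None:  # the video is optional; without it the script opens the camera
        candidates.append(video_path)
    return [str(p) for p in candidates if not Path(p).is_file()]


missing = find_missing_inputs("Inputs/trump2.png", "checkpoints/vox-cpk.pth.tar", "1.mp4")
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```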

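If you use my packaged program, you can run the following command directly to get the video shown at the start of this article:

python image_animation.py -i Inputs/trump2.png -c checkpoints/vox-cpk.pth.tar -v 1.mp4

The generated videos are saved in the output folder: test.mp4 is the silent result, and test-f.mp4 is the version with the audio merged in.

All done!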

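5. Closing

The algorithm processes video quickly: with a GPU, it finishes in a few seconds.

Articles are updated continuously. Search for [JackCui-AI] on WeChat Official Accounts to read them first. This article is included on GitHub at https://github.com/Jack-Cherish/PythonPark, which also has a complete list of interview topics from major tech companies. Stars are welcome.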

Reposted from: https://blog.csdn.net/c406495762/article/details/108142140
