Extract Subtitles From TV Shows

Extract Subtitles From the Videos
- Things to Improve
More in Mind
Footnotes

Round Table Party

Over the past three years, I have really enjoyed watching a Chinese talk show called Round Table Party (I couldn't find an official English name for the show; this name is used on a few other sites, so I am not sure it is correct, and it may be called Table π instead), which was made by the famous Chinese TV host Dou Wentao.

Dou Wentao was famous for a long-running weekly TV show called Behind The Headlines With Wen Tao, in which he would invite two other cultural celebrities (usually one regular guest of the show and one expert in the topic's domain, or sometimes two regulars; you can check the full list on its Douban page^[1]) to talk about recent popular headlines. The show was casual and always approached discussions from a civilian's point of view. It felt fresh to me, as other TV shows almost always used a bureaucratic tone and talked about things from the government's point of view. For a very long time it was a great companion during my dinners, and listening to it made me feel that I was keeping up with the trends. Unfortunately, the show was shut down in September 2017 after running for almost 20 years^[2]. That was not entirely unexpected: the opinions in the show were sometimes quite different from the mainstream in China, and Wentao was trying to bring more diversity of perspectives to the show. One concrete example is that he invited both sides of an Internet quarrel (Xu Xiaodong and Wang Zhanhai) to the show separately and let each of them present their own perspective. It was still a pity that the show was not continued.

However, Dou Wentao was able to continue the show in spirit through other shows he produced, and Round Table Party is one of them. In this show, he invites three guests instead of two, which of course include some old friends from Behind The Headlines, and they talk about interesting topics such as different kinds of relationships in China, rather than popular headlines, especially political ones.

As a lot of names (of people, books, places, etc.) are mentioned in the show, and it's hard to search the content of videos, it would be handy to have transcripts of these shows that I can read and search easily.

Extract Subtitles From the Videos

As the screenshot below shows, it should not be hard to extract the subtitles from the video, as the Chinese subtitles are already burned into the video frames.

Round Table Party screenshot

Below is the code I hacked together to do the work:

# -*- coding: utf-8 -*-

from PIL import Image
from PIL import ImageFilter
import cv2
import numpy as np
import PIL.ImageOps
import pytesseract
import skvideo.io
import sys

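# Subtitle region inside the frame, in pixels (hard-coded for this show)
# left offset
x = 227
# top offset
y = 617
# region width
w = 710
# region height
h = 58

videopath = sys.argv[1]

# Only every FRAMES-th frame is OCRed; subtitles typically stay on screen longer than that
FRAMES = 15

# Output directory for debug frames (only used by the commented-out imwrite below)
dir = "frames/"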


def read_video(video_file):
  # Open the video with scikit-video's FFmpeg wrapper (no extra FFmpeg options)
  reader = skvideo.io.FFmpegReader(video_file, inputdict={}, outputdict={})
  (num_frames, height, width, channels) = reader.getShape()
  print(num_frames, height, width, channels)
  return reader


def same(image1, image2):
  # Two binarized crops count as the same subtitle when fewer than 1% of pixels differ
  diff = 0
  for i in range(w):
    for j in range(h):
      if image1[i, j] != image2[i, j]:
        diff += 1
  return diff < (w * h / 100)

previous = None


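def process_image(count, im):
  """Crop and binarize the subtitle area; return None if it matches the last kept frame."""
  global previous

  im_pil = Image.fromarray(im)

  # Invert so the white subtitle text becomes dark, then crop the subtitle region
  inverted_im = PIL.ImageOps.invert(im_pil)
  cropped_im = inverted_im.crop((x, y, x + w, y + h))

  # cropped_im = cropped_im.filter(ImageFilter.BLUR)
  # cropped_im.show()

  # Binarize: near-black pixels (originally near-white text) -> black, everything else -> white
  pixels = cropped_im.load()
  for i in range(w):
    for j in range(h):
      (r, g, b) = pixels[i, j]
      if r < 30 and g < 30 and b < 30:
        pixels[i, j] = (0, 0, 0)
      else:
        pixels[i, j] = (255, 255, 255)

  # Debugging: dump the binarized crop to disk
  # cropped_im_cv2 = cv2.cvtColor(np.array(cropped_im), cv2.COLOR_RGB2BGR)
  # cropped_im_cv2 = cv2.medianBlur(cropped_im_cv2, 3)
  # name = "frame_" + str(count) + ".jpg"
  # cv2.imwrite(dir + "/" + name, cropped_im_cv2)

  # Skip frames whose subtitle area barely changed since the last kept frame
  if (previous is not None) and same(previous, pixels):
    return None
  previous = pixels

  return cropped_im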

reader = read_video(videopath)

previous_word = None
i = 0
with open("result.txt", "w", encoding="utf-8") as f:
  for frame in reader.nextFrame():
    i += 1
    # Only OCR every FRAMES-th frame
    if i % FRAMES != 0:
      continue
    im = process_image(i, frame)
    if im is None:
      continue
    text = pytesseract.image_to_string(im, lang='chi_sim')

    # Skip consecutive duplicate lines
    if text == previous_word:
      continue
    previous_word = text

    print(i, text)
    f.write(text + "\n")

The skeleton of the program comes from shawnsky/extract-subtitles.
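Part of the per-frame cost comes from the nested per-pixel Python loops in `process_image` and `same`. Below is a minimal sketch of a NumPy-vectorized equivalent, assuming frames arrive as `H x W x 3` `uint8` RGB arrays (as `skvideo.io` yields them) and reusing the script's crop box, its threshold of 30 on the inverted image, and its 1% change rule. The names `binarize_region` and `mostly_same` are hypothetical and not part of the original script.

```python
import numpy as np

# Subtitle box from the original script: left, top, width, height in pixels.
X, Y, W, H = 227, 617, 710, 58


def binarize_region(frame, x=X, y=Y, w=W, h=H, threshold=30):
  """Crop the subtitle box from an H x W x 3 RGB frame and binarize it.

  The original script inverts the image and keeps pixels whose inverted
  R, G and B are all below `threshold`. That is equivalent to the original
  values all being above 255 - threshold (near-white text), so no explicit
  inversion is needed. Text pixels become 0 (black), the rest 255 (white).
  """
  region = frame[y:y + h, x:x + w]
  is_text = (region > 255 - threshold).all(axis=2)
  return np.where(is_text, 0, 255).astype(np.uint8)


def mostly_same(a, b, max_fraction=0.01):
  """True when fewer than max_fraction of the pixels differ (the 1% rule of `same`)."""
  return np.count_nonzero(a != b) < a.size * max_fraction
```

In the main loop, `Image.fromarray(binarize_region(frame))` could then be passed to `pytesseract.image_to_string`, with `mostly_same` taking the place of `same`. OCR and video decoding likely remain a large share of the runtime, so this alone would probably not turn hours into minutes.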

The output is good enough for skimming and searching. Below is a partial result for S04E13; with some minor edits, it would be good enough to share with others.

OCR result

Things to Improve

The location of the subtitle is hard-coded in the script, which means that to use it on another show, I would need some trial-and-error runs to figure out the correct parameters. The program could certainly be smarter about that.^[3]
The script is slow: it takes several hours to finish one episode. The work could be distributed, so that more videos can be processed in less time.
The script cannot tell who is speaking the current lines. Some work needs to be done to identify the current speaker and annotate the output accordingly.
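For the first item, one possible heuristic (an assumption, not something the script does) is that burned-in subtitles are near-white pixels that keep reappearing in the same band near the bottom of the frame. The sketch below counts, over a sample of frames, how often each pixel is near-white and returns the bounding box of the frequently bright area in the lower part of the frame. `detect_subtitle_box` and its default thresholds are hypothetical and would need tuning per show.

```python
import numpy as np


def detect_subtitle_box(frames, white=225, min_rate=0.05, bottom=0.6):
  """Guess the (x, y, w, h) box of burned-in subtitles from sample frames.

  frames: iterable of H x W x 3 uint8 RGB arrays.
  white: a pixel is "subtitle-like" when R, G and B all exceed this value
    (the same near-white rule the original script applies after inverting).
  min_rate: fraction of sampled frames in which a pixel must be near-white.
  bottom: only rows below this fraction of the frame height are searched.
  Returns None when no pixel qualifies.
  """
  counts = None
  n = 0
  for frame in frames:
    near_white = (frame > white).all(axis=2)
    counts = near_white.astype(np.int32) if counts is None else counts + near_white
    n += 1
  if n == 0:
    return None

  top = int(counts.shape[0] * bottom)
  hot = counts[top:] >= min_rate * n  # pixels that are often near-white
  rows = np.flatnonzero(hot.any(axis=1))
  cols = np.flatnonzero(hot.any(axis=0))
  if rows.size == 0:
    return None

  y0, y1 = top + rows[0], top + rows[-1] + 1
  x0, x1 = cols[0], cols[-1] + 1
  return int(x0), int(y0), int(x1 - x0), int(y1 - y0)
```

Sample frames could come from, for example, `itertools.islice(reader.nextFrame(), 0, None, 150)`, and the returned box would replace the hard-coded `x`, `y`, `w`, `h`. Static bright objects near the bottom of the frame (a logo, a white table) would also be picked up, so the result would still need a sanity check.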

More in Mind

It would be fun to build a website with all the transcripts of the show. However, it's a little tricky from a copyright perspective: although the show is free to watch and listen to on the Internet, distributing the transcripts may require extra permissions. The copyright question around transcripts could be a fun topic to explore, though. For songs, lyrics require separate licensing (I believe this is the main reason why Apple Music and Spotify have weaker lyrics support than Chinese music apps), while companies like Musixmatch let users upload lyrics as user-generated content (UGC).

Another option would be to ask Vistopia whether it's possible to release the subtitles they have embedded in the shows; then we could skip the work above and get accurate subtitles directly.

I found the following paper while writing this post. Judging from the title, it seems like a fun read; then again, perhaps all papers about Chinese TV talk shows are fun to read.

Yi's Blog

Make, Observe, and Analyze
