Python wordcloud: Source Code Analysis and Basic Usage

飞道的博客, 2020-05-19 09:21

The Python word cloud module, wordcloud, has gone from v1.0 in 2015 to v1.7 at the time of writing.

Download: https://pypi.org/project/wordcloud/

Basic usage of wordcloud:

    import jieba
    import wordcloud

    w = wordcloud.WordCloud(
        width=600,
        height=600,
        background_color='white',
        font_path='msyh.ttc'  # a font that contains Chinese glyphs (Microsoft YaHei on Windows)
    )
    # Chinese sample text, kept as-is because jieba segments Chinese
    text = '看到此标题，我也是感慨万千 首先弄清楚搞IT和被IT搞，谁是搞IT的？马云就是，马化腾也是，刘强东也是，他们都是叫搞IT的， 但程序员只是被IT搞的人，可以比作盖楼砌砖的泥瓦匠，你想想，四十岁的泥瓦匠能跟二十左右岁的年轻人较劲吗？如果你是老板你会怎么做？程序员只是技术含量高的泥瓦匠，社会是现实的，社会的现实是什么？利益驱动。当你跑的速度不比以前快了时，你就会被挨鞭子赶，这种窘境如果在做程序员当初就预料到的话，你就会知道，到达一定高度时，你需要改变行程。 程序员其实真的不是什么好职业，技术每天都在更新，要不停的学，你以前学的每天都在被淘汰，加班可能是标配了吧。 热点，你知道什么是热点吗？社会上啥热就是热点，我举几个例子：在早淘宝之初，很多人都觉得做淘宝能让自己发展，当初的规则是产品按时间轮候展示，也就是你的商品上架时间一到就会被展示，不论你星级多高。这种一律平等的条件固然好，但淘宝随后调整了显示规则，对产品和店铺，销量进行了加权，一下导致小卖家被弄到了很深的胡同里，没人看到自己的产品，如何卖？做广告费用也非常高，入不敷出，想必做过淘宝的都知道，再后来淘宝弄天猫，显然，天猫是上档次的商城，不同于淘宝的摆地摊，因为摊位费涨价还闹过事，闹也白闹，你有能力就弄，没能力就淘汰掉。前几天淘宝又推出C2M,客户反向定制，客户直接挂钩大厂家，没你小卖家什么事。 后来又出现了微商，在微商出现当天我就知道这东西不行，它比淘宝假货还下三滥.我对TX一直有点偏见，因为骗子都使用QQ 我说这么多只想说一个事，世界是变化的，你只能适应变化，否则就会被淘汰。 还是回到热点这个话题，育儿嫂这个职位有很多人了解吗？前几年放开二胎后，这个职位迅速串红，我的一个亲戚初中毕业，现在已经月入一万五，职务就是照看刚出生的婴儿28天，节假日要双薪。 你说这难到让我一个男的去当育儿嫂吗？扯，我只是说热点问题。你没踩在热点上，你赚钱就会很费劲 这两年的热点是什么？短视频，你可以看到抖音的一些作品根本就不是普通人能实现的，说明专业级人才都开始努力往这上使劲了。 我只会编程，别的不会怎么办？那你就去编程。没人用了怎么办？你看看你自己能不能雇佣你自己 学会适应社会，学会改变自己去适应社会 最后说一句：科大讯飞的刘鹏说的是对的。那我为什么还做程序员？他可以完成一些原始积累，只此而已。'
    # segment with jieba and join with spaces so the regex tokenizer sees separate words
    new_str = ' '.join(jieba.lcut(text))
    w.generate(new_str)
    w.to_file('x.png')

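Now for the source code.

Generating a word cloud image in wordcloud takes three main steps:

1. Split the text into words

2. Generate the word cloud layout

3. Save the image

Starting from generate(self, text), we find that it does nothing but call another method on the same object, self.generate_from_text(text):

    def generate_from_text(self, text):
        """Generate wordcloud from text.
        """
        words = self.process_text(text)  # split the text into words
        self.generate_from_frequencies(words)  # main method that builds the word cloud (analyzed in detail below)
        return self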

The source of process_text() is shown below. Its logic is fairly simple: tokenize, strip a trailing 's, remove numbers, remove short words, and remove stopwords.

    def process_text(self, text):
        """Splits a long text into words, eliminates the stopwords.

        Parameters
        ----------
        text : string
            The text to be processed.

        Returns
        -------
        words : dict (string, int)
            Word tokens with associated frequency.

        ..versionchanged:: 1.2.2
            Changed return type from list of tuples to dict.

        Notes
        -----
        There are better ways to do word tokenization, but I don't want to
        include all those things.
        """

        flags = (re.UNICODE if sys.version < '3' and type(text) is unicode
                 else 0)

        regexp = self.regexp if self.regexp is not None else r"\w[\w']+"

        # tokenize
        words = re.findall(regexp, text, flags)
        # strip a trailing 's
        words = [word[:-2] if word.lower().endswith("'s") else word
                 for word in words]
        # remove numbers
        if not self.include_numbers:
            words = [word for word in words if not word.isdigit()]
        # remove short words: anything shorter than min_word_length is filtered out
        if self.min_word_length:
            words = [word for word in words if len(word) >= self.min_word_length]
        # remove stopwords
        stopwords = set([i.lower() for i in self.stopwords])
        if self.collocations:
            word_counts = unigrams_and_bigrams(words, stopwords, self.normalize_plurals, self.collocation_threshold)
        else:
            # remove stopwords
            words = [word for word in words if word.lower() not in stopwords]
            word_counts, _ = process_tokens(words, self.normalize_plurals)

        return word_counts
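To make the filtering order concrete, here is a minimal stand-alone sketch of the non-collocation path, using only the standard library. The function name simple_process_text and its parameters are hypothetical, and the final counting step is a plain Counter; the library's process_tokens does further normalization that this sketch leaves out.

```python
import re
from collections import Counter


def simple_process_text(text, stopwords=(), min_word_length=0,
                        include_numbers=False):
    """Hypothetical stand-alone approximation of the non-collocation path."""
    # Same default pattern as in the listing: one word character followed by
    # one or more word characters or apostrophes (tokens have >= 2 chars).
    words = re.findall(r"\w[\w']+", text)
    # Strip a trailing 's.
    words = [w[:-2] if w.lower().endswith("'s") else w for w in words]
    if not include_numbers:
        words = [w for w in words if not w.isdigit()]
    if min_word_length:
        words = [w for w in words if len(w) >= min_word_length]
    stop = {s.lower() for s in stopwords}
    words = [w for w in words if w.lower() not in stop]
    # Plain counting only; this is a simplification of process_tokens.
    return dict(Counter(words))


counts = simple_process_text(
    "The cat's toy and the dog's toy cost 20 dollars",
    stopwords={"the", "and"},
)
print(counts)
```

Because the default pattern needs at least two characters per token, single-character words never reach the later filters, which matters for Chinese text segmented into single characters.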

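Now for the main event.

The body of generate_from_frequencies(self, frequencies, max_font_size=None) is fairly long. Overall it breaks down into these steps:

1. Sort

2. Normalize the frequencies

3. Create the drawing objects

4. Determine the initial font size

5. Extend the word set

6. Determine each word's font size, position, rotation and color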
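Steps 1, 2 and 5 only manipulate plain data, so they can be sketched without any imaging code. The helper name prepare_frequencies and its default arguments are assumptions made for illustration; the arithmetic follows the method's source.

```python
from math import ceil
from operator import itemgetter


def prepare_frequencies(frequencies, max_words=200, repeat=False):
    """Hypothetical helper mirroring steps 1, 2 and 5 on a plain dict."""
    # Step 1: sort by frequency, highest first, and keep at most max_words.
    items = sorted(frequencies.items(), key=itemgetter(1), reverse=True)
    if not items:
        raise ValueError("need at least 1 word")
    items = items[:max_words]

    # Step 2: divide by the largest frequency so the top word becomes 1.0.
    max_frequency = float(items[0][1])
    items = [(word, freq / max_frequency) for word, freq in items]

    # Step 5: optionally pad with repeated words, each extra copy scaled
    # down by another factor of the smallest normalized frequency.
    if repeat and len(items) < max_words:
        times_extend = int(ceil(max_words / len(items))) - 1
        original = list(items)
        downweight = items[-1][1]
        for i in range(times_extend):
            items.extend((word, freq * downweight ** (i + 1))
                         for word, freq in original)
    return items


result = prepare_frequencies({"python": 8, "cloud": 4, "word": 2},
                             max_words=6, repeat=True)
for word, freq in result:
    print(word, freq)
```

With three input words and max_words=6, one extra copy of the list is appended, and every repeated entry is weighted by the smallest normalized frequency (0.25 here), so repeats always rank below their originals.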

The source follows (with comments added according to my own understanding):

    def generate_from_frequencies(self, frequencies, max_font_size=None):
        """Create a word_cloud from words and frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            A contains words and associated frequency.

        max_font_size : int
            Use this font-size instead of self.max_font_size

        Returns
        -------
        self

        """
        # make sure frequencies are sorted and normalized
        # 1. Sort
        # sort the (word, frequency) list by frequency in descending order
        frequencies = sorted(frequencies.items(), key=itemgetter(1), reverse=True)
        if len(frequencies) <= 0:
            raise ValueError("We need at least 1 word to plot a word cloud, "
                             "got %d." % len(frequencies))
        # keep the number of words within max_words; anything beyond is dropped
        frequencies = frequencies[:self.max_words]

        # largest entry will be 1
        # take the first word's frequency as the maximum frequency
        max_frequency = float(frequencies[0][1])

        # 2. Normalize the frequencies
        # the words are already sorted, so after normalization the list looks like:
        # [('xxx', 1), ('xxx', 0.96), ('xxx', 0.87), ...]
        frequencies = [(word, freq / max_frequency)
                       for word, freq in frequencies]

        # random object; used below to decide whether to rotate a word by 90 degrees,
        # and also passed to position sampling and to the color function
        if self.random_state is not None:
            random_state = self.random_state
        else:
            random_state = Random()

        if self.mask is not None:
            boolean_mask = self._get_bolean_mask(self.mask)
            width = self.mask.shape[1]
            height = self.mask.shape[0]
        else:
            boolean_mask = None
            height, width = self.height, self.width
        # used to find places where a word can go, i.e. blank (text-free) areas inside the valid region of the image
        occupancy = IntegralOccupancyMap(height, width, boolean_mask)

        # 3. Create the drawing objects
        # create image
        img_grey = Image.new("L", (width, height))
        draw = ImageDraw.Draw(img_grey)
        img_array = np.asarray(img_grey)
        font_sizes, positions, orientations, colors = [], [], [], []

        last_freq = 1.

        # 4. Determine the initial font size
        # determine the maximum font size
        if max_font_size is None:
            # if not provided use default font_size
            max_font_size = self.max_font_size

        # if the maximum font size is still None, one has to be worked out to serve as the initial font size
        if max_font_size is None:
            # figure out a good font size by trying to draw with
            # just the first two words
            if len(frequencies) == 1:
                # we only have one word. We make it big!
                font_size = self.height
            else:
                # recurse into this method to obtain a self.layout_ that holds only the first two words,
                # then compute an initial font size from the font sizes those two words received
                self.generate_from_frequencies(dict(frequencies[:2]),
                                               max_font_size=self.height)
                # find font sizes
                sizes = [x[1] for x in self.layout_]
                try:
                    font_size = int(2 * sizes[0] * sizes[1]
                                    / (sizes[0] + sizes[1]))
                # quick fix for if self.layout_ contains less than 2 values
                # on very small images it can be empty
                except IndexError:
                    try:
                        font_size = sizes[0]
                    except IndexError:
                        raise ValueError(
                            "Couldn't find space to draw. Either the Canvas size"
                            " is too small or too much of the image is masked "
                            "out.")
        else:
            font_size = max_font_size

        # we set self.words_ here because we called generate_from_frequencies
        # above... hurray for good design?
        self.words_ = dict(frequencies)

        # 5. Extend the word set
        # if repeat is enabled and there are fewer words than max_words, pad the word set up to the maximum
        if self.repeat and len(frequencies) < self.max_words:
            # pad frequencies with repeating words.
            times_extend = int(np.ceil(self.max_words / len(frequencies))) - 1
            # get smallest frequency
            frequencies_org = list(frequencies)
            downweight = frequencies[-1][1]
            # extend the word list; the repeated copies keep the original descending order of frequencies
            for i in range(times_extend):
                frequencies.extend([(word, freq * downweight ** (i + 1))
                                    for word, freq in frequencies_org])

        # 6. Determine each word's font size, position, rotation and color
        # start drawing grey image
        for word, freq in frequencies:
            if freq == 0:
                continue
            # select the font size
            rs = self.relative_scaling
            if rs != 0:
                font_size = int(round((rs * (freq / float(last_freq))
                                       + (1 - rs)) * font_size))
            if random_state.random() < self.prefer_horizontal:
                orientation = None
            else:
                orientation = Image.ROTATE_90
            tried_other_orientation = False
            # Look for a possible position. If one attempt fails, try changing the
            # text direction or shrinking the font, then look again,
            # until a position is found or the font size falls below the lower limit.
            while True:
                # try to find a position
                font = ImageFont.truetype(self.font_path, font_size)
                # transpose font optionally
                transposed_font = ImageFont.TransposedFont(
                    font, orientation=orientation)
                # get size of resulting text
                box_size = draw.textsize(word, font=transposed_font)
                # find possible places using integral image:
                result = occupancy.sample_position(box_size[1] + self.margin,
                                                   box_size[0] + self.margin,
                                                   random_state)
                if result is not None or font_size < self.min_font_size:
                    # either we found a place or font-size went too small
                    break
                # if we didn't find a place, make font smaller
                # but first try to rotate!
                if not tried_other_orientation and self.prefer_horizontal < 1:
                    orientation = (Image.ROTATE_90 if orientation is None else
                                   Image.ROTATE_90)
                    tried_other_orientation = True
                else:
                    font_size -= self.font_step
                    orientation = None

            if font_size < self.min_font_size:
                # we were unable to draw any more
                break

            # collect this word's information: font size, position, rotation, color
            x, y = np.array(result) + self.margin // 2
            # actually draw the text
            # This drawing is only used to find positions for the remaining words; it is not
            # the final word cloud image, which is produced by another method: to_image
            draw.text((y, x), word, fill="white", font=transposed_font)
            positions.append((x, y))
            orientations.append(orientation)
            font_sizes.append(font_size)
            colors.append(self.color_func(word, font_size=font_size,
                                          position=(x, y),
                                          orientation=orientation,
                                          random_state=random_state,
                                          font_path=self.font_path))
            # recompute integral image
            if self.mask is None:
                img_array = np.asarray(img_grey)
            else:
                img_array = np.asarray(img_grey) + boolean_mask
            # recompute bottom right
            # the order of the cumsum's is important for speed ?!
            occupancy.update(img_array, x, y)
            last_freq = freq

        # layout_ is the list of per-word information: word, frequency, font size, position,
        # rotation and color. It prepares everything for the later drawing step.
        self.layout_ = list(zip(frequencies, font_sizes, positions,
                                orientations, colors))
        return self
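The font-size arithmetic in steps 4 and 6 is easier to follow in isolation. The sketch below reproduces the two formulas from the listing and a simplified version of the shrink loop. All three function names, the fits callback, and the default values for font_step and min_font_size are hypothetical; the rotation retry and the actual position sampling are left out.

```python
def initial_font_size(size_a, size_b):
    """Harmonic mean of the font sizes the two most frequent words received."""
    return int(2 * size_a * size_b / (size_a + size_b))


def next_font_size(font_size, freq, last_freq, relative_scaling):
    """Rescale the running font size when moving on to the next word."""
    rs = relative_scaling
    if rs == 0:
        # With rs == 0 the frequency ratio is ignored and the size carries over.
        return font_size
    return int(round((rs * (freq / float(last_freq)) + (1 - rs)) * font_size))


def shrink_until_fits(font_size, fits, font_step=1, min_font_size=4):
    """Lower the font size until the hypothetical `fits(size)` callback
    accepts it; return None once the size drops below min_font_size."""
    while font_size >= min_font_size:
        if fits(font_size):
            return font_size
        font_size -= font_step
    return None


print(initial_font_size(300, 100))
print(next_font_size(100, 0.5, 1.0, 0.5))
print(shrink_until_fits(40, lambda size: size <= 25, font_step=5))
```

Note that font_size is a running value: each word starts from the size the previous word ended with, so once the canvas fills up and sizes shrink, later words never grow back.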

Note

When determining positions in step 6, the program uses a loop and random numbers to search for a suitable place. The relevant source is:

            # Look for a possible position. If one attempt fails, try changing the
            # text direction or shrinking the font, then look again,
            # until a position is found or the font size falls below the lower limit.
            while True:
                # try to find a position
                font = ImageFont.truetype(self.font_path, font_size)
                # transpose font optionally
                transposed_font = ImageFont.TransposedFont(
                    font, orientation=orientation)
                # get size of resulting text
                box_size = draw.textsize(word, font=transposed_font)
                # find possible places using integral image:
                result = occupancy.sample_position(box_size[1] + self.margin,
                                                   box_size[0] + self.margin,
                                                   random_state)
                if result is not None or font_size < self.min_font_size:
                    # either we found a place or font-size went too small
                    break
                # if we didn't find a place, make font smaller
                # but first try to rotate!
                if not tried_other_orientation and self.prefer_horizontal < 1:
                    orientation = (Image.ROTATE_90 if orientation is None else
                                   Image.ROTATE_90)
                    tried_other_orientation = True
                else:
                    font_size -= self.font_step
                    orientation = None

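Here occupancy.sample_position() is the method that actually searches for a suitable position. When you try to dig further into how it works, though, you find that Ctrl+click no longer jumps into the deeper code. The sad thing happened after all... o(╥﹏╥)o

At the top of wordcloud.py there is this line: from .query_integral_image import query_integral_image. In an installed package, query_integral_image is a .pyd file (a compiled extension module), which cannot be read directly. For more on the pyd format, please consult other references.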
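The name query_integral_image and the "find possible places using integral image" comment suggest the general technique, even though the compiled module itself is opaque. What follows is a pure-Python sketch of that idea, written under the assumption that a position is valid when the box it covers contains no occupied pixel. It is not a transcription of the library's code, and all function names are hypothetical.

```python
import random


def integral_image(grid):
    """Summed-area table with one extra leading row and column of zeros."""
    rows, cols = len(grid), len(grid[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            table[r + 1][c + 1] = (grid[r][c] + table[r][c + 1]
                                   + table[r + 1][c] - table[r][c])
    return table


def free_positions(grid, box_h, box_w):
    """All top-left (row, col) corners where a box_h x box_w box is empty."""
    table = integral_image(grid)
    rows, cols = len(grid), len(grid[0])
    hits = []
    for r in range(rows - box_h + 1):
        for c in range(cols - box_w + 1):
            # Sum over the box in constant time from four table lookups.
            area = (table[r + box_h][c + box_w] - table[r][c + box_w]
                    - table[r + box_h][c] + table[r][c])
            if area == 0:
                hits.append((r, c))
    return hits


def sample_position(grid, box_h, box_w, rng):
    """Pick one free corner at random, or None if nothing fits."""
    hits = free_positions(grid, box_h, box_w)
    return rng.choice(hits) if hits else None


# 1 marks an occupied pixel, 0 a free one.
grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
print(free_positions(grid, 2, 2))
print(sample_position(grid, 3, 3, random.Random(0)))
```

Because occupancy is tracked per pixel rather than per bounding box, a box that is small enough can land in the blank gaps inside a larger word's strokes, as long as every pixel under it is still free.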

Back to generate_from_frequencies: at the end of the method, the data is collected into self.layout_, which holds everything needed to draw each word. After that, to_file() can be called to save the image.

    def to_file(self, filename):

        img = self.to_image()
        img.save(filename, optimize=True)
        return self

The core method to_image() takes the entries out of self.layout_ one by one and draws each word.

    def to_image(self):
        self._check_generated()
        if self.mask is not None:
            width = self.mask.shape[1]
            height = self.mask.shape[0]
        else:
            height, width = self.height, self.width

        img = Image.new(self.mode, (int(width * self.scale),
                                    int(height * self.scale)),
                        self.background_color)
        draw = ImageDraw.Draw(img)
        for (word, count), font_size, position, orientation, color in self.layout_:
            font = ImageFont.truetype(self.font_path,
                                      int(font_size * self.scale))
            transposed_font = ImageFont.TransposedFont(
                font, orientation=orientation)
            pos = (int(position[1] * self.scale),
                   int(position[0] * self.scale))
            draw.text(pos, word, fill=color, font=transposed_font)

        return self._draw_contour(img=img)
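One detail in to_image() is easy to miss: layout positions are stored as (row, col), while drawing expects (x, y), so the two indices are swapped before scaling. A tiny sketch of that mapping, using a hypothetical helper name:

```python
def scale_layout_entry(font_size, position, scale):
    """Map one layout entry to final-image units.

    `position` is stored as (row, col); drawing wants (x, y) = (col, row).
    """
    scaled_font = int(font_size * scale)
    xy = (int(position[1] * scale), int(position[0] * scale))
    return scaled_font, xy


print(scale_layout_entry(30, (10, 45), 2))
```

This also shows why the layout is computed once at the base size: the scale factor is applied only at drawing time, to both the canvas and every entry.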

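Further thought:

How would you implement the search for a suitable position to place a word? (Note: the gaps inside the strokes of a character can also hold text in a smaller font size.)

~ End ~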

Reposted from: https://blog.csdn.net/bailichun19901111/article/details/106118092
