TensorFlow + KerasでのRoIプーリングの実装

こんにちは、ハブル!Jaime Sevillaによる記事「TensorFlow + KerasでのRoIプーリングの実装」の翻訳に注目します



現在、機械学習コースを受講しています。トレーニングブロック「コンピュータービジョン」では、レイヤーのRoIプーリングを研究する必要がありました。以下の記事は私には興味深く思えたので、翻訳をコミュニティと共有することにしました。



この投稿では、RoIプーリング(関心領域の基本概念と一般的な使用方法を説明し、TensorFlow Kerasレイヤーを使用した実装を提供します。



この記事の対象読者は、(畳み込み)ニューラルネットワーク(CNNs)の基本的な理論に精通しており、使用して単純なモデル構築して実行することができます人々であるKerasを



あなたがコードだけをここにいるなら、ここをチェックし、記事を好きで共有することを忘れないでください!



RoIプーリングについて



RoIプーリングは、Ross Girshikがオブジェクト認識パイプラインの一部としてFast R-CNNの記事で提案しました



RoI Poolingの一般的な使用例では、画像のようなオブジェクトと、境界ボックスで指定されたいくつかの関心領域(RoI)があります。各RoIから埋め込み(埋め込み-任意のエンティティ(画像の一部)を特定のベクトルにマッピングする)を作成します。



たとえば、R-CNNセットアップでは、画像と、画像の興味深い部分の境界ボックスを作成する候補領域強調表示エンジンがあります。ここで、画像の提案された部分ごとに埋め込みを作成します。



画像内の候補領域を強調表示する



結果の埋め込みを互いに重ね合わせたいので、提案された各領域を単にトリミングするだけでは機能せず、提案された領域は必ずしも同じ形ではありません!



, . ?



- (pooling).



max pooling, ( ) , , .



maxpool操作は、各領域を同じサイズのプールに分割します



maxpool

, – : .



. RoI ?



ROIプーリング操作は、画像のプーリング部分を同じサイズのグリッドで分割します。



ROI Pooling , pooling.

, ROI Pooling.



RoI Pooling.



RoI Pooling — . , RoI, . , .

-, ( RoI Pooling ), ( ), (end-to-end) (single-pass) .



Ross GirshikによるRoIプーリングを示す高速R-CNN



Fast R-CNN, RoI Pooling,

, R-CNN , (RoI). RoI Pooling CNN . .

-, -, RoI Pooling (visual attention).



原らによるROIプーリングを示す、視覚オブジェクト検出のための注意ネットワーク。



Attentional Network , ROI Pooling, Hara ..

Attentional Network , Hara attention, ROI ROI Pooling. (t = 1), ROI Pooling (Fully Connected). Glimpse () (t = 2) , ROI Pooling. .



.



, , ROI.



:



  • (batch) . , . (batch_size, img_width, img_height, n_channels), batch_size- , img_width — , img_height — , n_channels — .
  • (batch) ROI. , - . 4 , (batch_size, n_rois, 4), batch_size — ROI, n_rois — ROI.


:



  • , ROI. (batch_size, n_rois, pooled_width, pooled_height, n_channels). batch_size- , n_rois — ROI, pooled_width — , pooled_height— , n_channels — .


Keras



Keras Layer.



tf.keras init, build call . , build , , . compute_output_shape.



, .



def __init__(self, pooled_height, pooled_width, **kwargs):
    self.pooled_height = pooled_height
    self.pooled_width = pooled_width
    super(ROIPoolingLayer, self).__init__(**kwargs)


. , . .



def compute_output_shape(self, input_shape):
    """ Returns the shape of the ROI Layer output
    """
    feature_map_shape, rois_shape = input_shape
    assert feature_map_shape[0] == rois_shape[0]
    batch_size = feature_map_shape[0]
    n_rois = rois_shape[1]
    n_channels = feature_map_shape[3]
    return (batch_size, n_rois, self.pooled_height, 
            self.pooled_width, n_channels)


compute_output_shape — , , .



(call). — , . , ROI Pooling, .



, , ROI .



.



@staticmethod
def _pool_roi(feature_map, roi, pooled_height, pooled_width):
  """ Applies ROI Pooling to a single image and a single ROI
  """# Compute the region of interest        
  feature_map_height = int(feature_map.shape[0])
  feature_map_width  = int(feature_map.shape[1])

  h_start = tf.cast(feature_map_height * roi[0], 'int32')
  w_start = tf.cast(feature_map_width  * roi[1], 'int32')
  h_end   = tf.cast(feature_map_height * roi[2], 'int32')
  w_end   = tf.cast(feature_map_width  * roi[3], 'int32')

  region = feature_map[h_start:h_end, w_start:w_end, :]
...


, .



, ROI , 0 1. , ROI 4- , (x_min, y_min, x_max, y_max ).

ROI , , , : , , ROI Pooling, , , ROI.



, TensorFlow.



...
# Divide the region into non overlapping areas
region_height = h_end - h_start
region_width  = w_end - w_start
h_step = tf.cast(region_height / pooled_height, 'int32')
w_step = tf.cast(region_width  / pooled_width , 'int32')

areas = [[(
           i*h_step, 
           j*w_step, 
           (i+1)*h_step if i+1 < pooled_height else region_height, 
           (j+1)*w_step if j+1 < pooled_width else region_width
          ) 
          for j in range(pooled_width)] 
         for i in range(pooled_height)]
...


ROI, .



2D , , , .



, , , , ROI (region_height // pooled_height, region_width // pooled_width), ROI, .



2D , .



...
# Take the maximum of each area and stack the result
def pool_area(x): 
  return tf.math.reduce_max(region[x[0]:x[2],x[1]:x[3],:], axis=[0,1])

pooled_features = tf.stack([[pool_area(x) for x in row] for row in areas])
return pooled_features


. pool_area, , , , , .

pool_area , , list comprehension .



(pooled_height, pooled_width, n_channels), RoI .



— RoI . tf.map_fn (n_rois, pooled_height, pooled_width, n_channels).



@staticmethod
def _pool_rois(feature_map, rois, pooled_height, pooled_width):
  """ Applies ROI pooling for a single image and varios ROIs
  """
  def curried_pool_roi(roi): 
    return ROIPoolingLayer._pool_roi(feature_map, roi, 
                                     pooled_height, pooled_width)

  pooled_areas = tf.map_fn(curried_pool_roi, rois, dtype=tf.float32)
  return pooled_areas


, . tf.map_fn (, x), , .



def call(self, x):
  """ Maps the input tensor of the ROI layer to its output
  """
  def curried_pool_rois(x): 
    return ROIPoolingLayer._pool_rois(x[0], x[1], 
                                      self.pooled_height, 
                                      self.pooled_width)

  pooled_areas = tf.map_fn(curried_pool_rois, x, dtype=tf.float32)
  return pooled_areas


, dtype tf.map_fn , . , , , Tensorflow.



:



import tensorflow as tf
from tensorflow.keras.layers import Layer

class ROIPoolingLayer(Layer):
    """ Implements Region Of Interest Max Pooling 
        for channel-first images and relative bounding box coordinates

        # Constructor parameters
            pooled_height, pooled_width (int) -- 
              specify height and width of layer outputs

        Shape of inputs
            [(batch_size, pooled_height, pooled_width, n_channels),
             (batch_size, num_rois, 4)]

        Shape of output
            (batch_size, num_rois, pooled_height, pooled_width, n_channels)

    """
    def __init__(self, pooled_height, pooled_width, **kwargs):
        self.pooled_height = pooled_height
        self.pooled_width = pooled_width

        super(ROIPoolingLayer, self).__init__(**kwargs)

    def compute_output_shape(self, input_shape):
        """ Returns the shape of the ROI Layer output
        """
        feature_map_shape, rois_shape = input_shape
        assert feature_map_shape[0] == rois_shape[0]
        batch_size = feature_map_shape[0]
        n_rois = rois_shape[1]
        n_channels = feature_map_shape[3]
        return (batch_size, n_rois, self.pooled_height, 
                self.pooled_width, n_channels)

    def call(self, x):
        """ Maps the input tensor of the ROI layer to its output

            # Parameters
                x[0] -- Convolutional feature map tensor,
                        shape (batch_size, pooled_height, pooled_width, n_channels)
                x[1] -- Tensor of region of interests from candidate bounding boxes,
                        shape (batch_size, num_rois, 4)
                        Each region of interest is defined by four relative 
                        coordinates (x_min, y_min, x_max, y_max) between 0 and 1
            # Output
                pooled_areas -- Tensor with the pooled region of interest, shape
                    (batch_size, num_rois, pooled_height, pooled_width, n_channels)
        """
        def curried_pool_rois(x): 
          return ROIPoolingLayer._pool_rois(x[0], x[1], 
                                            self.pooled_height, 
                                            self.pooled_width)

        pooled_areas = tf.map_fn(curried_pool_rois, x, dtype=tf.float32)

        return pooled_areas

    @staticmethod
    def _pool_rois(feature_map, rois, pooled_height, pooled_width):
        """ Applies ROI pooling for a single image and varios ROIs
        """
        def curried_pool_roi(roi): 
          return ROIPoolingLayer._pool_roi(feature_map, roi, 
                                           pooled_height, pooled_width)

        pooled_areas = tf.map_fn(curried_pool_roi, rois, dtype=tf.float32)
        return pooled_areas

    @staticmethod
    def _pool_roi(feature_map, roi, pooled_height, pooled_width):
        """ Applies ROI pooling to a single image and a single region of interest
        """

        # Compute the region of interest        
        feature_map_height = int(feature_map.shape[0])
        feature_map_width  = int(feature_map.shape[1])

        h_start = tf.cast(feature_map_height * roi[0], 'int32')
        w_start = tf.cast(feature_map_width  * roi[1], 'int32')
        h_end   = tf.cast(feature_map_height * roi[2], 'int32')
        w_end   = tf.cast(feature_map_width  * roi[3], 'int32')

        region = feature_map[h_start:h_end, w_start:w_end, :]

        # Divide the region into non overlapping areas
        region_height = h_end - h_start
        region_width  = w_end - w_start
        h_step = tf.cast( region_height / pooled_height, 'int32')
        w_step = tf.cast( region_width  / pooled_width , 'int32')

        areas = [[(
                    i*h_step, 
                    j*w_step, 
                    (i+1)*h_step if i+1 < pooled_height else region_height, 
                    (j+1)*w_step if j+1 < pooled_width else region_width
                   ) 
                   for j in range(pooled_width)] 
                  for i in range(pooled_height)]

        # take the maximum of each area and stack the result
        def pool_area(x): 
          return tf.math.reduce_max(region[x[0]:x[2], x[1]:x[3], :], axis=[0,1])

        pooled_features = tf.stack([[pool_area(x) for x in row] for row in areas])
        return pooled_features


! , 1- 100x200, 2 RoI, 7x3. , 4 . — 1, 50 (-1, -3).



import numpy as np# Define parameters
batch_size = 1
img_height = 200
img_width = 100
n_channels = 1
n_rois = 2
pooled_height = 3
pooled_width = 7# Create feature map input
feature_maps_shape = (batch_size, img_height, img_width, n_channels)
feature_maps_tf = tf.placeholder(tf.float32, shape=feature_maps_shape)
feature_maps_np = np.ones(feature_maps_tf.shape, dtype='float32')
feature_maps_np[0, img_height-1, img_width-3, 0] = 50
print(f"feature_maps_np.shape = {feature_maps_np.shape}")# Create batch size
roiss_tf = tf.placeholder(tf.float32, shape=(batch_size, n_rois, 4))
roiss_np = np.asarray([[[0.5,0.2,0.7,0.4], [0.0,0.0,1.0,1.0]]], dtype='float32')
print(f"roiss_np.shape = {roiss_np.shape}")# Create layer
roi_layer = ROIPoolingLayer(pooled_height, pooled_width)
pooled_features = roi_layer([feature_maps_tf, roiss_tf])
print(f"output shape of layer call = {pooled_features.shape}")# Run tensorflow session
with tf.Session() as session:
    result = session.run(pooled_features, 
                         feed_dict={feature_maps_tf:feature_maps_np,  
                                    roiss_tf:roiss_np})

print(f"result.shape = {result.shape}")
print(f"first  roi embedding=\n{result[0,0,:,:,0]}")
print(f"second roi embedding=\n{result[0,1,:,:,0]}")


, TensorFlow, .



:



feature_maps_np.shape = (1, 200, 100, 1)
roiss_np.shape = (1, 2, 4)
output shape of layer call = (1, 2, 3, 7, 1)
result.shape = (1, 2, 3, 7, 1)
first  roi embedding=
[[1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1.]]
second roi embedding=
[[ 1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1. 50.]]


, . — 1, , 50.



, !





, !



, ROI Pooling (attention). , , Keras , ROI Pooling .



, , , !



Ari Brill, Tjark Miener Bryan Kim .








All Articles