To balance computing efficiency and flexibility in evaluating transcendental functions, this paper proposes a reconfigurable transcendental function generator. The generator is a reconfigurable array of 30 processing elements (PEs) on which the coordinate rotation digital computer (CORDIC) algorithm is implemented. Different functions, such as sine, cosine, arctangent, and logarithm, can be computed on the same structure by reconfiguring the functions of the PEs. Functional simulation and field-programmable gate array (FPGA) verification show that the proposed method achieves high flexibility with acceptable performance.
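As an illustration of how one shift-and-add datapath can serve several functions, the following is a minimal floating-point sketch of circular CORDIC in its two modes: rotation mode yields sine and cosine, and vectoring mode yields the arctangent. It is not the 30-PE mapping described above; the names `N_ITER`, `ANGLES`, `GAIN`, `cordic_sin_cos`, and `cordic_atan` are hypothetical, and a hardware version would use fixed-point arithmetic with shifts in place of the multiplications by powers of two.

```python
import math

N_ITER = 32  # assumed iteration count; more iterations give more precision

# Elementary rotation angles atan(2^-i), normally stored in a small table.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
# GAIN is the reciprocal of the accumulated stretch.
GAIN = 1.0
for i in range(N_ITER):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(theta):
    """Rotation mode: drive the residual angle z to zero.

    Converges for |theta| up to about 1.74 rad (the sum of ANGLES).
    Returns (sin(theta), cos(theta)).
    """
    x, y, z = GAIN, 0.0, theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x


def cordic_atan(ratio):
    """Vectoring mode: drive y to zero and accumulate the angle in z."""
    x, y, z = 1.0, ratio, 0.0
    for i in range(N_ITER):
        d = -1.0 if y >= 0 else 1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return z
```

The two functions share the same update equations and differ only in the rule that selects the rotation direction `d`, which is the property that makes a reconfigurable implementation attractive. Logarithms would need the hyperbolic variant of CORDIC, which this sketch does not cover.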
The new coding tools of High Efficiency Video Coding (HEVC) make interpolation in motion compensation (MC) more complex in exchange for better video compression, and they impose higher requirements on the computational efficiency and control logic of the hardware architecture. A reconfigurable array processor can balance computational efficiency with flexible switching between algorithms. By exploiting the data dependencies and parallelism in the interpolation operations, this paper presents a parallelization method based on the dynamically reconfigurable array processor proposed by the project team. The number of pixels loaded from external memory is reduced significantly by reusing the data shared between the previous reference block and the current reference block. Flexible switching between variable block sizes is realized with the dynamic reconfiguration mechanism. A 16×16 processing element (PE) array dynamically processes block sizes from 4×4 to 64×64. Experimental results show that the reference block update speed increases by 39.9%. With an array of 16 PEs, 16 pixels are processed in parallel.
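The parallelism and data reuse mentioned above can be sketched for the simplest case: horizontal half-sample luma interpolation of one row with the HEVC 8-tap filter. This is an illustrative model rather than the array-processor program itself. It assumes 8-bit samples, folds rounding into a single step, and the names `HALF_TAPS`, `interp_half_row`, and `new_columns_needed` are hypothetical. The reuse count further assumes that horizontally adjacent blocks share the same motion vector.

```python
# HEVC luma half-sample filter taps (they sum to 64).
HALF_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def interp_half_row(ref_row, width):
    """Interpolate `width` half-sample values from width + 7 integer samples.

    ref_row carries 3 margin samples on the left and 4 on the right.
    Each output depends only on its own 8-sample window, so the loop
    iterations are independent and could run on one PE each.
    """
    assert len(ref_row) == width + 7
    out = []
    for x in range(width):
        acc = sum(t * ref_row[x + k] for k, t in enumerate(HALF_TAPS))
        out.append(min(255, max(0, (acc + 32) >> 6)))  # round, scale, clip
    return out


def new_columns_needed(width, reuse_previous):
    """Columns to fetch for the next block to the right.

    Without reuse, the full window of width + 7 columns is loaded.
    With reuse, the 7 margin columns overlap the previous window
    (assumption: same motion vector), so only `width` columns are new.
    """
    return width if reuse_previous else width + 7
```

For a 16-sample-wide block, `interp_half_row` produces 16 outputs from 23 inputs, and `new_columns_needed(16, True)` shows that only 16 of those 23 columns have to come from external memory when the previous window is kept on chip.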
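Based on the characteristics of the Planar and DC mode algorithms in HEVC intra prediction, this paper proposes a parallelization method for the two modes. The method analyzes and derives the parallelism available within and between the Planar and DC mode algorithms. It builds on PAAG (Polymorphic Array Architecture for Graphics and Image Processing), an array processor for graphics and image applications designed independently at Xi'an University of Posts and Telecommunications, adopts an optimal data allocation scheme, and designs algorithm programs in which multiple processing elements work in parallel. Experimental results show that the parallel implementations of the Planar and DC prediction modes on multiple processing elements are 84% and 81% faster, respectively, than single-core serial execution, with serial-to-parallel speedup ratios of 6.34 and 5.44. The parallel algorithm reduces video encoding and decoding time, and its data allocation scheme offers a useful reference for research on parallelizing intra prediction algorithms on multi-core architectures.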
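To make the source of the parallelism concrete, here is a minimal serial sketch of the two HEVC prediction modes for an N×N block, where N is a power of two. It is not the PAAG program. The function and parameter names are hypothetical, the neighbouring samples are assumed to be available already, and the boundary smoothing that HEVC applies to small luma DC blocks is omitted.

```python
def dc_predict(top, left):
    """DC mode: every sample is the rounded mean of the 2N neighbours.

    top  -- N reconstructed samples above the block
    left -- N reconstructed samples to the left of the block
    """
    n = len(top)
    shift = n.bit_length()  # equals log2(N) + 1 when N is a power of two
    dc = (sum(top) + sum(left) + n) >> shift
    return [[dc] * n for _ in range(n)]


def planar_predict(top, left, top_right, bottom_left):
    """Planar mode: average of a horizontal and a vertical linear blend.

    top_right   -- the sample just right of the top row, p[N][-1]
    bottom_left -- the sample just below the left column, p[-1][N]
    Returns rows indexed as pred[y][x].
    """
    n = len(top)
    shift = n.bit_length()
    return [
        [
            ((n - 1 - x) * left[y] + (x + 1) * top_right
             + (n - 1 - y) * top[x] + (y + 1) * bottom_left + n) >> shift
            for x in range(n)
        ]
        for y in range(n)
    ]
```

In both modes each predicted sample depends only on the neighbouring reference samples and never on another predicted sample, so rows or individual samples can be distributed across processing elements once the reference data has been allocated to them. The only shared step is the neighbour sum in DC mode, which is a reduction.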