## DEPARTAMENTO DE INGENIERÍA ELECTRÓNICA

# ESCUELA TÉCNICA SUPERIOR DE INGENIEROS DE TELECOMUNICACIÓN



### TESIS DOCTORAL

# COMBINED WORD-LENGTH ALLOCATION AND HIGH-LEVEL SYNTHESIS OF DIGITAL SIGNAL PROCESSING CIRCUITS

#### Autor:

Gabriel Caffarena Fernández Ingeniero de Telecomunicación

#### Directores:

Octavio Nieto-Taladriz García Doctor Ingeniero de Telecomunicación

Carlos Carreras Vaquer Doctor Ingeniero de Telecomunicación

2008



| Tribunal nombrado por el Magfco. Y Excmo. Sr. Rector de la Univ<br>Politécnica de Madrid, el día 28 de letatra Julio de 2008                                                                                                                       | ersidad |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
| Presidente: Carlos A. López Barrio  Secretario: Mª Luis à López Barrio  Vocal: Romain Hermida Correa  Vocal: Markus Rupp  Vocal: Olivier Sentiers (No pudo anistir).  Suplente: Parix Hormigo Aguiller  Suplente: Redro Pallo Pedro Sanduet Espero |         |
|                                                                                                                                                                                                                                                    | de 2008 |
| EL PRESIDENTE EL SECRET.                                                                                                                                                                                                                           | ARIO    |

LOS VOCALES

M Rys

# CONTENTS

| bstract                         | V   |  |  |  |  |  |  |  |  |
|---------------------------------|-----|--|--|--|--|--|--|--|--|
| Acknowledgments                 |     |  |  |  |  |  |  |  |  |
| List of Figures                 |     |  |  |  |  |  |  |  |  |
| ist of Tables                   | xi  |  |  |  |  |  |  |  |  |
| List of Acronyms                |     |  |  |  |  |  |  |  |  |
| ist of Notation                 | xv  |  |  |  |  |  |  |  |  |
| Resumen                         | xix |  |  |  |  |  |  |  |  |
| Introduction                    | 1   |  |  |  |  |  |  |  |  |
| 1.1 Motivation                  | 2   |  |  |  |  |  |  |  |  |
| 1.2 Objectives                  | 5   |  |  |  |  |  |  |  |  |
| 1.3 Book organization           | 6   |  |  |  |  |  |  |  |  |
| 1.4 Contributions of the thesis | Ć   |  |  |  |  |  |  |  |  |

|   | 1.5              | Publications                                                  | 8          |  |
|---|------------------|---------------------------------------------------------------|------------|--|
| 2 | State of the art |                                                               |            |  |
|   | 2.1              | Word-length allocation                                        | 16         |  |
|   | 2.2              | High-Level synthesis                                          | 23         |  |
|   | 2.3              | Combined word-length allocation and high-level synthesis      | 27         |  |
|   | 2.4              | Field-programmable gate arrays                                | 30         |  |
|   | 2.5              | Conclusions                                                   | 32         |  |
| 3 | Higl             | n-level synthesis of DSP algorithms using FPGAs               | 35         |  |
|   | 3.1              | High-level synthesis                                          | 36         |  |
|   | 3.2              | Cost estimation                                               | 41         |  |
|   |                  | 3.2.1 Resource usage metric                                   | 41         |  |
|   |                  | 3.2.2 Resource modeling                                       | 46         |  |
|   | 3.3              | Optimization procedure                                        | 53         |  |
|   | 3.4              | Results                                                       | 59         |  |
|   | •                | 3.4.1 Uniform word-length vs. multiple word-length synthesis: |            |  |
|   |                  | Homogeneous architectures                                     | 62         |  |
|   |                  | 3.4.2 Uniform word-length vs. multiple word-length synthesis: |            |  |
|   |                  | Heterogeneous architectures                                   | 65         |  |
|   |                  | 3.4.3 MWL synthesis: Heterogeneous vs. Homogeneous            | 69         |  |
|   |                  | 3.4.4 Effect of registers and multiplexers                    | 72         |  |
|   | 3.5              | Conclusions                                                   | <b>7</b> 4 |  |
| 4 | Wol              | rd-length allocation                                          | 77         |  |
|   | 4.1              | Word-length Allocation                                        | 78         |  |
|   | 4.2              | Scaling and error estimation                                  | 86         |  |
|   |                  | 421 Affine with meeting                                       | 86         |  |

|   |        | 4.2.2                  | Scaling                                     | 87  |  |  |  |  |
|---|--------|------------------------|---------------------------------------------|-----|--|--|--|--|
|   |        | 4.2.3                  | Error estimation                            | 88  |  |  |  |  |
|   | 4.3    | Word-I                 | Length selection                            | 96  |  |  |  |  |
|   | 4.4    | Results                | 8                                           | 100 |  |  |  |  |
|   |        | 4.4.1                  | Error estimation results                    | 101 |  |  |  |  |
|   |        | 4.4.2                  | Word-length selection techniques comparison | 105 |  |  |  |  |
|   | 4.5    | Conclu                 | isions                                      | 114 |  |  |  |  |
| 5 | Con    | nbined a               | approach                                    | 117 |  |  |  |  |
|   | 5.1    | Optim                  | al approach                                 | 118 |  |  |  |  |
|   |        | 5.1.1                  | Problem definition                          | 119 |  |  |  |  |
|   |        | 5.1.2                  | MILP formulation                            | 122 |  |  |  |  |
|   |        | 5.1.3                  | Results and conclusions                     | 127 |  |  |  |  |
|   | 5.2    | 5.2 Heuristic approach |                                             |     |  |  |  |  |
|   |        | 5.2.1                  | Optimization procedure                      | 131 |  |  |  |  |
|   |        | 5.2.2                  | Results and conclusions                     | 135 |  |  |  |  |
|   | 5.3    | Concl                  | usions                                      | 147 |  |  |  |  |
| 6 | Con    | clusion                | s                                           | 151 |  |  |  |  |
|   | 6.1    | Concl                  | usions                                      | 151 |  |  |  |  |
|   | 6.2    | Future                 | e research lines                            | 158 |  |  |  |  |
| R | ihliog | ronhy                  |                                             | 161 |  |  |  |  |

## **ABSTRACT**

This work is focused on the synthesis of Digital Signal Processing (DSP) circuits using specific hardware architectures. Due to its complexity, the design process has been subdivided into separate tasks, thus hindering the global optimization of the resulting systems. The author proposes the study of the combination of two major design tasks, Word-Length Allocation (WLA) and High-Level Synthesis (HLS), aiming at the optimization of DSP implementations using modern Field Programmable Gate Array devices (FPGAs).

A multiple word-length approach (MWL) is adopted since it leads to highly optimized implementations. MWL implies the customization of the word-lengths of the signals of an algorithm. This complicates the design, since the number possible assignations between algorithm operations and hardware resources becomes very high. Moreover, this work also considers the use of heterogeneous FPGAs where there are several types of resources: configurable logic-based blocks (LUT-based) and specialized embedded resources. All these issues are addressed in this work and several automatic design techniques are proposed.

The contributions of the Thesis cover the fields of WLA, HLS using FPGAs, and the combined application of WLA and HLS for implementation in FPGAs.

A thorough approach of HLS has been implemented which considers a complete datap-

ath composed of functional units (FUs), registers and multiplexers, as well as heterogeneous FPGA resources (LUT-based and embedded resources). The approach makes use of a resource library that accounts for MWL effects within the set of resources, thus producing highly optimized architectures. This library includes both LUT-based and embedded FPGA resources, which further increase the power of the HLS task. Another important contribution is the introduction of resource usage metrics suitable for heterogeneous-architecture FPGAs.

A novel quantization error estimation based on affine arithmetic (AA) is presented, as well as its practical application to the automatic WLA of LTI and non-linear differentiable DSP systems. The error estimation is based on performing a pre-processing of the algorithm, which produces an expression of the quantization error at the system output. Therefore, the error can be easily computed leading to fast and accurate WLA optimizations.

The analysis of the impact of different optimization techniques during WLA on HLS results is also presented. The variance in the obtained results corroborates the fact that it is worth using a single architecture model during WLA and HLS, and this is only possible by means of combining these tasks.

The actual combination of WLA and HLS has been firstly performed by using a Mixed Integer Linear Programming (MILP) approach. The results prove the validity of the approach and also provide with insights into the combination of the two tasks that are used to generate heuristic synthesis algorithms.

Finally, the global contribution of this thesis is an HLS heuristic algorithm able to perform the combined WLA and HLS of DSP systems for both homogeneous and heterogeneous FPGA architectures. Up to 20% of resource usage reductions are reported, which proves the importance of such a combined approach, providing electronic designers with a design framework that enables highly improved DSP custom hardware implementations.