Student Interns from India: The internship application deadline is February 28, 2022. Applicants should mention a maximum of two projects from the list below and contact the respective supervisors directly.

We are always looking for highly motivated researchers to join our team on topics such as Approximate Computing, Reconfigurable Computing, Machine Learning Applications, Design Automation for Emerging Technologies, Reliability and Fault-Tolerance, and Embedded Systems, so do not hesitate to contact us. Besides that, the following positions are also available:

Positions for Master Thesis, Master Project, SHW/WHK, Internship
Hardware-assisted Federated Learning

The federated learning paradigm trains models on decentralized datasets. While several CPU/GPU-based servers can realize federated learning, a crucial requirement for IoT applications is to enable such a learning paradigm over decentralized hardware resources. This project explores a hardware/software co-design approach to federated learning. More information can be found in this Design and Test 2020 paper.
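As a minimal illustration of the paradigm (a sketch under simplifying assumptions, not the project's actual implementation), the following Python snippet performs federated averaging; a linear model trained with SGD stands in for a real network, and all names and data are hypothetical:

    import numpy as np

    def local_update(weights, X, y, lr=0.01, epochs=5):
        """One client's local training (linear regression via gradient descent)."""
        w = weights.copy()
        for _ in range(epochs):
            w -= lr * 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        return w

    def federated_round(global_w, clients):
        """FedAvg: average locally trained weights, weighted by dataset size."""
        updates = [local_update(global_w, X, y) for X, y in clients]  # data stays local
        sizes = np.array([len(y) for _, y in clients], dtype=float)
        return np.average(np.stack(updates), axis=0, weights=sizes)

    # three hypothetical edge nodes, each with a private dataset
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    clients = []
    for _ in range(3):
        X = rng.normal(size=(50, 2))
        clients.append((X, X @ true_w + 0.1 * rng.normal(size=50)))

    w = np.zeros(2)
    for _ in range(20):  # communication rounds: only weights are exchanged
        w = federated_round(w, clients)
    print(w)  # approaches true_w without ever centralizing the raw data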

Pre-requisites:

  • C/C++
  • Python
  • familiarity with git

Contact:

Shubham Rai

Exploring RL-based logic synthesis approaches

Within logic synthesis, most optimization scripts are well-defined heuristics that generalize over a variety of Boolean circuits. These heuristic-based scripts comprise various optimization algorithms that are applied sequentially, in a specific order, to a logic graph representation of a Boolean circuit. To develop custom heuristics that perform well on a particular Boolean circuit, this project aims to use reinforcement learning (RL) to generate scripts that carry out logic synthesis flows. More information can be found in our ICCAD 2021 publication.
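A toy Python sketch of the idea, under stated assumptions: the cost model below is a stand-in for a real synthesis tool (e.g., ABC applying passes to an AIG and reporting node count), and the per-step bandit-style update is a simplification of full RL:

    import random

    PASSES = ["balance", "rewrite", "refactor", "resub"]  # ABC-style pass names
    EPISODES, SCRIPT_LEN, EPS, ALPHA = 500, 6, 0.2, 0.5

    def apply_pass(cost, prev, action):
        """Hypothetical cost model: a pass shrinks the design more when it differs
        from the previous pass (diminishing returns), plus noise. A real
        environment would run the pass and measure area/delay instead."""
        gain = 0.93 if action != prev else 0.99
        return cost * gain * random.uniform(0.98, 1.02)

    Q = {}  # Q[(prev_pass, step)][action] -> expected cost reduction
    for _ in range(EPISODES):
        cost, prev = 100.0, None
        for step in range(SCRIPT_LEN):
            q = Q.setdefault((prev, step), {a: 0.0 for a in PASSES})
            a = random.choice(PASSES) if random.random() < EPS else max(q, key=q.get)
            new_cost = apply_pass(cost, prev, a)
            q[a] += ALPHA * ((cost - new_cost) - q[a])  # reward = cost reduction
            cost, prev = new_cost, a

    # greedy rollout: a circuit-specific synthesis "script"
    script, prev = [], None
    for step in range(SCRIPT_LEN):
        q = Q.get((prev, step), {a: 0.0 for a in PASSES})
        a = max(q, key=q.get)
        script.append(a)
        prev = a
    print("; ".join(script))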

Pre-requisites:

  • C/C++
  • Python
  • familiarity with git

Contact:

Shubham Rai

Side-channel Analysis in Reconfigurable Field-Effect Transistor circuits

The widespread use of electrical computing devices in many applications, including credit cards, smartphones, and autonomous cars, has significantly increased the importance of designing secure hardware. Hence, hardware manufacturers and designers spend a lot of time and money to ensure the security of their chips against various attacks.

One common and effective class of attacks is the side-channel attack, in which the attacker exploits unintended information leakage from the chip, such as power consumption, timing delay, or electromagnetic radiation, to extract secret information such as an encryption key or the circuit's function.

In recent years, emerging beyond-CMOS technologies like spintronics and Reconfigurable Field-Effect Transistors (RFETs) have provided unique abilities for securing hardware. This project aims to analyze the strengths and weaknesses of RFET circuits against traditional and ML-based side-channel attacks and to propose efficient security solutions.
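To make the attack model concrete, here is a minimal correlation power analysis (CPA) sketch in Python; the traces and the Hamming-weight leakage model are simulated (a real attack would target, e.g., the AES S-box output using measured power traces):

    import numpy as np

    HW = np.array([bin(x).count("1") for x in range(256)])  # Hamming-weight table
    rng = np.random.default_rng(1)

    # simulated measurement: leakage = HW(plaintext XOR key) + Gaussian noise
    secret_key = 0x3C
    pts = rng.integers(0, 256, size=2000)
    traces = HW[pts ^ secret_key] + rng.normal(0.0, 1.0, size=pts.size)

    # correlate the hypothetical leakage of every key guess with the traces
    corr = np.empty(256)
    for guess in range(256):
        corr[guess] = np.corrcoef(HW[pts ^ guess], traces)[0, 1]

    print(hex(int(np.argmax(np.abs(corr)))))  # recovers 0x3c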

Pre-requisites:

  • VLSI design knowledge
  • SPICE and Cadence Virtuoso simulation
  • Familiarity with HW security concepts
  • Knowledge of Machine Learning (Favorable)
  • Verilog

Contacts:

Nima Kavand (nima.kavand@tu-dresden.de)

Armin Darjani (armin.darjani@tu-dresden.de)

Creating Physical LEF Library of RFET Standard Cells

The growth and development of new applications such as the Internet of Things and the need for more efficient and secure hardware on the one hand, and the limitations of CMOS technology on the other, have led academia and industry to investigate emerging devices. Reconfigurable field-effect transistors (RFETs) are one of the most promising beyond-CMOS technologies, providing unique features for designing compact, low-power, and secure circuits.

For VLSI design, a complete EDA flow and standard cell libraries are essential. This project aims to design the layout of RFET standard cells and to generate a physical LEF library for the place-and-route phase.

Pre-requisites:

  • VLSI design knowledge
  • ASIC design flow
  • Cadence Virtuoso and layout
  • Verilog

Contacts:

Nima Kavand (nima.kavand@tu-dresden.de)

Armin Darjani (armin.darjani@tu-dresden.de)

Thwarting ML-based attacks against post-SAT logic locking schemes

The ever-growing complexity of Integrated Circuits (ICs) has steeply increased the cost of IC manufacturing in recent years. This growth in expenses has propelled companies to go fabless over the years. As a result, the different parts of the IC manufacturing flow may be carried out by various entities from different regions of the globe. This outsourcing and globalization have brought many perils regarding the integrity and confidentiality of intellectual property (IP). In a globalized IC supply chain, valuable IP may be exposed to an untrusted entity in various forms, from a GDSII file to a packaged IC. Born out of this exposure, hardware security threats such as IP piracy, overbuilding, reverse engineering, and hardware Trojans challenge IP designers.

Logic locking is one of the most promising hardware security techniques for shielding a hardware design from different threats throughout the IC supply chain. This technique locks the design by adding new logic elements to the circuit, hiding the true functionality of the design. The locked design only functions correctly upon receiving the correct set of key bits, stored in a tamper-proof on-chip memory.
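As a toy illustration of the key-gate idea (not a statement about any particular locking scheme), the following Python sketch XORs key bits into a two-gate circuit; the locked design computes the original function only for the correct key:

    # toy XOR key-gate locking (illustration only; real schemes lock gate-level netlists)
    def original(a, b, c):
        return (a & b) | c

    def locked(a, b, c, k0, k1):
        """Correct only for the key (k0, k1) = (1, 0) chosen at lock time."""
        w1 = (a & b) ^ k0 ^ 1  # key gate; the extra inverter absorbs k0 = 1
        w2 = w1 | c
        return w2 ^ k1         # second key gate on the output

    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                assert locked(a, b, c, 1, 0) == original(a, b, c)  # correct key
                # with a wrong key, e.g. (0, 1), the output diverges on some inputs
    print("locked circuit matches the original under the correct key")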

However, years of research have highlighted the flaws in this technique, leading to powerful ML-based attacks that harness the structural traces of add-on logic locking techniques to render them futile.

The goals of this project are:

  • Analyzing the structural weaknesses of the latest broken logic locking techniques
  • Understanding the ML-based attacks that harness these weaknesses
  • Investigating new approaches that can reduce the accuracy of ML-based attacks

Pre-Requisites:

  • VLSI circuit design, especially logic synthesis tools
  • Verilog/VHDL, Python
  • TensorFlow (favorable)
  • Knowledge about Machine Learning techniques (Knowledge about GNNs is favorable)

Contact information for more details:

Armin Darjani (armin.darjani@tu-dresden.de)

Nima Kavand (nima.kavand@tu-dresden.de)

Computationally efficient and robust deep learning models

As deep neural networks (DNNs) evolve to solve complex real-world tasks, the need for computationally efficient algorithms is also growing so that these networks can be implemented on low-power devices. MobileNets, Binary Neural Networks, and quantized DNNs are some examples of computationally efficient lightweight models. In recent years, studies have demonstrated that deep learning systems can be fooled by carefully engineered adversarial examples with small, imperceptible perturbations. Many attacks and defense schemes have been proposed in this direction. This project aims to explore and implement methods to make lightweight neural networks robust against adversarial attacks.
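For concreteness, here is a minimal sketch of one well-known attack, the Fast Gradient Sign Method (FGSM), in PyTorch; `model`, `images`, and `labels` are hypothetical placeholders for any image classifier and its data:

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, eps=0.03):
        """Fast Gradient Sign Method: one-step adversarial perturbation.
        Moves each input pixel by +/- eps in the direction that increases the loss."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

    # hypothetical usage:
    #   x_adv = fgsm_attack(model, images, labels)
    # adversarial training, one common defense, mixes such x_adv into each batch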

Pre-requisites:

  • Knowledge of machine learning and neural network concepts
  • Programming languages (C++ and Python)
  • Machine learning frameworks (TensorFlow, Keras, PyTorch, and Scikit-Learn)
[Figure: Adversarial example]

Contact information for more details:
Xintel: A framework for cross implementation of FPGA-optimized designs

Most state-of-the-art FPGA-based arithmetic modules (in particular approximate modules) target the Xilinx 6-input LUT structure. The availability of 6-input LUTs and associated carry chains in Xilinx FPGAs enables resource-efficient, reduced-latency designs. However, due to architectural differences, these designs cannot be directly ported to FPGAs from other vendors, e.g., Intel FPGAs (the second-largest FPGA market shareholder after Xilinx). Fortunately, Intel FPGAs also support 6-input fracturable LUTs and associated adders.

This project aims to develop a generic framework that acts as a bridge between Xilinx and Intel FPGAs. The framework will receive a design optimized for Xilinx FPGAs and produce a corresponding design optimized for Intel FPGAs by utilizing Intel FPGA primitives, and vice versa. The obtained designs will be compared with state-of-the-art designs and vendor-provided IPs. The framework will enable easy adaptation of designs optimized for one FPGA vendor to another.

Pre-requisites:

  • Digital Design, Computer Architecture
  • Xilinx FPGA architecture and design with VHDL/Verilog
  • Experience with Xilinx Vivado
  • Knowledge about Intel FPGAs (recommended)
  • Knowledge about Intel Quartus (recommended)
  • Some scripting language (preferably Python), C++

Skills that will be acquired during project work:

  • FPGA Design tools (Xilinx and Intel)
  • Designing with low-level primitives (LUTs and carry chains)
  • Approximate Computing
  • Technical writing for research publications

Related Publications:

  • Ullah, Salim, Siva Satyendra Sahoo, Nemath Ahmed, Debabrata Chaudhury, and Akash Kumar. "AppAxO: Designing Application-specific Approximate Operators for FPGA-based Embedded Systems." ACM Transactions on Embedded Computing Systems (TECS) (2022).
  • Ullah, Salim, Hendrik Schmidl, Siva Satyendra Sahoo, Semeen Rehman, and Akash Kumar. "Area-optimized accurate and approximate softcore signed multiplier architectures." IEEE Transactions on Computers 70, no. 3 (2020): 384-392.

Contact

Salim Ullah

Machine-Learning Techniques Analysis for Embedded Real-Time System Design

In general, there are three categories of ML techniques (supervised learning, unsupervised learning, and reinforcement learning), and depending on the problem, parameters, and inputs, only some of them are suitable for optimizing system properties. These ML techniques are memory-intensive and computationally expensive, which makes some of them incompatible with real-time system design, as their overheads may affect applications' timeliness. Therefore, this project aims to analyze and investigate various ML techniques in terms of overhead, accuracy, and capability, and to determine which ones are efficient and suitable for embedded real-time systems.

Pre-Requisites and helpful skills:

  • Proficiency in C++, Python, Matlab
  • Knowledge about Machine Learning techniques
  • Good knowledge of computer architecture and algorithm design

Related Publications:

  • S. Pagani, P. D. S. Manoj, A. Jantsch and J. Henkel, "Machine Learning for Power, Energy, and Thermal Management on Multicore Processors: A Survey," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 39, no. 1, pp. 101-116, 2020.

Contact information for more details:

Prof. Akash Kumar (akash.kumar@tu-dresden.de)

Behnaz Ranjbar (behnaz.ranjbar@tu-dresden.de)

System-Level Reliability Analysis in Hardware and Software Abstraction Layers for MC Applications

A wide range of embedded systems found in the automotive and avionics industries are evolving into Mixed-Criticality (MC) systems to meet cost, space, timing, and power consumption requirements. In these MC systems, multiple tasks with different criticality levels are executed on common hardware. A failure in tasks of different criticality levels has a disparate impact on the system, ranging from no effect to catastrophic. These systems' functions, especially highly critical ones, must be ensured during execution under various stresses (e.g., hardware errors, software errors) to prevent failures and catastrophic consequences. Therefore, to guarantee system safety, different reliability management techniques are employed in the design of such systems. In the case of fault occurrence, techniques at different abstraction layers are needed to strengthen the system against potential failures. Furthermore, due to varying safety demands, tasks can have different reliability requirements. This project aims to analyze reliability management techniques across the hardware and software layers of the system stack that can be applied to MC applications, such as automotive benchmarks.

Pre-Requisites and helpful skills:

  • Proficiency in C++, Python, Matlab
  • Good knowledge of computer architecture and algorithm design
  • Strong architecture background with general-purpose multi-core platforms

Related Publications:

  • Siva Satyendra Sahoo, Bharadwaj Veeravalli, Akash Kumar, "Cross-layer fault-tolerant design of real-time systems", In Proc. of International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), pp. 1–6, 2016.
  • Siva Satyendra Sahoo, Behnaz Ranjbar, Akash Kumar, "Reliability-Aware Resource Management in Multi-/Many-Core Systems: A Perspective Paper", In Journal of Low Power Electronics and Applications, MDPI AG, vol. 11, no. 1, pp. 7, 2021.

Contact information for more details:

Prof. Akash Kumar (akash.kumar@tu-dresden.de)

Behnaz Ranjbar (behnaz.ranjbar@tu-dresden.de)

Machine-Learning (ML)-Based WCET Analysis in Mixed-Criticality Applications

A wide range of embedded systems found in the automotive and avionics industries are evolving into Mixed-Criticality (MC) systems, where multiple tasks with different criticality levels are executed on common hardware. In such MC systems, multiple Worst-Case Execution Times (WCETs) are defined for each task, one per system operation mode, to improve the MC system's timing behavior at run-time. Determining these different WCETs is an effective way to improve confidence and safety, make better use of resources, and execute more tasks on a platform. However, determining an appropriate WCET for the lower-criticality modes (the low WCET) is non-trivial. Choosing a very low WCET can improve processor utilization by scheduling more tasks in that mode; on the other hand, a larger WCET minimizes mode switches, thereby improving the quality-of-service (QoS) for all tasks, albeit at the cost of processor utilization. Moreover, statically determined low WCETs cannot adapt to dynamism at run-time. Therefore, in addition to MC system design at design-time, the run-time behavior of tasks needs to be considered by using Machine-Learning (ML) techniques that dynamically monitor the tasks' execution times and adapt the low WCETs to environmental changes.

This project aims to analyze the timing requirements of MC systems at both design- and run-time, using ML techniques to improve QoS and utilization. To achieve this goal, several hardware and software run-time controls may be used, and the MC applications need to be analyzed in terms of timing in all operation modes.
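A minimal sketch of the run-time idea, assuming execution-time monitoring is available: the low WCET is adapted from a high percentile of recent observations, trading utilization against mode switches (an illustration only, not a certified timing analysis):

    import collections
    import numpy as np

    class LowWcetEstimator:
        """Adapt the low-criticality WCET from monitored execution times.
        The percentile p trades utilization (lower p) against mode switches
        (higher p)."""
        def __init__(self, window=200, p=99.0):
            self.samples = collections.deque(maxlen=window)
            self.p = p

        def observe(self, exec_time):
            self.samples.append(exec_time)

        def low_wcet(self, wcet_high):
            if not self.samples:
                return wcet_high                   # fall back to the safe bound
            est = float(np.percentile(self.samples, self.p))
            return min(est, wcet_high)             # never exceed the verified WCET

    est = LowWcetEstimator()
    for t in np.random.default_rng(2).gamma(2.0, 1.0, size=500):  # synthetic times
        est.observe(t)
    print(est.low_wcet(wcet_high=20.0))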

Pre-Requisites and helpful skills:

  • Proficiency in C++, Python, Matlab
  • Knowledge about Machine Learning techniques
  • Good knowledge of computer architecture and algorithm design
  • Strong architecture background with general-purpose multi-core platforms

Contact information for more details:

Prof. Akash Kumar (akash.kumar@tu-dresden.de)

Behnaz Ranjbar (behnaz.ranjbar@tu-dresden.de)

Cross-Layer Approximation of Neural Networks

In the upcoming era of the Internet of Things (IoT), Federated Learning and Distributed Inference are envisioned to be the pillars and key enablers of real-time processing. To enable such compute-intensive workloads at the edge, the structure of NNs should be optimized without compromising the final quality of results. In this context, Approximate Computing techniques have been shown to provide highly beneficial solutions by exploiting the inherent error resiliency of ML applications. Considering this potential, the main idea of this project is to apply various approximations efficiently to reduce the area, power, and energy of NNs and boost their performance.
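As one concrete software-level knob, the sketch below applies symmetric post-training int8 quantization to a weight tensor; a toy illustration under simplifying assumptions, not the project's prescribed method:

    import numpy as np

    def quantize_int8(w):
        """Symmetric post-training quantization of a weight tensor to int8."""
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.default_rng(3).normal(0.0, 0.5, size=(64, 64)).astype(np.float32)
    q, s = quantize_int8(w)
    print("mean abs error:", np.abs(w - dequantize(q, s)).mean())  # small vs. ~0.5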

Pre-Requisites and helpful skills

  • FPGA development and programming (Verilog/VHDL, Vivado)
  • Software programming (Java/C++/Python/Matlab)

Contact information for more details

Zahra Ebrahimi (ebrahimi@tu-dresden.de)

Exploration of In-Memory Computing (IMC) using Ferroelectric Reconfigurable Field-Effect Transistors (FeRFETs)

FeRFETs combine computation and storage, overcoming the bottleneck of the von Neumann architecture and increasing hardware security. The project involves exploring IMC with FeRFET technology and looking into its practical implementation in areas such as Neuromorphic Computing, Signal Processing, and Hardware Security. The major components of the project include:

  • Understanding the FeRFET device structure and operation
  • Logic synthesis using FeRFETs based on device operation
  • Investigating design methodologies at different levels in relation to in-memory computing with FeRFETs
  • Characterizing FeRFET-based circuits and comparing them with FeFET- and RFET-based designs
  • Exploring logic cell design based on FeRFETs
  • More information can be found in our DATE 2021 paper.

Pre-requisites:

  • VLSI Design
  • Digital Design
  • Verilog/VHDL/SystemC
  • Neural Networks (favorable, not compulsory)

Contact:

Shubham Rai

Neuromorphic Computing using State-of-the-Art Components

From its birth in pure analog CMOS design, neuromorphic engineering has grown to include analog, digital, and mixed implementations in various technologies and at varying levels of biological fidelity, involving researchers from materials science to computer science, all motivated by the common goal of achieving brain-like performance. This project involves designing analog memristor circuits for neuromorphic applications and exploring the Reconfigurable FET and memristor design space for improved performance.

Goals of this project and Potential tasks:

  • Design of an analog CMOS Spiking Neuron
  • Design of memristive synapse
  • Expanding to multiple inputs and layers
  • Incorporating Reconfigurable FETs for improved performance
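For intuition about the spiking-neuron task above, here is a minimal discrete-time leaky integrate-and-fire (LIF) model in Python; the project's analog CMOS neuron would realize similar dynamics with transistors:

    import numpy as np

    def lif(current, dt=1e-3, tau=0.02, v_rest=0.0, v_th=1.0, v_reset=0.0):
        """Leaky integrate-and-fire neuron: tau * dv/dt = -(v - v_rest) + I."""
        v, trace, spikes = v_rest, [], []
        for i in current:
            v += dt * (-(v - v_rest) + i) / tau
            spiked = v >= v_th
            if spiked:                 # threshold crossing: emit spike, reset
                v = v_reset
            spikes.append(spiked)
            trace.append(v)
        return np.array(trace), np.array(spikes)

    I = np.full(1000, 1.5)             # constant supra-threshold input current
    _, s = lif(I)
    print("spikes in 1 s:", int(s.sum()))  # regular spiking for constant input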

Prerequisites:

  • Analog and Digital VLSI Design
  • SPICE tools

Contact:

Dr. Mark Wijtvliet and Shubham Rai

Light-Weight Accelerator for Anomaly Detection in Health-Monitoring Applications (ECG/EEG)

Motivated by the fact that ~36% of global deaths stem from heart anomalies (with Germany among the leading EU countries for heart-related diseases), the market for health-monitoring apps and gadgets is growing tremendously. An interesting feature of an ECG-monitoring algorithm is its high resiliency to errors, along with the fact that full accuracy is not needed 24/7 (e.g., during less-active/sleeping periods). Considering this potential, the main idea is to apply various approximations and optimizations to the basic structure of an ECG-monitoring application to improve its performance and reduce its area/power/energy cost. Such a light-weight health-monitoring algorithm could be broadly used by many companies as an app in a smartphone/smartwatch or even as a stand-alone gadget attached to the body.

Pre-Requisites and helpful skills

  • FPGA development and programming (Verilog/VHDL, Vivado)
  • Software programming (Java/C++/Python/Matlab)

Contact information

Zahra Ebrahimi (ebrahimi@tu-dresden.de)

Cross-layer Approximation for Embedded Machine Learning
  • Description: While machine learning algorithms are being used for virtually every application, the high implementation costs of such algorithms still hinder their widespread use in resource-constrained embedded systems. Approximate Computing (AxC) allows designers to use low-power, low-area implementations with a slight degradation in result quality. Better still, Cross-layer Approximation (CLAx) offers scope for much greater power and energy reductions by combining software methods such as loop perforation with approximate hardware (a toy illustration of loop perforation follows this project's details). Finding the proper combination of approximation techniques in hardware and software and across the layers of a DNN, to provide just enough accuracy at the lowest cost, poses an interesting research problem. In our research efforts towards solving this problem, we have implemented a design-space exploration (DSE) framework for 2D convolution. We would like to implement a similar framework for the convolution and fully connected layers of a DNN.
  • Pre-requisites:
    • Digital Design, FPGA-based accelerator design, HLS
    • Python, C++/ SystemC
  • Skills that will be acquired during project-work:
    • Hardware design for ML
    • Multi-objective optimization of hardware accelerators.
    • System-level design
    • Technical writing for research publications.
  • Related Publications:
    • S. Ullah, S. S. Sahoo, A. Kumar, "CLAppED: A Design Framework for Implementing Cross-Layer Approximation in FPGA-based Embedded Systems", in Proceedings of the 58th ACM/IEEE Design Automation Conference (DAC), pp. 1-6, July 2021.
    • S. Ullah, H. Schmidl, S. S. Sahoo, S. Rehman, A. Kumar, "Area-optimized Accurate and Approximate Softcore Signed Multiplier Architectures", in IEEE Transactions on Computers, April 2020.
    • S. Nambi, S. Ullah, A. Lohana, S. S. Sahoo, F. Merchant, A. Kumar, "ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems", October 2020.
  • Contact:
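As referenced in the project description above, here is a toy Python illustration of loop perforation applied to a 2D convolution (hypothetical code, software side only):

    import numpy as np

    def conv2d_perforated(img, kernel, skip=1):
        """2D convolution with loop perforation: compute only every `skip`-th
        output row and reuse the last computed row for the skipped ones.
        A software-level approximation, to be paired with approximate hardware."""
        kh, kw = kernel.shape
        oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        last = 0
        for r in range(oh):
            if r % skip == 0:                          # perforated outer loop
                for c in range(ow):
                    out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
                last = r
            else:
                out[r] = out[last]                     # cheap reconstruction
        return out

    img = np.random.default_rng(4).random((32, 32))
    k = np.ones((3, 3)) / 9.0                          # box-blur kernel
    exact = conv2d_perforated(img, k, skip=1)
    approx = conv2d_perforated(img, k, skip=2)         # roughly half the MACs
    print("mean abs error:", np.abs(exact - approx).mean())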
Design of AI/ML-based Biomedical Signal Processing Systems
  • Description: The range of applications that use AI/ML is increasing every day. The wide availability of medical data makes bio-medical systems a prime candidate for machine learning. Paradigms such as online learning allow modern bio-medical systems to be customized for individual patients and are increasingly being used in monitoring systems. However, naïve implementations of ML algorithms can result in costly designs that make such systems infeasible for wearables and similar battery-operated monitoring devices. This project takes a hardware-software co-design approach to implementing low-cost signal processing for biomedical applications. Software techniques that explore algorithms, quantization, etc., and hardware techniques such as approximate circuit design, ultra-low-power RISC-V microarchitectures, and low-energy accelerators will be explored in the project.
  • Pre-requisites:
    • Digital Design, Computer Architecture; RISC-V (preferred)
    • FPGA architecture and design with VHDL/Verilog
    • Basic understanding of Signal Processing and Machine Learning
  • Skills that will be acquired during project-work:
    • RISC-V based SoC design
    • Accelerator design (HLS/HDL)
    • Bio-medical systems
    • Technical writing for research publications.
High-Speed Acceleration of Object Detection and Continuous Tracking Applications

Object detection and continuous tracking are used ubiquitously in a variety of applications, from indoor positioning and self-tracking of Unmanned Aerial Vehicles such as drones to surveillance and biometric security use-cases. An interesting example is the human eye's iris: it is the most reliable biometric after DNA and has ever-increasing potential to be used in many biometric/AI domains such as e-banking and attendance or presence tracking in shops, airports, etc. The iris-scanning market in particular is expected to double by 2024, to be worth $52 billion. The common feature of all these applications is the error resiliency of their algorithms to approximation techniques. Our aim in this topic is to exploit this potential to enable real-time processing in energy-constrained IoT edge nodes.

Pre-Requisites and helpful skills

  • FPGA development and programming (Verilog/VHDL, Vivado)
  • Software programming (Java/C++/Python/Matlab)

Contact information

Zahra Ebrahimi (ebrahimi@tu-dresden.de)

Accelerator design for ML-based NLP
  • Description: There has been a recent push towards newer ML-based NLP models that can exploit the parallelism of accelerators. From bag-of-words to RNNs to LSTMs and most recently transformers, the models for NLP have evolved rapidly. In this project, we plan to explore the suitability of FPGA-based accelerators for modern NLP models. We aim to design NLP accelerators that exploit precision scaling and approximate computing.
  • Pre-requisites:
    • Digital Design, Computer Architecture
    • FPGA architecture and design with VHDL/Verilog, HLS knowledge preferable
    • Basic understanding of Machine Learning, specifically NLP
  • Skills that will be acquired during project-work:
    • Accelerator design (HLS/HDL)
    • Modern NLP algorithms and their implementation
    • Approximate Computing
    • Technical writing for research publications.
Implementing application-specific approximate computing with RISC-V (Currently not available)
  • Description: The project involves implementing approximate arithmetic in RISC-V-based application-specific system design. The major components of the project include:
    • Implementing custom RISC-V cores on an FPGA-based system
    • Integrating approximate components into standard RISC-V microarchitectures
    • Familiarizing with the RISC-V toolchain to enable compilation for the custom microarchitecture
    • Low-cost AI/ML accelerator design for a RISC-V SoC
    • Characterizing the microarchitecture for an ASIC-based implementation (synthesis only)
  • Pre-requisites:
    • Computer Architecture
    • Digital Design
    • Verilog/VHDL/SystemC
    • Some scripting language (preferably Python)
  • Skills that will be acquired during project-work:
    • FPGA Design tools (Xilinx)
    • Extending RISC-V instructions
    • ASIC-based design
Using AI for Cyber-physical Systems (Currently not available)
  • Description: The project involves exploring the applicability of various Machine Learning methods to the optimization of controller design for cyber-physical systems. A sample problem of controlling various actuators in an office-building environment to minimize energy consumption and maximize user comfort will be used as a test case for evaluating the performance of traditional, predictive, and self-learning algorithms.
  • Pre-requisites:
    • Knowledge of AI/ML methods and some background in control systems.
    • Python with ML tools (Scikit/Tensorflow/Pytorch/OpenAI)
  • Skills that will be acquired during project-work:
    • Design of cyber-physical systems
    • Application of AI/ML methods for dynamic systems
    • Hardware design and impact of accelerators on cyber-physical systems' performance.
  • Related Publications:
    • Akhil Raj Baranwal, Salim Ullah, Siva Satyendra Sahoo, Akash Kumar, "ReLAccS: A Multi-level Approach to Accelerator Design for Reinforcement Learning on FPGA-based Systems", In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Institute of Electrical and Electronics Engineers (IEEE), pp. 1–1, 28 October 2020.
    • Siva Satyendra Sahoo, Akhil Raj Baranwal, Salim Ullah, Akash Kumar, "MemOReL: A Memory-oriented Optimization Approach to Reinforcement Learning on FPGA-based Embedded Systems" (to appear), Proceedings of the 2021 on Great Lakes Symposium on VLSI, Association for Computing Machinery, New York, NY, USA, July 2021.
FPGA-based Artificial Neural Network Accelerator (Currently not available)
[Figure: CIFAR-10 dataset for image classification]

Goals of this project and Potential tasks

  • Literature survey on existing FPGA-based hardware artificial neural network (ANN) accelerators
  • Implement a small Convolutional Neural Network (CNN) with different design trade-offs on the FPGA
  • Explore the use of different approximate arithmetic units for energy efficiency

Skills acquired in this project

  • Hands-on experience with FPGA-based development
  • Hands-on experience with ANNs
  • Advanced technical report writing

Pre-requisite

  • Digital design with VHDL/Verilog
  • Knowledge about ANNs
  • Knowledge about FPGA architecture
  • C/C++, Python

Helpful Skills

  • Knowledge of Tcl scripting to automate design steps in Xilinx Vivado
  • Ability to work independently

Contact Information

Reconfigurable Hard Logic Design

The significant increase of static power in the nano-CMOS era has put a power wall in the way of the aggressive growth of transistor density on a single die. This issue is referred to as the era of Dark Silicon, as not all transistors can be utilized at the same time. In particular, the significant traction toward edge computing highly encourages the use of small-scale and energy-efficient IoT devices. In this context, cutting-edge trends have shown that great resource/performance improvements can be achieved through simple Reconfigurable Hard Logic (RHL) cells. Such cells are designed to replace energy-hungry SRAM-based Look-Up Tables (LUTs) in platforms like FPGAs.

Pre-Requisites and helpful skills: Software programming (Java/C++/Python/Matlab)

Tool used for the project: Verilog to Routing (VTR)

Contact Information: Zahra Ebrahimi (ebrahimi@tu-dresden.de)

Exploring FPGA Architectures and Mapping approaches
[Figure: A logic element based on And-Inverter Cones]

LUT-based FPGAs suffer when it comes to scaling, as their complexity increases exponentially with the number of inputs. For this reason, LUTs with more than 6 inputs have rarely been used. In an attempt to handle more inputs and increase the logic density of logic cells, And-Inverter Cones (AICs), shown in the figure above, were proposed. They are an alternative to LUTs with a better compromise between hardware complexity, flexibility, delay, and input/output counts. They are inspired by modern logic synthesis approaches, which employ and-inverter graphs (AIGs) to represent logic networks. An AIC is a binary tree consisting of AND nodes with programmable conditional inversion, and it offers tapping of intermediate results. AICs have a lot to offer compared to traditional LUT-based FPGAs. The following points summarize the major benefits of using AICs over LUTs:

  1. For a given complexity, AICs can implement a function with more inputs than an LUT.
  2. Since AICs are inspired by AIGs, their area and delay increase linearly and logarithmically, respectively, with the number of inputs, in contrast to the exponential and linear increases for LUTs.
  3. Intermediate results can be tapped out of AICs, thereby reducing logic duplication.

While on one hand we sacrifice some of the flexibility offered by FPGAs, there are new nanotechnologies, based on materials like germanium and silicon, which offer runtime reconfigurability and functional symmetry between p- and n-type behavior. The project aims to explore FPGA architectures using these reconfigurable nanotechnologies in the context of AICs. More information can be found in our recent DATE 2022 paper.
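A small Python sketch of the AIC structure described above, under an assumed encoding: a complete binary tree of 2-input AND nodes with one programmable inversion flag per node input, evaluated level by level so intermediate results can be tapped:

    def eval_aic(inputs, inversions):
        """Evaluate an And-Inverter Cone.
        inputs: bits, length a power of two; inversions: one flag list per level.
        Returns the values of every level so intermediate results can be tapped."""
        level, taps = list(inputs), []
        for flags in inversions:
            level = [(level[i] ^ flags[i]) & (level[i + 1] ^ flags[i + 1])
                     for i in range(0, len(level), 2)]
            taps.append(list(level))          # tap points after each level
        return taps

    # 4-input AIC configured as (~a & b) & ~(c & d)
    taps = eval_aic([1, 1, 1, 0], inversions=[[1, 0, 0, 0], [0, 1]])
    print(taps)  # [[first-level outputs], [root output]]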

Skills acquired in this thesis:

  • Hands-on skills using Linux-based systems
  • Programming in Python or C/C++
  • Working with tools like the Cadence Virtuoso environment and the open-source VTR (Verilog-to-Routing) tool for FPGAs
  • Problem analysis
  • Working in an international environment and communicating in English
  • Professional technical writing
  • Verilog/VHDL

Pre-Requisites:

  • Knowledge of FPGAs
  • Familiarity with the Linux environment and C or C++

Contact Information:

Customizing Approximate Arithmetic Blocks for FPGA (Currently not available)

Approximate Computing has emerged as a new paradigm for building highly power-efficient on-chip systems. Most standard low-power techniques implicitly assume precise computing, i.e., that the underlying hardware provides accurate results. However, continuing to support precise computing is most likely not a way to solve upcoming power-efficiency challenges. Approximate Computing relaxes the bounds of precise computation, thereby providing new opportunities for power savings, and may bring orders-of-magnitude performance/power benefits. Recent research studies by several companies (like Intel, IBM, and Microsoft) and research groups have demonstrated that applying hardware approximations may provide 4x-60x energy reductions. These studies have shown that there is a large body of power-hungry applications from several domains, such as image and video processing, computer vision, Big Data, and Recognition, Mining and Synthesis (RMS), that are amenable to approximate computing due to their inherent resilience to approximation errors and can still produce output of acceptable quality. The state of the art has developed approximate hardware designs only for certain basic hardware blocks; for example, approximate designs exist mainly for ripple-carry adders, which have the highest potential for approximation, while other widely used adder types, such as the Kogge-Stone, carry-lookahead, carry-sum, and carry-save adders, are ignored.
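To make "approximate adder" concrete, here is a bit-level Python model of a simplified lower-part-OR adder (LOA), one classic approximation of the ripple-carry adder; this sketch omits the carry-in gate of the full LOA for brevity:

    def loa_add(a, b, approx_low=3):
        """Simplified lower-part-OR adder (LOA): OR the `approx_low` least-
        significant bits (no carry propagation) and add the upper bits exactly.
        The shortened carry chain gives faster/smaller hardware at the price
        of a bounded error in the LSBs."""
        mask = (1 << approx_low) - 1
        low = (a | b) & mask
        high = ((a >> approx_low) + (b >> approx_low)) << approx_low
        return low + high

    errs = [abs(loa_add(a, b) - (a + b)) for a in range(256) for b in range(256)]
    print("max error:", max(errs), " mean error:", sum(errs) / len(errs))
    # max error is 2**approx_low - 1 = 7 for this configuration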

 

Goals of this Thesis and Potential Tasks (Contact for more Discussion):

  • Developing an approximate FPGA-specific library for different arithmetic modules like adders, multipliers, dividers, and logical operations.
  • Developing complex multi-bit approximate functions and accelerators for FPGAs.
  • Interfacing custom instructions and FPGAs to soft cores (e.g., MicroBlaze) and using the SDSoC framework.
  • Developing functionally equivalent software models, e.g., using C or C++ and testing in different benchmark applications.
  • Open-sourcing and Documentation.

Skills acquired in this Thesis:

  • Hands-on experience with FPGA development and the new SDSoC framework.
  • Computer Arithmetic and Hardware Optimization.
  • In-depth technical knowledge on the cutting-edge research topic and emerging computing paradigms.
  • Problem analysis and exploration of novel solutions for real-world problems.
  • Open-Sourcing.
  • Team work and experience in an international environment.
  • Professional-grade technical writing.

Pre-Requisites (but not all strictly required):

  • Python-based programming
  • Knowledge of computer architecture; C, C++, or MATLAB
  • VHDL programming (beneficial if known and practiced in some labs)

Contact information:

Coffee Machine Usage Automatic Logging Device with Person Detection/Recognition
[Figure: Coffee machine usage logging: the inefficient paper-and-pen method versus a smart device with an LCD display and a camera + microphone for person detection/recognition through image or voice]

In our lab, coffee machine usage is tracked with a very inefficient paper-and-pen method. When a user takes a coffee from the machine, he/she needs to look for his/her name on the paper sheet and make a tick accordingly. At the end of a quarter, one person in charge must take the sheet and divide the total cost of the coffee beans/milk by the total number of cups taken by everybody; each person's contribution is proportional to the number of cups he/she took.

In this project, we would like a smart device to take care of this tedious task. It can recognize that somebody is taking a coffee by performing person detection. It then asks for permission to recognize the person by face, using the camera, or by voice, using the microphone. If the person does not agree, the device asks him/her to speak his/her name. If the information associated with that person cannot be found, the device asks for more information to store in the database. The person in charge of buying the coffee beans and milk needs a separate login portal to log the money spent and to issue the command that calculates the users' contributions. All of this processing must be done locally on the device, which communicates with users through the touch LCD screen. In our labs, there are two coffee machines in two different rooms, and users may use whichever machine they want. Therefore, there will be two such devices, one in each room; one of them will act as a server that stores all the data and synchronizes the usage of the two machines. The two boards will communicate over WiFi.
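The quarterly settlement the device must automate is simple arithmetic; a Python sketch with hypothetical numbers:

    def settle(expenses_eur, cups_per_user):
        """Split the quarterly coffee expenses proportionally to cups consumed."""
        total_cups = sum(cups_per_user.values())
        price_per_cup = expenses_eur / total_cups
        return {name: round(cups * price_per_cup, 2)
                for name, cups in cups_per_user.items()}

    # hypothetical quarter: 84.50 EUR of beans/milk, tick counts per user
    print(settle(84.50, {"alice": 120, "bob": 45, "carol": 73}))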


Goals of this project and Potential tasks

The project covers different levels of system design: the student has to write the user interface for interacting with users, implement a simple database backend framework that both boards access to store/retrieve the users' data and expenses, and implement person detection, face recognition, and voice recognition algorithms using machine learning. These algorithms can be implemented on an FPGA or with an external AI inference device (Intel Movidius stick, Google Coral, etc.), depending on the background of the student.


Phase 1: 3-4 months

Implement the core functions of the system: person detection, face recognition and voice recognition algorithms using machine learning


Phase 2: 2 months

Interface with the peripherals: camera + microphone + LCD touch screen. Implement the backend database that both boards can access. Design the user interface.

Skills acquired in this project

  • Hands-on experience with embedded system development and interfacing with peripherals such as a monitor, camera, microphone, and AI inference device
  • Hands-on experience with designing embedded systems using hardware/software co-design analysis for performance and energy efficiency (when an external device is needed for AI inference)
  • Advanced technical report writing


Pre-requisite

  • C/C++, Python
  • FPGA development (if you want to work with an FPGA)


Helpful Skills

  • Knowledge about image processing algorithms
  • Knowledge about machine learning
  • Knowledge about databases and web design
  • Knowledge of computer architecture
  • Ability to work independently


Contact Information