We are always looking for highly motivated researchers to join our team on topics such as Approximate Computing, Reconfigurable Computing, Machine Learning Applications, Design Automation for Emerging Technologies, Reliability and Fault Tolerance, and Embedded Systems, so do not hesitate to contact us. Besides that, the following positions are also available:

Positions for Master's Thesis, Master's Project, SHK/WHK, Internship

Cross-layer Approximation for Embedded Machine Learning
  • Description: While machine learning algorithms are being used in virtually every application domain, their high implementation costs still hinder their widespread use in resource-constrained embedded systems. Approximate Computing (AxC) allows designers to use low-cost (power, energy, and area) implementations at the price of a slight degradation in result quality. Cross-layer Approximation (CLAx) goes further and offers even larger power and energy reductions by combining software-level methods, such as loop perforation, with approximate hardware. Finding the proper combination of approximation techniques in hardware and software, and across the layers of a DNN, to provide just enough accuracy at the lowest cost poses an interesting research problem. In our research efforts towards solving this problem, we have implemented a DSE framework for 2D convolution. In this project, we would like to implement a similar framework for the convolutional and fully connected layers of a DNN (a small, purely illustrative sketch of loop perforation is given after this listing).
  • Pre-requisites:
    • Digital Design, FPGA-based accelerator design, HLS
    • Python, C++/SystemC
  • Skills that will be acquired during project-work:
    • Hardware design for ML
    • Multi-objective optimization of hardware accelerators.
    • System-level design
    • Technical writing for research publications.
  • Related Publications:
    • S. Ullah, S. S. Sahoo, A. Kumar, "CLAppED: A Design Framework for Implementing Cross-Layer Approximation in FPGA-based Embedded Systems" (to appear), in Proceedings of the 58th ACM/IEEE Design Automation Conference (DAC), pp. 1-6, July 2021.
    • S. Ullah, H. Schmidl, S. S. Sahoo, S. Rehman, A. Kumar, "Area-optimized Accurate and Approximate Softcore Signed Multiplier Architectures", in IEEE Transactions on Computers, April 2020.
    • S. Nambi, S. Ullah, A. Lohana, S. S. Sahoo, F. Merchant, A. Kumar, "ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems", 27 October 2020.
  • Contact:
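
A small, purely illustrative sketch of the kind of software-level approximation involved: the Python snippet below applies loop perforation to a plain 2D convolution and measures the resulting error. The function names and the perforation scheme are invented for illustration only and are not taken from the CLAppED/DSE framework mentioned above.

```python
import numpy as np

def conv2d(image, kernel, perforation_step=1):
    """Valid 2D convolution; perforation_step > 1 skips kernel rows in the
    accumulation loop (loop perforation) and rescales the partial sum."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    rows = range(0, kh, perforation_step)   # perforated loop: every k-th row only
    scale = kh / len(rows)                  # crude compensation for skipped rows
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for i in rows:
                for j in range(kw):
                    acc += image[y + i, x + j] * kernel[i, j]
            out[y, x] = acc * scale
    return out

# Exact vs. perforated result on random data: the error gives a first feel
# for the accuracy/effort trade-off explored in a DSE framework.
img = np.random.rand(32, 32)
ker = np.random.rand(5, 5)
exact = conv2d(img, ker, perforation_step=1)
approx = conv2d(img, ker, perforation_step=2)
print("mean absolute error:", np.abs(exact - approx).mean())
```
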
Design of AI/ML-based Biomedical Signal Processing Systems
  • Description: The range of applications that use AI/ML is growing every day. The wide availability of medical data makes biomedical systems a prime candidate for machine learning. Paradigms such as online learning allow modern biomedical systems to be customized for individual patients and are increasingly being used in monitoring systems. However, naïve implementations of ML algorithms can result in costly designs that make such systems infeasible for wearables and similar battery-operated monitoring devices. This project takes a hardware-software co-design approach to implementing low-cost signal processing for biomedical applications. It will explore software techniques such as algorithm selection and quantization (see the toy sketch after this listing), together with hardware techniques such as approximate circuit design, ultra-low-power RISC-V microarchitectures, and low-energy accelerators.
  • Pre-requisites:
    • Digital Design, Computer Architecture (RISC-V preferred)
    • FPGA architecture and design with VHDL/Verilog
    • Basic understanding of Signal Processing and Machine Learning
  • Skills that will be acquired during project-work:
    • RISC-V based SoC design
    • Accelerator design (HLS/HDL)
    • Bio-medical systems
    • Technical writing for research publications.
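
A toy illustration of the quantization knob mentioned above, assuming a hypothetical FIR filter: uniform 8-bit quantization of the coefficients in Python. It is only meant to show the accuracy/cost trade-off, not the project's actual design flow.

```python
import numpy as np

def quantize_uniform(x, bits=8):
    """Symmetric uniform quantization to signed integers of the given width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.round(x / scale).astype(np.int32)   # integer representation
    return q, scale

# Hypothetical FIR filter coefficients for a biomedical signal chain.
coeffs = np.array([0.021, -0.118, 0.437, 0.612, 0.437, -0.118, 0.021])
q, scale = quantize_uniform(coeffs, bits=8)
reconstructed = q * scale
print("max quantization error:", np.max(np.abs(coeffs - reconstructed)))
```
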
Accelerator design for ML-based NLP
  • Description: There has been a recent push towards newer ML-based NLP models that can exploit the parallelism of accelerators. From bag-of-words to RNNs to LSTMs and most recently transformers, the models for NLP have evolved rapidly. In this project, we plan to explore the suitability of FPGA-based accelerators for modern NLP models. We aim to design NLP accelerators that exploit precision scaling and approximate computing.
  • Pre-requisites:
    • Digital Design, Computer Architecture
    • FPGA architecture and design with VHDL/Verilog, HLS knowledge preferable
    • Basic understanding of Machine Learning, specifically NLP
  • Skills that will be acquired during project-work:
    • Accelerator design (HLS/HDL)
    • Modern NLP algorithms and their implementation
    • Approximate Computing
    • Technical writing for research publications.
Implementing application-specific approximate computing with RISC-V
  • Description: The project involves implementing approximate arithmetic in RISC-V-based application-specific system designs. The major components of the project include:
    • Implementing custom RISC-V cores on an FPGA-based system
    • Integrating approximate components into standard RISC-V microarchitectures
    • Getting familiar with the RISC-V toolchain to enable compilation for the custom microarchitecture
    • Low-cost AI/ML accelerator design for a RISC-V SoC
    • Characterizing the microarchitecture for ASIC-based implementation (synthesis-only)
  • Pre-requisites:
    • Computer Architecture
    • Digital Design
    • Verilog/VHDL/SystemC
    • Some scripting language (preferably Python)
  • Skills that will be acquired during project-work:
    • FPGA Design tools (Xilinx)
    • Extending RISC-V instructions
    • ASIC-based design
Using AI for Cyber-physical Systems
  • Description: The project explores the applicability of various machine learning methods to optimizing controller design for cyber-physical systems. A sample problem of controlling various actuators in an office-building environment to minimize energy consumption and maximize user comfort will be used as a test case for comparing traditional, predictive, and self-learning algorithms (a toy self-learning sketch follows the related publications below).
  • Pre-requisites:
    • Knowledge of AI/ML methods and some background in control systems.
    • Python with ML tools (scikit-learn/TensorFlow/PyTorch/OpenAI)
  • Skills that will be acquired during project-work:
    • Design of cyber-physical systems
    • Application of AI/ML methods for dynamic systems
    • Hardware design and impact of accelerators on cyber-physical systems' performance.
  • Related Publications:
    • A. R. Baranwal, S. Ullah, S. S. Sahoo, A. Kumar, "ReLAccS: A Multi-level Approach to Accelerator Design for Reinforcement Learning on FPGA-based Systems", in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, October 2020.
    • S. S. Sahoo, A. R. Baranwal, S. Ullah, A. Kumar, "MemOReL: A Memory-oriented Optimization Approach to Reinforcement Learning on FPGA-based Embedded Systems" (to appear), in Proceedings of the 2021 Great Lakes Symposium on VLSI (GLSVLSI), July 2021.
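
To give a flavour of the "self-learning" class of algorithms, and in line with the reinforcement-learning theme of the related publications, below is a minimal tabular Q-learning loop for a single-actuator toy problem (a heater keeping a room near a set-point). The environment model, reward, and hyper-parameters are invented for illustration and do not correspond to the actual office-building test case.

```python
import random

# Toy environment: room temperature in whole degrees, one heater (0 = off, 1 = on).
SETPOINT, TEMPS, ACTIONS = 21, range(15, 28), (0, 1)

def step(temp, action):
    """Very rough thermal model: heater adds heat, building loses heat."""
    temp = min(max(temp + (1 if action else -1), 15), 27)
    reward = -abs(temp - SETPOINT) - 0.1 * action   # comfort vs. energy cost
    return temp, reward

Q = {(t, a): 0.0 for t in TEMPS for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.1

temp = 18
for _ in range(20000):
    # epsilon-greedy action selection
    a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=lambda x: Q[(temp, x)])
    nxt, r = step(temp, a)
    best_next = max(Q[(nxt, x)] for x in ACTIONS)
    Q[(temp, a)] += alpha * (r + gamma * best_next - Q[(temp, a)])  # TD update
    temp = nxt

# Learned policy: heat only when the room is below the set-point.
print({t: max(ACTIONS, key=lambda a: Q[(t, a)]) for t in TEMPS})
```
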
FPGA-based Artificial Neural Network Accelerator
CIFAR-10 Dataset for Image Classification

Goals of this Project and Potential Tasks

  • Literature survey of existing FPGA-based artificial neural network (ANN) accelerators
  • Implementation of a small Convolutional Neural Network (CNN) with different design trade-offs on an FPGA
  • Support for different approximate arithmetic units to improve energy efficiency

Skills acquired in this project

  • Hands-on experience with FPGA-based development
  • Hands-on experience with ANNs
  • Advanced technical report writing

Pre-requisites

  • Digital design with VHDL/Verilog
  • Knowledge about ANNs
  • Knowledge about FPGA architecture
  • C/C++, Python

Helpful Skills

  • Knowledge of Tcl scripting to automate design steps in Xilinx Vivado
  • Ability to work independently

Contact Information

High-Speed Acceleration of Object Detection and Continuous Tracking Applications

Pre-Requisites and helpful skills

  • FPGA development and programming (Verilog/VHDL, Vivado)
  • Software programming (Java/C++/Python/Matlab)

Contact information for more details

Light-Weight Accelerator for Anomaly Detection in Health-Monitoring Applications (ECG/EEG)

Pre-Requisites and helpful skills

  • FPGA development and programming (Verilog/VHDL, Vivado)
  • Software programming (Java/C++/Python/Matlab)

Contact information for more details

Cross-Layer Approximation of Neural Networks

In the upcoming era of the Internet of Things (IoT), federated learning and distributed inference are envisioned to be the pillars and key enablers of real-time processing. To enable such compute-intensive workloads at the edge, the structure of NNs must be optimized without compromising the final quality of results. In this context, Approximate Computing techniques have been shown to provide highly beneficial solutions by exploiting the inherent error resilience of ML applications. Given this potential, the main idea of this project is to efficiently apply various approximations to reduce the area, power, and energy of NNs and to boost their performance.

Pre-Requisites and helpful skills

  • FPGA development and programming (Verilog/VHDL, Vivado)
  • Software programming (Java/C++/Python/Matlab)

Contact information for more details

Zahra Ebrahimi (ebrahimi@tu-dresden.de)

Reconfigurable Hard Logic Design

Pre-Requisites and helpful skills: Software programming (Java/C++/Python/Matlab)

Tool used for the project: Verilog to Routing (VTR)

More details: Zahra Ebrahimi (ebrahimi@tu-dresden.de)

Sacrificing flexibility in FPGAs: FPGA Architecture based on AND-Inverter Cones
A logic element based on And-Inverter Cones

Look-up-table (LUT)-based FPGAs suffer when it comes to scaling, as their complexity increases exponentially with the number of inputs. For this reason, LUTs with more than six inputs are rarely used. In an attempt to handle more inputs and increase the logic density of logic cells, And-Inverter Cones (AICs), shown in the figure below, were proposed. They are an alternative to LUTs with a better compromise between hardware complexity, flexibility, delay, and input/output counts. They are inspired by modern logic synthesis approaches that employ And-Inverter Graphs (AIGs) to represent logic networks. An AIC is a binary tree of AND nodes with programmable conditional inversion that allows intermediate results to be tapped. AICs have a lot to offer compared to traditional LUT-based FPGAs. The following points summarize the major benefits of AICs over LUTs:

  1. For a given complexity, AICs can implement a function with more inputs than an LUT.
  2. Since they are inspired by AIGs, area and delay increase linearly and logarithmically, respectively, with the number of inputs, in contrast to the exponential and linear increase for LUTs.
  3. Intermediate results can be tapped out of AICs, thereby reducing logic duplication.

While, on the one hand, we sacrifice some of the flexibility offered by FPGAs, there are emerging nanotechnologies based on materials such as germanium and silicon that offer runtime reconfigurability and functional symmetry between p-type and n-type behavior. This project aims to explore FPGA architectures that use these reconfigurable nanotechnologies in the context of AICs.
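
For concreteness, the following Python sketch models an AIC as a perfect binary tree of AND nodes with programmable conditional inversion, including tapping of intermediate results. It mirrors the structural idea described above but is not tied to any particular AIC proposal or tool.

```python
from itertools import product

def aic_eval(inputs, invert):
    """Evaluate a perfect binary And-Inverter Cone.

    inputs : tuple of input bits (length must be a power of two)
    invert : one conditional-inversion bit per AND node, level by level
    Returns the cone output and the tapped intermediate value of every node.
    """
    level, taps, inv_iter = list(inputs), [], iter(invert)
    while len(level) > 1:
        nxt = []
        for a, b in zip(level[::2], level[1::2]):
            node = (a & b) ^ next(inv_iter)   # AND with conditional inversion
            nxt.append(node)
            taps.append(node)                 # intermediate result is tappable
        level = nxt
    return level[0], taps

# Example: a 4-input cone has 3 AND nodes; invert = (1, 0, 0) configures it
# to compute (NOT(a AND b)) AND (c AND d).
for bits in product((0, 1), repeat=4):
    out, taps = aic_eval(bits, invert=(1, 0, 0))
    assert out == (1 - (bits[0] & bits[1])) & (bits[2] & bits[3])
print("truth table verified; taps for input 1111:", aic_eval((1, 1, 1, 1), (1, 0, 0))[1])
```
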

Skills acquired in this thesis:

  • Hands-on skills using Linux-based systems
  • Programming in Python or C/C++
  • Working with tools like the Cadence Virtuoso environment and the open-source VTR (Verilog-to-Routing) tool for FPGAs
  • Problem analysis
  • Working in an international environment and communicating in English
  • Professional technical writing
  • Verilog/VHDL

Pre-Requisites:

  • Knowledge of FPGAs
  • Familiarity with the Linux environment and with C or C++

Contact Information:

Customizing Approximate Arithmetic Blocks for FPGA

Approximate Computing has emerged as a new paradigm for building highly power-efficient on-chip systems. Most standard low-power techniques implicitly assume precise computing, i.e., that the underlying hardware provides exact results. However, continuing to support precise computing is most likely not the way to solve upcoming power-efficiency challenges. Approximate Computing relaxes the bounds of precise computation, thereby providing new opportunities for power savings, and can bring orders-of-magnitude performance/power benefits. Recent studies by several companies (such as Intel, IBM, and Microsoft) and research groups have demonstrated that applying hardware approximations can provide 4x-60x energy reductions. These studies have shown that a large body of power-hungry applications from domains such as image and video processing, computer vision, Big Data, and Recognition, Mining and Synthesis (RMS) is amenable to approximate computing due to its inherent resilience to approximation errors, while still producing output of acceptable quality. However, the state of the art has developed approximate hardware designs for only a few basic hardware blocks. For example, approximate designs exist mainly for ripple-carry adders, which offer a high potential for approximation, while other widely used adder types, such as Kogge-Stone, carry-lookahead, carry-select, and carry-save adders, have largely been ignored.
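
As a small, self-contained illustration of the hardware-approximation idea (and not one of the designs to be developed in this thesis), the Python model below implements a lower-part-OR style approximate adder, in which the low-order bits are OR-ed instead of added, and compares it against exact addition.

```python
import random

def approx_add(a, b, width=16, approx_bits=4):
    """Lower-part-OR approximate adder model: the low `approx_bits` bits are
    OR-ed (no carries generated there); the upper bits are added exactly."""
    mask = (1 << approx_bits) - 1
    low = (a | b) & mask                                   # approximate lower part
    high = ((a >> approx_bits) + (b >> approx_bits)) << approx_bits
    return (high | low) & ((1 << (width + 1)) - 1)         # width+1 bits incl. carry-out

# Compare against the exact sum over random operands to characterize the error.
random.seed(0)
errors = []
for _ in range(10000):
    a, b = random.getrandbits(16), random.getrandbits(16)
    errors.append(abs((a + b) - approx_add(a, b)))
print("mean / max absolute error:", sum(errors) / len(errors), max(errors))
```
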

Goals of this Thesis and Potential Tasks (Contact for more Discussion):

  • Developing an approximate FPGA-specific library for different arithmetic modules like adders, multipliers, dividers, and logical operations.
  • Developing complex multi-bit approximate functions and accelerators for FPGAs.
  • Interfacing custom instructions and FPGA logic to soft cores (e.g., MicroBlaze) and using the SDSoC framework.
  • Developing functionally equivalent software models, e.g., using C or C++ and testing in different benchmark applications.
  • Open-sourcing and Documentation.

Skills acquired in this Thesis:

  • Hands-on experience with FPGA development and the SDSoC framework.
  • Computer Arithmetic and Hardware Optimization.
  • In-depth technical knowledge of a cutting-edge research topic and emerging computing paradigms.
  • Problem analysis and exploration of novel solutions for real-world problems.
  • Open-Sourcing.
  • Teamwork and experience in an international environment.
  • Professional-grade technical writing.

Pre-Requisites (helpful, but not all strictly required):

  • Python-based programming
  • Knowledge of computer architecture; C, C++, or MATLAB.
  • VHDL programming (beneficial if already known and practiced in lab courses)

Contact information: