Postdoc/PhD

The following Postdoc/Ph.D. positions are available:

Electronic design automation for designing secure circuits

At the Institute of Computer Engineering, the Chair of Processor Design offers, subject to the availability of resources, two fixed-term positions as

Research Associate / Ph.D. Student
(subject to personal qualification employees are remunerated according to salary group E 13 TV-L)

starting as soon as possible.

Research area: Electronic design automation for designing secure circuits
Terms: limited to February 29, 2024
The period of employment is governed by the Fixed Term Research Contracts Act (Wissenschaftszeitvertragsgesetz – WissZeitVG). The position offers the chance to obtain further academic qualifications (e.g. Ph.D.).


Position
At the Chair of Processor Design, we have the long-term vision of shaping the way future electronic systems will be designed. Today’s societies critically depend on electronic systems. Over the last years, the security of these systems has been put at risk by a number of hardware-level attacks that circumvent software-level security mechanisms. Solutions based on classical CMOS electronics have been shown to be either cost-intensive due to a high area overhead or energy-inefficient. One promising alternative against such hardware-level attacks is security primitives based on emerging reconfigurable nanotechnologies. Transistors based on these disruptive reconfigurable nanotechnologies, termed Reconfigurable Field-Effect Transistors (RFETs), offer programmable p- and n-type behavior from a single device. The runtime-reconfigurable nature of these nano-electronic devices yields an inherent polymorphic functionality at the logical abstraction. As a result, circuits made of regular RFET blocks can provide a large number of possible functional combinations from what appears to be the same circuit representation. Manufacturers are therefore able to program the desired functionality after chip production. The key difference from standard CMOS electronics is that the actual circuit function remains hidden, since it cannot be differentiated from the other possible combinations by physical reverse engineering. In this project, we will design the EDA flow to enable the co-integration of CMOS and RFET transistors. In particular, tools for logic and physical synthesis will be developed.
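To illustrate the polymorphic behavior described above at the logical abstraction, here is a minimal Python sketch (our own illustration, not part of the project’s tool flow; the cell model and the name of the program signal are assumptions): a single two-input cell realizes either NAND or NOR depending on how it is programmed after fabrication.

```python
# Minimal behavioral sketch of a polymorphic two-input cell, as offered by RFETs.
# The function is selected by a 'program' input that is set after chip production;
# the physical layout is identical for both functions.

def polymorphic_cell(a: int, b: int, program: int) -> int:
    """Return NAND(a, b) when program = 0 and NOR(a, b) when program = 1."""
    if program == 0:
        return 1 - (a & b)   # NAND behavior
    return 1 - (a | b)       # NOR behavior

if __name__ == "__main__":
    for prog in (0, 1):
        table = [(a, b, polymorphic_cell(a, b, prog)) for a in (0, 1) for b in (0, 1)]
        print(f"program={prog}:", table)
```

Because both functions share the same physical cell, a reverse engineer who only sees the layout cannot tell which function the manufacturer has programmed.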

Tasks:

  • Getting a deep understanding of RFETs and the overall EDA flow,
  • Creating a co-integration EDA flow for RFET / CMOS circuits with Genlib and LEF libraries,
  • Developing a logic and physical synthesis flow for RFET / CMOS circuits,
  • Designing algorithms for optimal placement of RFET cells,
  • Publishing the work at international conferences and/or in journals.

Requirements:

  • A university degree in computer science or electrical engineering,
  • A deep understanding of the EDA flow from design specification to place and route,
  • Strong background in an HDL, either Verilog or VHDL,
  • Understanding of security principles will be an added advantage,
  • Understanding of reconfigurable circuits will be an added advantage,
  • Good communication skills,
  • Fluency in English - written and oral.

 

What we offer

You will join a team of enthusiastic researchers who creatively pursue their individual research agendas. Other ongoing projects at the Chair of Processor Design can be found at https://www.cfaed.tu-dresden.de/pd-about. The chair is part of the “Center for Advancing Electronics Dresden”, which offers plenty of resources and structures for career development.

 

Informal inquiries can be submitted to Prof. Dr. Akash Kumar, Tel +49 (351) 463 39274; Email: akash.kumar@tu-dresden.de

Applications from women are particularly welcome. The same applies to people with disabilities.

 

Application Procedure

Please submit your comprehensive application (in English only) including the following: motivation letter, CV, copy of degree certificate, transcript of grades (i.e. the official list of coursework including your grades) and proof of English language skills preferably via the TU Dresden SecureMail Portal https://securemail.tu-dresden.de by sending it as a single pdf document quoting the reference number PhD21-02-PD in the subject header to recruiting.cfaed@tu-dresden.de or by post to: TU Dresden, Fakultät Informatik, Institut für Technische Informatik, Professur für Prozessorentwurf, Prof. Akash Kumar, Helmholtzstr. 10, 01069 Dresden, Germany. The closing date for applications is February 24, 2021 (stamped arrival date of the university central mail service applies). Please submit copies only, as your application will not be returned to you. Expenses incurred in attending interviews cannot be reimbursed.

 

Mixed criticality multi-core computer architecture

At the Institute of Computer Engineering, the Chair of Processor Design offers, subject to the availability of resources, a position as

Research Associate

(subject to personal qualification employees are remunerated according to salary group E 13 TV-L)

 

starting March 1, 2021.

Research Area: Mixed criticality multi-core computer architecture
Terms: 75% of the full-time weekly hours; the position is limited to December 31, 2022 (with the option to be extended). The period of employment is governed by §2 (2) Fixed Term Research Contracts Act (Wissenschaftszeitvertragsgesetz – WissZeitVG).

 


Position

At the Chair of Processor Design, we have the long-term vision of shaping the way future electronic systems will be designed. A wide range of embedded multi-core systems found in the automotive and avionics industries are evolving into Mixed-Criticality (MC) systems to meet cost, space, timing, and power consumption requirements. Escalating power densities have led to thermal issues, which in turn raise safety and reliability concerns for these systems. In this project, the timing and power requirements of MC systems will be analyzed at both design and run time to ensure that no thermal emergencies occur in the system. In addition, the thermal management of multi-core mixed-criticality systems will be studied while the safety and reliability of the systems are guaranteed under all circumstances. Aspects of the proposed approach include run-time system monitoring from the perspective of temperature, reliability, and real-time behavior, investigation of hardware aspects of the systems, and issues related to modeling and tool support. In this project, system-level methods on multi-core processors will be used to enable energy-efficient techniques through system design, as well as analysis and optimization of real-time capabilities, power, temperature, safety, and reliability with regard to application demands in any situation. The proposed approach can be customized for different safety applications and target platforms. The project will focus on requirements derived from the automotive, aerospace, and telecommunications domains and will evaluate the effectiveness of the approach in these domains.

Tasks:

  • Studying the architecture requirements for the target domains in order to select and use a suitable automotive-grade platform. This involves applying low-power techniques, understanding what affects the cores’ temperatures, and learning how to run the applications on the target platform.
  • Deriving a list of mixed-criticality applications that are relevant to the automotive industry and profiling them. The models of these applications will be specified in detail on a real platform.
  • Designing, developing, and implementing the analysis system and methods using the necessary hardware and software run-time control. The proposed method must be customized for different safety applications and target platforms.

Requirements:

  • a university degree in computer science or electrical engineering;
  • strong architecture background with general-purpose multi-core platforms;
  • proficiency in C/C++ and Python;
  • good knowledge of Computer Architecture and algorithm design;
  • good publication record and good communication skills;
  • fluency in English - written and oral.

What we offer

You will join a team of enthusiastic researchers who creatively pursue their individual research agendas. Other ongoing projects at the Chair of Processor Design can be found at https://www.cfaed.tu-dresden.de/pd-about. The chair is part of the “Center for Advancing Electronics Dresden”, which offers plenty of resources and structures for career development.

Informal inquiries can be submitted to Prof. Dr. Akash Kumar, Tel +49 (351) 463 39274; Email: akash.kumar@tu-dresden.de

Applications from women are particularly welcome. The same applies to people with disabilities.

Application Procedure

Please submit your comprehensive application (in English only) including the following: motivation letter, CV, copy of degree certificate, transcript of grades (i.e. the official list of coursework including your grades) and proof of English language skills preferably via the TU Dresden SecureMail Portal https://securemail.tu-dresden.de by sending it as a single pdf document quoting the reference number PhD21-01-PD in the subject header to recruiting.cfaed@tu-dresden.de or by post to: TU Dresden, Fakultät Informatik, Institut für Technische Informatik, Professur für Prozessorentwurf, Prof. Akash Kumar, Helmholtzstr. 10, 01069 Dresden, Germany. The closing date for applications is January 15, 2021 (stamped arrival date of the university central mail service applies). Please submit copies only, as your application will not be returned to you. Expenses incurred in attending interviews cannot be reimbursed.

 



Reference to data protection: Your data protection rights, the purpose for which your data will be processed, as well as further information about data protection is available to you on the website: https://tu-dresden.de/karriere/datenschutzhinweis

Diplomarbeit Positions

The following projects (Diplom-, Master- and Studienarbeiten) are available:

Cross-layer Approximation for Embedded Machine Learning
  • Description: While machine learning algorithms are being used for virtually every application, the high implementation costs of such algorithms still hinder their widespread use in resource-constrained embedded systems. Approximate Computing (AxC) allows designers to use low-cost (power and area) implementations at the price of a slight degradation in result quality. Cross-layer Approximation (CLAx) offers scope for even larger improvements in power and energy by combining software-level methods such as loop perforation with approximate hardware (a toy sketch of loop perforation is given after this project entry). Finding the proper combination of approximation techniques in hardware and software and across the layers of a DNN to provide just enough accuracy at the lowest cost poses an interesting research problem. In our research efforts towards solving this problem, we have implemented a DSE framework for 2D convolution. We would like to implement a similar framework for the convolution and fully connected layers of a DNN.
  • Pre-requisites:
    • Digital Design, FPGA-based accelerator design, HLS
    • Python, C++/ SystemC
  • Skills that will be acquired during project-work:
    • Hardware design for ML
    • Multi-objective optimization of hardware accelerators.
    • System-level design
    • Technical writing for research publications.
  • Related Publications:
    • DSE for implementing Cross-layer Approximation. (Submitted for double-blind review)
    • S. Ullah, H. Schmidl, S. S. Sahoo, S. Rehman, A. Kumar, "Area-optimized Accurate and Approximate Softcore Signed Multiplier Architectures", In IEEE Transactions on Computers, April 2020.
    • Suresh Nambi, Salim Ullah, Aditya Lohana, Siva Satyendra Sahoo, Farhad Merchant, Akash Kumar, "ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems", 27 October 2020
  • Contact:
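As a toy illustration of the loop-perforation idea mentioned in the project description above, the following Python sketch skips a fraction of the kernel taps of a naive 2D convolution (the perforation scheme and kernel are chosen for illustration only; this is not the group’s DSE framework):

```python
import numpy as np

def conv2d_perforated(image: np.ndarray, kernel: np.ndarray, skip: int = 0) -> np.ndarray:
    """Naive 2D convolution that drops every 'skip'-th kernel tap (loop perforation).

    skip = 0 disables perforation; larger values trade accuracy for fewer MAC operations.
    """
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    taps = [(i, j) for i in range(kh) for j in range(kw)]
    if skip > 0:
        taps = [t for idx, t in enumerate(taps) if idx % skip != 0]  # perforated taps
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = sum(image[y + i, x + j] * kernel[i, j] for i, j in taps)
    return out

if __name__ == "__main__":
    img = np.random.rand(8, 8)
    k = np.ones((3, 3)) / 9.0
    exact = conv2d_perforated(img, k, skip=0)
    approx = conv2d_perforated(img, k, skip=3)
    print("mean absolute error:", np.abs(exact - approx).mean())
```

In a cross-layer setting, such a software-level knob would be explored jointly with approximate hardware (e.g., approximate multipliers) to find the best accuracy/cost trade-off.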
AI for healthcare: A hardware-software co-design approach
  • Description: The Covid-19 situation around the world has shown the severe shortage of manpower and the vulnerability of existing diagnostic methods in current healthcare systems. Artificial intelligence (AI) offers scope for improving the healthcare system by augmenting the diagnosis stage, thereby helping radiologists and pathologists provide timely feedback. This project explores the impact of a hardware/software co-design approach on improving the performance of AI systems, to increase their effectiveness in terms of saving precious time for critical patients.
  • Pre-requisites:
    • AI/ML algorithms (Signal processing)
    • ML framework: Tensorflow/PyTorch/Scipy
    • Hardware Design, Computer Architecture
  • Skills that will be acquired during project-work:
    • Hardware/software co-design for critical systems
    • Exploration of novel AI algorithms for healthcare
    • Computer Vision
    • Hardware Accelerator design
    • Technical writing for research publications.
Design of AI/ML-based Biomedical Signal Processing Systems
  • Description: The range of applications that use AI/ML is increasing every day. The wide availability of medical data makes bio-medical systems a prime candidate for using machine learning. Paradigms such as online learning allow modern bio-medical systems to be customized for individual patients and are increasingly being used in monitoring systems. However, naïve implementations of ML algorithms can result in costly designs that make such systems infeasible for wearables and similar battery-operated monitoring systems. This project involves a hardware-software co-design approach to implementing low-cost signal processing for biomedical applications. Software techniques that explore algorithms, quantization, etc., and hardware techniques such as approximate circuit design, ultra-low-power RISC-V microarchitectures, and low-energy accelerators will be explored in the project.
  • Pre-requisites:
    • Digital Design, Computer Architecture: RISC-V (preferred)
    • FPGA architecture and design with VHDL/Verilog
    • Basic understanding of Signal Processing and Machine Learning
  • Skills that will be acquired during project-work:
    • RISC-V based SoC design
    • Accelerator design (HLS/HDL)
    • Bio-medical systems
    • Technical writing for research publications.
Accelerator design for ML-based NLP
  • Description: There has been a recent push towards newer ML-based NLP models that can exploit the parallelism of accelerators. From bag-of-words to RNNs to LSTMs and most recently transformers, the models for NLP have evolved rapidly. In this project, we plan to explore the suitability of FPGA-based accelerators for modern NLP models. We aim to design NLP accelerators that exploit precision scaling and approximate computing.
  • Pre-requisites:
    • Digital Design, Computer Architecture
    • FPGA architecture and design with VHDL/Verilog, HLS knowledge preferable
    • Basic understanding of Machine Learning, specifically NLP
  • Skills that will be acquired during project-work:
    • Accelerator design (HLS/HDL)
    • Modern NLP algorithms and their implementation
    • Approximate Computing
    • Technical writing for research publications.
Reconfigurable Architecture design for Emerging Technologies
  • Description: Design of novel application/domain-specific reconfigurable architectures for implementing emerging applications. The project also explores the impact of emerging devices in this context.
  • Pre-requisites:
    • Computer Architecture
    • FPGA architecture
    • Hardware Design (Verilog)
    • Python
  • Skills that will be acquired during project-work:
    • Reconfigurable System Design
    • Logic Synthesis
    • FPGA design algorithms (VPR)
Implementing application-specific approximate computing with RISC-V
  • Description: The project involves implementing approximate arithmetic in RISC-V-based application-specific system design. The major components of the project include:
    • Implementing custom RISC-V cores on an FPGA-based system
    • Integrating approximate components into standard RISC-V microarchitectures
    • Familiarizing with the RISC-V toolchain to enable compilation for the custom microarchitecture
    • Low-cost AI/ML accelerator design for a RISC-V SoC
    • Characterizing the microarchitecture for ASIC-based implementation (synthesis only)
  • Pre-requisites:
    • Computer Architecture
    • Digital Design
    • Verilog/VHDL/SystemC
    • Some scripting language (preferably Python)
  • Skills that will be acquired during project-work:
    • FPGA Design tools (Xilinx)
    • Extending RISC-V instructions
    • ASIC-based design
Resiliency (Reliability & Security)-aware System-level design for Neuromorphic Computing
  • Description: The project involves research into the design of resilient systems for neuromorphic computing. Multiple aspects, such as the reliability and security of systems implementing Artificial Neural Networks and Spiking Neural Networks, will be investigated.
  • Pre-requisites:
    • Machine Learning
    • SystemC /System Verilog
    • Digital Design
  • Skills that will be acquired during project-work:
    • Hands-on learning with HLS tools like Xilinx Vitis
    • System-level design and optimization
    • Resiliency design for critical systems
    • Technical writing for research publications
Using AI for Cyber-physical Systems
  • Description: The project involves exploring the applicability of various machine learning methods to the optimization of controller design for cyber-physical systems. A sample problem of controlling various actuators in an office-building environment to minimize energy consumption and maximize user comfort will be used as a test case for evaluating the performance of traditional, predictive, and self-learning algorithms.
  • Pre-requisites:
    • Knowledge of AI/ML methods and some background in control systems.
    • Python with ML tools (Scikit/Tensorflow/Pytorch/OpenAI)
  • Skills that will be acquired during project-work:
    • Design of cyber-physical systems
    • Application of AI/ML methods for dynamic systems
    • Hardware design and impact of accelerators on cyber-physical systems' performance.
  • Related Publications:
    • Akhil Raj Baranwal, Salim Ullah, Siva Satyendra Sahoo, Akash Kumar, "ReLAccS: A Multi-level Approach to Accelerator Design for Reinforcement Learning on FPGA-based Systems", In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Institute of Electrical and Electronics Engineers (IEEE), pp. 1–1, 28 October 2020.
Cross-layer Design of Heterogeneous Embedded Systems *

*Please note that this is planned as a short-term project as it involves extending current research results and may not be suitable as a complete thesis topic.

  • Description: The project involves the system-level design of embedded systems that considers design choices across multiple layers of the system stack. The project will focus on both traditional and machine learning applications as test applications for performance metrics such as latency, power, throughput, reliability, and energy.
  • Pre-requisites:
    • Python/C++
    • Optimization basics
  • Skills that will be acquired during project-work:
    • System-level modeling and analysis.
    • Multi-objective optimization across different system goals – performance, reliability, power dissipation, etc.
    • Technical writing for research publications.
  • Related Publications:
    • DSE for Cross-layer Low-power design in Heterogeneous Embedded Systems (Article Submitted for peer-review)
    • Siva Satyendra Sahoo, Bharadwaj Veeravalli, Akash Kumar, "Cross-layer fault-tolerant design of real-time systems" , In Proceeding: International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), pp. 1–6, Sept 2016
Storage-aware Hybrid Design Space exploration for Heterogeneous Embedded Systems *

*Please note that this is planned as a short-term project as it involves extending current research results and may not be suitable as a complete thesis topic.

  • Description: The project involves the system-level design of embedded systems that use a hybrid DSE approach for enabling storage-aware task-mapping in resource-constrained Heterogeneous Systems. The project will focus on both traditional and machine learning applications as test applications for performance metrics such as latency, power, throughput, reliability, and energy. The main focus will be to build on the published work cited below.
  • Pre-requisites:
    • Python/C++
    • Optimization basics
  • Skills that will be acquired during project-work:
    • System-level modeling and analysis.
    • Multi-objective optimization across different system goals – performance, reliability, power dissipation, etc.
    • Technical writing for research publications.
  • Related Publications:
    • S. S. Sahoo, B. Veeravalli, A. Kumar, "A Hybrid Agent-based Design Methodology for Dynamic Cross-layer Reliability in Heterogeneous Embedded Systems" , Proceedings of the 56th Annual Design Automation Conference, ACM, New York, NY, USA, June 2019.
Implementing Multi-objective Bayesian Optimization for System-level DSE *

*Please note that this is planned as a short-term project as it involves extending current research results and may not be suitable as a complete thesis topic.

  • Description: The project involves the system-level design of embedded systems and focuses on traversing a large design space through multi-objective Bayesian optimization (a minimal sketch of the underlying Pareto filtering is given after this project entry). The project will focus on both traditional and machine learning applications as test applications for performance metrics such as latency, power, throughput, reliability, and energy. The main focus will be to build on the published work cited below.
  • Pre-requisites:
    • Python/C++
    • Optimization basics
  • Skills that will be acquired during project-work:
    • System-level modeling and analysis.
    • Multi-objective optimization across different system goals – performance, reliability, power dissipation, etc.
    • Technical writing for research publications.
  • Related Publications:
    • Siva Satyendra Sahoo, Bharadwaj Veeravalli, Akash Kumar, "CL(R)Early: An Early-stage DSE Methodology for Cross-Layer Reliability-aware Heterogeneous Embedded Systems", Proceedings of the 57th Annual Design Automation Conference 2020, Association for Computing Machinery, New York, NY, USA, July 2020.
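For readers unfamiliar with the multi-objective DSE setting shared by the projects above, the sketch below (a deliberately simplified stand-in, not the CL(R)Early methodology; the random candidate points and the two objectives are hypothetical) shows the Pareto-filtering step that any such optimization, Bayesian or otherwise, ultimately relies on:

```python
import random

def pareto_front(points):
    """Return the non-dominated points, assuming every objective is minimized."""
    front = []
    for p in points:
        dominated = any(
            all(q[i] <= p[i] for i in range(len(p))) and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

if __name__ == "__main__":
    random.seed(0)
    # Hypothetical (latency, energy) pairs for candidate design points.
    candidates = [(random.uniform(1, 10), random.uniform(1, 10)) for _ in range(50)]
    print(sorted(pareto_front(candidates)))
```

A Bayesian-optimization-based DSE would additionally fit surrogate models of the objectives to decide which candidate designs to evaluate next; this sketch only shows how the resulting points are ranked.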
Probabilistic Modeling for Cross-layer Reliability in Heterogeneous Embedded systems *

*Please note that this is planned as a short-term project as it involves extending current research results and may not be suitable as a complete thesis topic.

  • Description: The project involves probabilistic system-level modeling for improving reliability in heterogeneous Embedded Systems. The project will focus on both traditional and machine learning applications as test applications for performance metrics such as latency, power, throughput, reliability, and energy. The main focus will be to build on the published work cited below.
  • Pre-requisites:
    • Python/C++
    • Optimization basics
    • Probabilistic Modeling
  • Skills that will be acquired during project-work:
    • System-level modeling and analysis.
    • Multi-objective optimization across different system goals – performance, reliability, power dissipation, etc.
    • Technical writing for research publications.
  • Related Publications:
    • Siva Satyendra Sahoo, Bharadwaj Veeravalli, Akash Kumar, "Markov Chain-based Modeling and Analysis of Checkpointing with Rollback Recovery for Efficient DSE in Soft Real-time Systems", In Proceeding: 2020 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2020, ESA-ESRIN, Frascati (Rome) Italy, October 19-21, 2020, October 2020.
    • Siva Satyendra Sahoo, Bharadwaj Veeravalli, Akash Kumar, "CLRFrame: An Analysis Framework for Designing Cross-Layer Reliability in Embedded Systems", In Proceeding: 2018 31st International Conference on VLSI Design and 2018 17th International Conference on Embedded Systems (VLSID), pp. 1-6, Jan 2018.
An agent-based lifetime optimization-aware run-time adaptation approach to QoS-constrained Heterogeneous Embedded Systems *

*Please note that this is planned as a short-term project as it involves extending current research results and may not be suitable as a complete thesis topic.

  • Description: The project involves system-level optimization for improving lifetime reliability in heterogeneous Embedded Systems. The project will focus on both traditional and machine learning applications as test applications and will involve improving the run-time optimization methodology. The main focus will be to build on the published work cited below.
  • Pre-requisites:
    • Python/C++
    • Optimization basics
  • Skills that will be acquired during project-work:
    • AI/ML for EDA
    • System-level modeling and analysis.
    • Multi-objective optimization across different system goals – performance, reliability, power dissipation, etc.
    • Technical writing for research publications.
  • Related Publications:
    • Siva Satyendra Sahoo, Akash Kumar, Bharadwaj Veeravalli, "Design and Evaluation of Reliability-oriented Task Re-Mapping in MPSoCs using Time-Series Analysis of Intermittent faults" , In Proceeding: Design, Automation and Test in Europe Conference and Exhibition (DATE), Mar 2016.
OpenCV Acceleration for PYNQ-based Ultra96 Platform
An example of a Computer Vision platform suggested by Xilinx (https://github.com/Xilinx/PYNQ-ComputerVision)

OpenCV is one of the most widely used libraries in the computer vision/image processing domain. However, due to resource and power constraints, embedded systems cannot offer processing performance as high as that of desktop-grade systems. Therefore, for image-processing-focused embedded systems (such as smart security cameras), dedicated image processing chips (besides GPUs) have to be integrated to accelerate the operations while minimizing power consumption. While this approach offers many advantages (in terms of cost, performance, and power), there are some limitations: (1) those chips only support a limited number of functionalities, and (2) if more features are needed, new chips have to be made to replace the old ones. In most (if not all) cases, the system has to be re-designed, as the physical footprint of the new chips might be different.

This project works on accelerating OpenCV-based applications using FPGAs. The platform on the FPGA does not target any specific application. It provides a mechanism to seamlessly accelerate standard OpenCV functions by allocating corresponding hardware functions to process the data instead of using the CPU. The accelerator-rich FPGA platform is partially reconfigurable, with multiple slots to load the hardware functions at runtime. The accelerators are implemented based on the HLS-compatible OpenCV library provided by Xilinx.
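The intended "transparent acceleration" can be pictured at the API level with the following Python sketch. Only the OpenCV fallback path uses a real API (cv2.filter2D); the hardware-related functions are hypothetical placeholders for the dispatch mechanism this project would build:

```python
import numpy as np
import cv2  # OpenCV CPU implementation, used as the software fallback

def hw_filter2d_available() -> bool:
    """Placeholder: would check whether a filter2D accelerator is loaded in a PR slot."""
    return False

def hw_filter2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Placeholder: would stream the image through the FPGA accelerator."""
    raise NotImplementedError

def filter2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Dispatch to the hardware function if one is loaded, otherwise fall back to OpenCV."""
    if hw_filter2d_available():
        return hw_filter2d(image, kernel)
    return cv2.filter2D(image, -1, kernel)

if __name__ == "__main__":
    img = (np.random.rand(64, 64) * 255).astype(np.uint8)
    box = np.ones((3, 3), np.float32) / 9.0
    print(filter2d(img, box).shape)
```

From the application's point of view, the call looks like a normal OpenCV function; the resource manager decides whether it runs on the CPU or in a partially reconfigurable slot.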

Goals of this project and potential tasks

The project covers all levels of system design from hardware design, design automation, device driver, resource management with optimization in mapping and scheduling, to application analysis to make sure that the proposed framework is as transparent to the user as possible. Therefore, there are multiple phases needed to successfully deliver the project. For the current Master project/thesis, only the first two phases are presented here. The final goals of the project will be achieved by the follow-up projects.

Phase 1: 4 - 6 months

Experiment with a small number of OpenCV functions (provided by Xilinx in the xfOpenCV library) in different categories such as Image Arithmetic, Filters, Geometric Transform, Features extraction, etc.

Phase 2: 6 months

Analyze and generalize the hardware interface and the communication behavior required by different OpenCV functions in different categories, supported by a design automation tool. A preliminary resource management framework for managing the partial bitstreams and FPGA resources, as well as the mapping and scheduling of the accelerators, is expected in this phase.

Skills acquired in this project

  • Hands-on experiences with FPGA development with advanced topics such as Partial Reconfiguration
  • Hands-on experiences with designing embedded systems with hardware/software co-design analysis for performance and energy efficiency
  • Advanced technical report writing

Pre-requisite

  • Digital design with VHDL/Verilog
  • Knowledge of computer architecture
  • C/C++, Python

Helpful Skills

  • Knowledge about image processing algorithms.
  • Knowledge about High-level Synthesis
  • Work independently

Contact Information

 

Any-chip (FPGA) Temperature Sensor using Ring-Oscillator
An example of the thermal hotspot mitigation results in our group

In chip design, thermal behavior is one of the major issues affecting the correctness of operations as well as the reliability and lifetime of the chip. Unfortunately, most FPGAs only have one or two temperature sensors in the middle of the chip. As FPGAs grow larger and become capable of incorporating many accelerators or soft-core processors, these sensors cannot tell which component is the thermal hotspot on the chip. As a result, almost every thermal-aware mapping and scheduling algorithm proposed for such systems to mitigate thermal hotspots is evaluated only in a simulation environment, using a combination of power estimation and thermal simulation tools such as HotSpot.
Ring oscillators are a viable solution. As many ring oscillators as needed can be instantiated and placed on the FPGA. With proper design and calibration methodologies, they are very useful in providing a thermal map of the entire chip. In this case, the scheduling and mapping algorithms can be evaluated on a real platform with real inputs from the operation of the chip. Many research works are tackling this issue. Nevertheless, there is no design automation tool that automatically takes a design as input and inserts ring oscillators at the appropriate locations on the FPGA to accurately capture the thermal map of the chip.
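As a back-of-the-envelope illustration of the read-out side, a ring-oscillator count measured over a fixed time window can be mapped to a temperature estimate with a simple first-order calibration (the calibration constants below are invented for illustration; in practice they would be characterized per sensor and per device):

```python
def counts_to_celsius(count: int, count_ref: float, temp_ref_c: float,
                      slope_counts_per_c: float) -> float:
    """Linear calibration: the oscillation frequency (and hence the count over a
    fixed window) typically drops as temperature rises, so the slope is negative."""
    return temp_ref_c + (count - count_ref) / slope_counts_per_c

if __name__ == "__main__":
    # Hypothetical calibration point: 100,000 counts at 40 degC, -120 counts per degC.
    print(round(counts_to_celsius(98_800, 100_000, 40.0, -120.0), 1))  # -> 50.0
```

Collecting such estimates from ring oscillators placed across the fabric yields the thermal map that the Linux driver in this project is supposed to build.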


Goals of this project and Potential tasks

  • Literature survey on the existing techniques in having ring-oscillators on the FPGA
  • Implement ring-oscillators on the FPGA
  • Create a C/C++ Linux driver running on the ARM processor inside the FPGA to read out and build the thermal map of the chip
  • Analyze the original design as input to automatically insert and place the ring-oscillators on the FPGA

Skills acquired in this project

  • Hands-on experiences with FPGA development with advanced topics such as Partial Reconfiguration and automatic floorplanning
  • Design analysis with automation tool
  • Advanced technical report writing

Pre-requisite

  • Digital design with VHDL/Verilog
  • Knowledge of computer architecture
  • Knowledge about FPGA architecture
  • C/C++

Helpful Skills

  • Knowledge about TCL script to automate the design steps in Xilinx Vivado
  • Work independently

Contact Information

 

Coffee Machine Usage Automatic Logging Device with Person Detection/Recognition
An example of the coffee machine usage logging with the inefficient paper/pen and the smart device with the LCD for display and Camera + Microphone for Person Detection/Recognition through image or voice

In our lab, coffee machine usage is tracked with a very inefficient paper-and-pen method. When a user takes a coffee from the machine, he/she needs to look for his/her name on the paper sheet and make a tick accordingly. At the end of a quarter, one person in charge must take the sheet and divide the total cost of buying coffee beans/milk by the total number of cups taken by everybody. Each person’s contribution is proportional to the number of cups he/she took.

In this project, we would like to have a smart device take care of this tedious task. It recognizes that somebody is taking a coffee by performing person detection. After that, it asks for permission to recognize the person by face, using the camera, or by voice, using the microphone. If that person does not agree, the device asks him/her to speak his/her name instead. If no information associated with that person can be found, the device asks for more information to store in the database. The person in charge of buying the coffee beans and milk needs a separate login portal to log the money spent and to issue the command to calculate the contributions from the users. All of this processing must be done locally on the device. The device communicates with the users through the touch LCD screen. In our labs, there are two coffee machines in two different rooms, and the users can use either coffee machine. Therefore, there will be two such devices in the two rooms; one will act as a server to store all of the data and synchronize the usage of the two coffee machines. The two boards will communicate over Wi-Fi.


Goals of this project and Potential tasks

The project covers different levels of system design: the student has to write the user interface to interact with the users, implement a simple database backend framework that both boards access to store/retrieve the users’ data and expenses, and implement person detection, face recognition, and voice recognition algorithms using machine learning. These algorithms can be implemented on an FPGA or with an external AI inference device (Intel Movidius stick, Google Coral, etc.), depending on the background of the student.


Phase 1: 3-4 months

Implement the core functions of the system: person detection, face recognition and voice recognition algorithms using machine learning


Phase 2: 2 months

Interface with the peripherals: camera + microphone + LCD touch screen. Implement the backend database that both boards can access. Design the user interface.

Skills acquired in this project

  • Hands-on experiences with embedded system development and interfacing with peripherals such as a monitor, camera, microphone, and AI inference device.
  • Hands-on experiences with designing embedded systems with hardware/software co-design analysis for performance and energy efficiency (when an external device is needed for AI inference)
  • Advanced technical report writing


Pre-requisite

  • C/C++, Python
  • FPGA development (if you want to work with an FPGA)


Helpful Skills

  • Knowledge about image processing algorithms.
  • Knowledge about machine learning
  • Knowledge about database, web design
  • Knowledge of computer architecture
  • Work independently


Contact Information

Apportion for Life: Operating Environment-aware Partitioning in FPGA-based Systems for Improving Lifetime Reliability

FPGAs (Field Programmable Gate Arrays) are being increasingly used across diverse application areas -- Healthcare, Military, Telecom, Automobiles, etc. Such diversity results in widely varying operating conditions for FPGA-based embedded systems. Consequently, the rates of different types of physical faults witnessed by the system can differ by large margins. Further, in areas such as space exploration, repair by replacement can be near-impossible. Therefore, the mission life of systems can be one of the major design objectives. In addition, with the predicted growth in the number of IoT devices and the rising usage of FPGA-based edge devices, an increase in a system’s operational life can lead to lower electronic waste and more sustainable computing.

Project Goals:

  • Optimization across multiple system partitioning methods --- HW/HW, SW/SW and HW/SW --- for designing lifetime aware FPGA-based systems.
  • Model the impact of external and internal physical fault-causing mechanisms on a Dynamic Partially Reconfigurable (DPR)-based system.
  • Simulate/Emulate a functional DPR system on FPGAs with relevant fault-injection, fault-diagnosis, and repair mechanisms.

 

Skills Acquired:

  • Hands-on development with FPGA-based systems.
  • System-level modelling and design.
  • Multi-objective optimization across different system goals – performance, reliability, power dissipation etc.
  • Technical writing for research publications

Pre-requisites:

  • Knowledge of FPGA architecture.
  • Knowledge of real-time scheduling and related concepts
  • Programming skills: C/C++, Python

Additional Optional skills:

  • System-level design with VHDL/Verilog/SystemC
  • Accelerator design with Xilinx SDSoC, Vivado
  • Optimization tools and methods.


Contact Information:

 

Related articles:

  • Siva Satyendra Sahoo, Tuan Duy Anh Nguyen, B. Veeravalli, Akash Kumar, "Lifetime-aware Design Methodology for Dynamic Partially Reconfigurable Systems", In Proceeding: 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 1-6, Jan 2018.
  • S.S. Sahoo, T.D.A. Nguyen, B. Veeravalli, A. Kumar, "Multi-objective design space exploration for system partitioning of FPGA-based Dynamic Partially Reconfigurable Systems" , In Integration, November 2018.
  • S. S. Sahoo, T. D. A. Nguyen, B. Veeravalli and A. Kumar, "QoS-Aware Cross-Layer Reliability-Integrated FPGA-Based Dynamic Partially Reconfigurable System Partitioning," 2018 International Conference on Field-Programmable Technology (FPT), Naha, Okinawa, Japan, 2018, pp. 230-233.
FPGA-based Artificial Neural Network Accelerator
CIFAR-10 Dataset for Image Classification

Goals of this project and Potential tasks

  • Literature survey on the existing FPGA-based hardware artificial neural network (ANN) accelerators
  • Implement a small Convolutional Neural Network (CNN) with different design trade-offs on the FPGA
  • Provide the opportunity to use different approximate arithmetic units for energy efficiency

Skills acquired in this project

  • Hands-on experiences with FPGA-based development
  • Hands-on experience with ANNs
  • Advanced technical report writing

Pre-requisite

  • Digital design with VHDL/Verilog
  • Knowledge about ANNs
  • Knowledge about FPGA architecture
  • C/C++, Python

Helpful Skills

  • Knowledge about TCL script to automate the design steps in Xilinx Vivado
  • Work independently

Contact Information

Approximate Multiplier for FPGAs

Approximate multipliers have recently gained broad attention given the ever-increasing use of error-resilient applications such as machine learning and multimedia, in which multiplication is the key operation of the computational core. Resource metrics such as performance, power, and energy dissipation are of greater importance in these applications, since the output can tolerate a relaxed precision. Accelerators such as FPGAs are prime candidates for implementing such multiplication-intensive programs. However, look-up table (LUT)-based multipliers consume more area and incur higher latency than their Application-Specific Integrated Circuit (ASIC) counterparts. In this project, we aim to implement an area- and delay-efficient LUT-based multiplier for FPGAs. One interesting approach is to translate multiplication into addition using approximate algorithms such as Mitchell’s, which provides area efficiency. In a further step, we also intend to exploit low-latency adders based on online arithmetic, which significantly decrease delay.
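As a behavioral reference for the log-based approach, the following Python sketch implements Mitchell's classic approximation for unsigned integers (this is the textbook algorithm, given here only to make the idea concrete; it is not the FPGA-specific architecture to be developed in this thesis):

```python
import random

def mitchell_multiply(a: int, b: int) -> int:
    """Mitchell's logarithmic approximate multiplication for positive integers:
    log2(x) is approximated by k + f, where k is the leading-one position and
    f the normalized remainder, so the product reduces to additions and shifts."""
    if a == 0 or b == 0:
        return 0
    ka, kb = a.bit_length() - 1, b.bit_length() - 1      # leading-one positions
    fa = (a - (1 << ka)) / (1 << ka)                      # fractional parts in [0, 1)
    fb = (b - (1 << kb)) / (1 << kb)
    if fa + fb < 1:
        return int((1 << (ka + kb)) * (1 + fa + fb))
    return int((1 << (ka + kb + 1)) * (fa + fb))

if __name__ == "__main__":
    random.seed(1)
    errs = []
    for _ in range(1000):
        a, b = random.randint(1, 255), random.randint(1, 255)
        errs.append(abs(mitchell_multiply(a, b) - a * b) / (a * b))
    print(f"mean relative error: {100 * sum(errs) / len(errs):.2f}%")  # typically around 4 %
```

A hardware version replaces the floating-point arithmetic with fixed-point additions, shifts, and a leading-one detector, which is exactly where the area and delay savings come from.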

Goals of this Thesis and Potential Tasks

  • Developing hardware implementations of approximate FPGA-specific adders and multipliers with different bit-widths.
  • Developing functionally equivalent behavioral models using software languages like C++ and testing them in different benchmark applications.
  • Utilizing and assessing the approximate multipliers in real-world applications such as Deep Neural Networks or Digital Signal Processing.

Pre-Requisites and helpful skills

  • FPGA development and programming (Verilog/VHDL, Vivado)
  • Software programming (Java/C++/Python/Matlab)

Contact information for more details

References

  • Saadat, Hassaan et al, "Minimally Biased Multipliers for Approximate Integers and Floating Point Multiplication." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, (2018).
  • Shi, Kan, and George A. Constantinides. "Evaluation of design trade-offs for adders in approximate datapath." HiPEAC Workshop on Approximate Computing, 2015.
  • Shi, Kan. "Design of approximate overclocked datapath" (2015).
Approximate Divider

Approximate dividers have recently gained considerable attention in top conferences for two reasons. First, although divisions are less frequent than multiplications, they remain unavoidable operations in ever-growing machine learning and multimedia applications. Second, their resource consumption and latency are several times those of multipliers, which makes them the bottleneck of an application (in both energy and speed). However, our analysis shows that, with approximation techniques, these metrics can be improved by at least 4x. In this project, we aim to implement an area-, energy-, and delay-efficient divider for FPGA and ASIC platforms. The interesting point of our approximation approach is that it simplifies division to shift and subtraction. This divider can be used in neural networks and improves resource metrics while having a negligible impact on classification accuracy. The project period is about 3-6 months, and it is based on a recently accepted paper at the ASP-DAC 2020 conference. As it would be an extension of an already published paper, we aim to submit the results to a journal as soon as possible.
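To make the "division reduces to shift and subtraction" point concrete, here is a minimal behavioral model in the same logarithmic spirit (an illustration only; the ASP-DAC 2020 design referenced below has its own leading-one-based architecture and accuracy-tuning knobs):

```python
def log_approx_divide(a: int, b: int) -> float:
    """Approximate division of positive integers: subtract the approximate logs
    (leading-one position plus normalized remainder), then take an approximate
    antilog. In hardware this becomes a subtraction and a shift."""
    if b == 0:
        raise ZeroDivisionError
    if a == 0:
        return 0.0
    ka, kb = a.bit_length() - 1, b.bit_length() - 1
    fa = (a - (1 << ka)) / (1 << ka)
    fb = (b - (1 << kb)) / (1 << kb)
    diff = fa - fb
    if diff >= 0:
        return 2.0 ** (ka - kb) * (1 + diff)
    return 2.0 ** (ka - kb - 1) * (2 + diff)

if __name__ == "__main__":
    for a, b in [(200, 7), (255, 16), (100, 3)]:
        print(f"{a}/{b}: exact {a / b:.3f}, approx {log_approx_divide(a, b):.3f}")
```

The relative error of this simple model stays in the low single-digit percent range for the examples above, which illustrates why such dividers are attractive for error-resilient workloads.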

Goals of this Thesis and Potential Tasks

  • Developing hardware implementations of approximate dividers with different bit-widths.
  • Developing functionally equivalent behavioral models using software languages like C++ and testing them in different benchmark applications.
  • Utilizing and assessing the approximate dividers in real-world applications such as Deep Neural Networks or Digital Signal Processing.

Pre-Requisites and helpful skills

  • FPGA development and programming (Verilog/VHDL, Vivado)
  • Software programming (Java/C++/Python/Matlab)

  
Contact information for more details

References

  • Ebrahimi, Zahra et al, "LeAp: Leading-one Detection-based Softcore Approximate Multipliers with Tunable Accuracy.", IEEE/ACM Asia and South Pacific Design Automation Conference (ASP-DAC), 2020.
  • Saadat, Hassaan et al, "Approximate Integer and Floating-Point Dividers with Near-Zero Error Bias." IEEE/ACM Design Automation Conference (DAC), 2019.
  • Saadat, Hassaan et al, "Minimally Biased Multipliers for Approximate Integers and Floating Point Multiplication." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2018.
Sacrificing flexibility in FPGAs: FPGA Architecture based on AND-Inverter Cones
A logic element based on AND-Inverter Cones

Look-up-table (LUT)-based FPGAs suffer when it comes to scaling, as their complexity increases exponentially with the number of inputs. For this reason, LUTs with more than 6 inputs have rarely been used. In an attempt to handle more inputs and increase the logic density of logic cells, And-Inverter Cones (AICs), shown in the figure below, were proposed. They are an alternative to LUTs with a better compromise between hardware complexity, flexibility, delay, and input and output counts. They are inspired by modern logic synthesis approaches, which employ and-inverter graphs (AIGs) for representing logic networks. An AIC is a binary tree consisting of AND nodes with programmable conditional inversion and offers tapping of intermediate results. AICs have a lot to offer compared to traditional LUT-based FPGAs. The following points summarize the major benefits of using AICs over LUTs:

  1. For a given complexity, AICs can implement a function with a larger number of inputs than an LUT.
  2. Since AICs are inspired by AIGs, area and delay increase linearly and logarithmically, respectively, with the number of inputs, in contrast to the exponential and linear increases, respectively, in the case of LUTs.
  3. Intermediate results can be tapped out of AICs, thereby reducing logic duplication.

While, on the one hand, we sacrifice some of the flexibility offered by FPGAs, there are new nanotechnologies based on materials like germanium and silicon that offer runtime reconfigurability and functional symmetry between p- and n-type behavior. The project aims to explore FPGA architectures based on these reconfigurable nanotechnologies in the context of AICs.
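To make the AIC structure concrete, the following Python sketch (our own behavioral model for illustration; the configuration encoding is an assumption, and real AIC proposals also support partial cones and input bypassing) evaluates a complete binary tree of AND nodes with programmable conditional inversion and exposes the intermediate taps:

```python
def evaluate_aic(inputs, invert_cfg):
    """Evaluate an And-Inverter Cone modeled as a complete binary tree of 2-input
    AND nodes, each followed by a programmable conditional inversion.

    inputs:     list of 0/1 leaf values; the length must be a power of two.
    invert_cfg: one inversion bit per internal node, ordered level by level
                from the leaves towards the root.
    Returns the root output and the intermediate (tappable) level outputs.
    """
    level, taps, cfg_idx = list(inputs), [], 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            val = level[i] & level[i + 1]
            if invert_cfg[cfg_idx]:
                val ^= 1                     # programmable conditional inversion
            cfg_idx += 1
            nxt.append(val)
        taps.append(nxt)
        level = nxt
    return level[0], taps

if __name__ == "__main__":
    # 4-input cone: two first-level nodes plus one root node -> 3 inversion bits.
    root, taps = evaluate_aic([1, 1, 0, 1], invert_cfg=[0, 1, 0])
    print("root =", root, "intermediate taps =", taps)
```

The tapping of intermediate levels is what lets an AIC serve several smaller functions at once instead of duplicating logic.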

Skills acquired in this thesis:

  • Hands-on skills using Linux based systems
  • Programming in Python or C/C++
  • Working with tools like the Cadence Virtuoso environment and the open-source VTR (Verilog-to-Routing) tool for FPGAs
  • Problem analysis
  • Working in an international environment and communicating in English
  • Professional technical writing
  • Verilog/VHDL

Pre-Requisites:

  • Knowledge of FPGAs
  • Familiarity with the Linux environment and C or C++.

Contact Information:

Customizing Approximate Arithmetic Blocks for FPGA

 

Approximate Computing has emerged as a new paradigm for building highly power-efficient on-chip systems. The implicit assumption of most standard low-power techniques has been precise computing, i.e., that the underlying hardware provides accurate results. However, continuing to support precise computing is most likely not the way to solve upcoming power-efficiency challenges. Approximate Computing relaxes the bounds of precise computation, thereby providing new opportunities for power savings, and may yield orders-of-magnitude performance/power benefits. Recent research studies by several companies (like Intel, IBM, and Microsoft) and research groups have demonstrated that applying hardware approximations may provide 4x-60x energy reductions. These studies have shown that there is a large body of power-hungry applications from several domains, like image and video processing, computer vision, Big Data, and Recognition, Mining and Synthesis (RMS), which are amenable to approximate computing due to their inherent resilience to approximation errors and can still produce output of acceptable quality. The state of the art has developed approximate hardware designs that approximate computation using certain basic hardware blocks. For example, approximate designs exist mainly for ripple-carry adders, which have a high potential for approximation, while ignoring other widely used adder types such as the Kogge-Stone, carry-lookahead, carry-sum, and carry-save adders.
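As one concrete example of the kind of basic block involved, the sketch below gives a behavioral model of a lower-part-OR approximate adder, a well-known scheme from the approximate-computing literature (chosen purely for illustration; it is not necessarily the adder style to be developed in this thesis):

```python
def lower_or_add(a: int, b: int, approx_bits: int) -> int:
    """Approximate addition: the lowest 'approx_bits' bits are computed with a
    carry-free bitwise OR, and only the upper bits use an exact adder."""
    mask = (1 << approx_bits) - 1
    low = (a & mask) | (b & mask)              # cheap, carry-free lower part
    high = (a >> approx_bits) + (b >> approx_bits)
    return (high << approx_bits) | low

if __name__ == "__main__":
    a, b, k = 173, 94, 4
    print("exact:", a + b, " approx (4 LSBs approximated):", lower_or_add(a, b, k))
```

A functionally equivalent software model like this is exactly what the "software models, e.g., using C or C++" task below refers to: it lets the error behavior be characterized in benchmark applications before committing to an FPGA implementation.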

 

 

Goals of this Thesis and Potential Tasks (Contact for more Discussion):

  • Developing an approximate FPGA-specific library for different arithmetic modules like adders, multipliers, dividers, and logical operations.
  • Developing complex multi-bit approximate functions and accelerators for FPGAs.
  • Interfacing custom instructions and FPGAs to soft cores (e.g. Microblaze) and using SDSoC Framework.
  • Developing functionally equivalent software models, e.g., using C or C++ and testing in different benchmark applications.
  • Open-sourcing and Documentation.

 

Skills acquired in this Thesis:

  • Hands-on experience on FPGA development and new SDSoC framework.
  • Computer Arithmetic and Hardware Optimization.
  • In-depth technical knowledge on the cutting-edge research topic and emerging computing paradigms.
  • Problem analysis and exploration of novel solutions for real-world problems.
  • Open-Sourcing.
  • Team work and experience in an international environment.
  • Professional grade technical writing.

 

Pre-Requisites (helpful, but not strictly required):

  • Knowledge of Computer architecture, C or C++ or MATLAB.
  • VHDL programming (beneficial if already known and practiced, e.g., in lab courses)

 

Contact information:

Approximate Image Processing on FPGA

Currently Running

Image and video processing applications are well known for their processing- and power-hungry nature. Therefore, the low-power implementation of such applications on resource-constrained devices poses several challenges. Approximate Computing has emerged as a new paradigm for building highly power-efficient on-chip systems. The implicit assumption of most standard low-power techniques has been precise computing, i.e., that the underlying hardware provides accurate results. However, continuing to support precise computing is most likely not the way to solve upcoming power-efficiency challenges. Approximate Computing relaxes the bounds of precise computation, thereby providing new opportunities for power savings, and may yield orders-of-magnitude performance/power benefits. Recent research studies by several companies (like Intel, IBM, and Microsoft) and research groups have demonstrated that applying hardware approximations may provide 4x-60x energy reductions. These studies have shown that there is a large body of power-hungry applications from several domains, like image and video processing, computer vision, Big Data, and Recognition, Mining and Synthesis (RMS), which are amenable to approximate computing due to their inherent resilience to approximation errors and can still produce output of acceptable quality.
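Since "output of acceptable quality" has to be quantified, work in this area typically compares exact and approximate outputs with a metric such as PSNR. A minimal sketch follows (the truncation-based filter is only a stand-in for a real approximate hardware design; the 3x3 box filter and bit widths are assumptions):

```python
import numpy as np

def psnr(exact: np.ndarray, approx: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between an exact and an approximate image."""
    mse = np.mean((exact.astype(np.float64) - approx.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def truncated_box_filter(img: np.ndarray, drop_bits: int) -> np.ndarray:
    """Stand-in for an approximate filter: a 3x3 box filter whose pixel inputs have
    their 'drop_bits' least significant bits truncated before accumulation."""
    t = (img >> drop_bits) << drop_bits
    padded = np.pad(t.astype(np.uint32), 1, mode="edge")
    acc = sum(padded[y:y + img.shape[0], x:x + img.shape[1]]
              for y in range(3) for x in range(3))
    return (acc // 9).astype(np.uint8)

if __name__ == "__main__":
    img = (np.random.rand(64, 64) * 255).astype(np.uint8)
    exact = truncated_box_filter(img, 0)
    approx = truncated_box_filter(img, 3)
    print(f"PSNR of approximate vs. exact output: {psnr(exact, approx):.1f} dB")
```

The same kind of quality metric would be applied to the edge detection, object detection, and other filters listed below once they run on approximate FPGA hardware.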

 

Goals of this Thesis and Potential Tasks (Contact for more Discussion):

  • Developing image processing algorithms, like pre-processing and post-processing filters, edge detection, line detection, object detection, face recognition, and motion tracking.
  • Hardware development, testing, performance and power analysis for FPGA-based systems.
  • Research and Development of novel computation and data approximation techniques.

 

Skills acquired in this Thesis:

  • Hands-on experience on FPGA development.
  • In-depth technical knowledge on the cutting-edge research topic and emerging computing paradigms.
  • Problem analysis and exploration of novel solutions for real-world problems.
  • Team work and experience in an international environment.
  • Professional grade technical writing.

 

Pre-Requisite:

  • Knowledge of Computer architecture.
  • VHDL programming (beneficial if already known and practiced, e.g., in lab courses)

 

Helpful Skills:

  • Knowledge about image processing algorithms

 

Contact information: