IDT Lab
At the Innovative Data Technologies (IDT) Lab, we conduct research in all aspects of data management for science, including storage and I/O, file systems, metadata management, data quality assessment and improvement, performance analysis and tuning, data security, and energy efficiency. Our emphasis is on developing systems and tools that make managing scientific data efficient and easy for scientists using high-performance computing (HPC), cloud, and edge computing systems.
Our research covers various aspects of data management
Data Management Systems for Science
Design and implementation of data management systems for scientific applications, including data-intensive workflows.
Efficient Parallel and Distributed I/O
Development of efficient parallel and distributed I/O systems, including data containers and storage hierarchies.
Data Readiness and Security
Research on data readiness for AI applications, ensuring data quality and compliance, and addressing security challenges in data management.
Featured projects
AI Data Readiness Inspector (AIDRIN)
AIDRIN is a framework for assessing the readiness of data for AI applications across both centralized and decentralized (e.g., federated learning) workflows, ensuring that datasets meet quality and compliance standards.
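To make "data readiness" more concrete, the sketch below computes two simple quality metrics of the kind such an assessment might include: per-column completeness (the fraction of non-missing values) and the fraction of duplicate records. This is an illustrative sketch only, not AIDRIN's implementation or API; the function names, the row-of-dicts input format, and the pass/fail thresholds are all hypothetical assumptions.

```python
from typing import Any, Dict, List, Optional

# Hypothetical input format: each record is a dict mapping column name -> value,
# with None marking a missing value. Values must be hashable for duplicate checks.
Row = Dict[str, Optional[Any]]


def completeness(rows: List[Row], columns: List[str]) -> Dict[str, float]:
    """Fraction of non-missing (non-None) values in each column."""
    n = len(rows)
    if n == 0:
        return {c: 0.0 for c in columns}
    return {c: sum(1 for r in rows if r.get(c) is not None) / n for c in columns}


def duplicate_fraction(rows: List[Row], columns: List[str]) -> float:
    """Fraction of rows that exactly repeat an earlier row over the given columns."""
    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(r.get(c) for c in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows) if rows else 0.0


def readiness_report(
    rows: List[Row],
    columns: List[str],
    min_completeness: float = 0.95,  # hypothetical threshold
    max_duplicates: float = 0.01,    # hypothetical threshold
) -> Dict[str, Any]:
    """Combine the metrics into a simple report with an overall ready flag."""
    comp = completeness(rows, columns)
    dup = duplicate_fraction(rows, columns)
    ready = all(v >= min_completeness for v in comp.values()) and dup <= max_duplicates
    return {"completeness": comp, "duplicate_fraction": dup, "ready": ready}
```

A real readiness assessment would weigh many more dimensions (class balance, outliers, privacy and compliance checks); two metrics are used here only to keep the example short.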
Drishti: I/O Insights for All
Drishti is a novel interactive, web-based analysis framework that visualizes I/O traces, highlights bottlenecks, and helps users understand the I/O behavior of scientific applications.
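As an illustration of what "highlighting a bottleneck" in an I/O trace can mean, the sketch below flags an application that issues a large share of small read or write requests, a common cause of poor parallel I/O performance. It is a hypothetical heuristic in the general spirit of trace-based I/O analysis, not Drishti's actual code; the `IORecord` trace format, the 1 MiB "small request" cutoff, and the 10% trigger level are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

SMALL_REQUEST_BYTES = 1 << 20   # hypothetical cutoff: requests under 1 MiB count as small
SMALL_FRACTION_LIMIT = 0.10     # hypothetical trigger: flag if more than 10% are small


@dataclass
class IORecord:
    """Hypothetical trace record: one read or write request and its size."""
    op: str     # "read" or "write"
    size: int   # bytes transferred


def small_request_insight(trace: List[IORecord], op: str) -> Dict[str, Any]:
    """Summarize how many requests of one operation type are small, and flag it."""
    ops = [r for r in trace if r.op == op]
    small = [r for r in ops if r.size < SMALL_REQUEST_BYTES]
    fraction = len(small) / len(ops) if ops else 0.0
    return {
        "op": op,
        "total": len(ops),
        "small": len(small),
        "fraction": fraction,
        "bottleneck": fraction > SMALL_FRACTION_LIMIT,
    }
```

Keeping each insight as a small, independent check over the trace makes it easy to add further heuristics (for example, for misaligned or random-access requests) and to report each one separately.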
Proactive Data Containers
Formulation of object-oriented Proactive Data Containers (PDCs) and their mapping onto different levels of the exascale storage hierarchy; efficient strategies for moving data through deep storage hierarchies using PDCs; techniques for transforming and reorganizing data based on application requirements; and novel analysis paradigms that enable data transformations and user-defined analysis on data in PDCs.
S2-D2: Securing Self-describing Data, Formats, and Libraries
This project will apply comprehensive testing, evaluation, issue identification, hardening, and validation to correct security deficiencies in self-describing file formats and libraries. The specific R&D tasks include: (1) assessing and fixing file format vulnerabilities, (2) protecting data access libraries, (3) exploring security solutions for metadata and data, and (4) constructing a security framework, called S2-D2.
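One class of deficiency that hardening work on self-describing formats typically targets is trusting size and count fields read from the file itself. The sketch below shows defensive parsing of a toy, entirely hypothetical format (a 4-byte magic number, a little-endian attribute count, then length-prefixed UTF-8 attribute names): every declared length is checked against the actual buffer before it is used. It does not reflect the layout of any real format or library, and `SDF1`, `MAX_ATTRS`, and `parse_attr_names` are illustrative names only.

```python
import struct
from typing import List

MAGIC = b"SDF1"     # hypothetical magic number for a toy self-describing format
MAX_ATTRS = 1024    # hypothetical sanity cap on the declared attribute count


class MalformedFileError(ValueError):
    """Raised when the input does not match the toy format's structure."""


def parse_attr_names(buf: bytes) -> List[str]:
    """Parse attribute names, validating every length field before trusting it."""
    if len(buf) < 8 or buf[:4] != MAGIC:
        raise MalformedFileError("bad magic number or truncated header")
    (count,) = struct.unpack_from("<I", buf, 4)
    if count > MAX_ATTRS:
        raise MalformedFileError(f"attribute count {count} exceeds limit")
    offset = 8
    names = []
    for _ in range(count):
        # The 2-byte length prefix itself must fit inside the buffer.
        if offset + 2 > len(buf):
            raise MalformedFileError("truncated attribute length field")
        (length,) = struct.unpack_from("<H", buf, offset)
        offset += 2
        # Reject lengths that point past the end of the data.
        if offset + length > len(buf):
            raise MalformedFileError("attribute length points past end of buffer")
        try:
            names.append(buf[offset:offset + length].decode("utf-8"))
        except UnicodeDecodeError as exc:
            raise MalformedFileError("attribute name is not valid UTF-8") from exc
        offset += length
    return names
```

Reporting every structural violation as a single error type lets calling code reject a corrupted or malicious file cleanly instead of reading out of bounds or allocating based on an attacker-controlled count.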