
Projects

Proactive Data Containers

IDT-led, Funded, Open Source

Formulation of object-oriented PDCs and their mapping to different levels of the exascale storage hierarchy; efficient strategies for moving data in deep storage hierarchies using PDCs; techniques for transforming and reorganizing data based on application requirements; and novel analysis paradigms for enabling data transformations and user-defined analysis on data in PDCs.

AI Data Readiness Inspector

IDT-led, Funded, Open Source

AIDRIN is a framework that assesses the readiness of data for AI applications across centralized and decentralized (e.g., federated learning) workflows, ensuring that datasets meet quality and compliance standards.

SAGEST Center

Funded

Developing a trustworthy predictive system that utilizes embedded near-ab-initio simulations to make predictions with quantified uncertainty in extreme regimes where physical experimental validation is unavailable.

h5bench: a Parallel I/O Benchmark Suite for HDF5

IDT-led, Funded, Open Source

h5bench is a suite of parallel I/O benchmarks or kernels representing I/O patterns that are commonly used in HDF5 applications on high-performance computing systems. h5bench measures several aspects of I/O performance, including I/O overhead and observed I/O rate.

S2-D2: Securing Self-describing Data, Formats, and Libraries

IDT-led, Funded

This project will apply comprehensive testing, evaluation, issue identification, hardening, and validation to correct security deficiencies in self-describing file formats and libraries. The specific R&D tasks include: (1) assessing and fixing file format vulnerabilities, (2) protecting data access libraries, (3) exploring security solutions for metadata and data, and (4) constructing a security framework, called S2-D2.

A Full-stack Approach to Declarative Analytics at Scale

Funded

The project investigates full-stack implementation methodologies for expressive programming systems that effectively bridge the gap between human-level specification and high-performance implementation of complex reasoning tasks at scale.

StoreHub: A Community Infrastructure for Shaping the Future of Data Storage Research

Funded

This project focuses on planning activities associated with the realization of the StoreHub research infrastructure, which aims to support next-generation data storage research. The infrastructure is envisioned to provide a secure, flexible, and collaborative platform for researchers to design, test, and improve data storage technologies.

Drishti: I/O Insights for All

IDT-led, Funded, Open Source

Drishti is a novel interactive web-based analysis framework to visualize I/O traces, highlight bottlenecks, and help understand the I/O behavior of scientific applications.

FasTensor: Big Data Analytics on Arrays

IDT-led, Funded, Open Source

FasTensor, formerly known as ArrayUDF, is a generic parallel programming model for big data analyses with arbitrary user-defined functions (UDFs). These functions may express data analysis operations ranging from those of traditional database (DB) systems to advanced machine learning pipelines.