Projects

A Full-stack Approach to Declarative Analytics at Scale

Funded

The project investigates full-stack implementation methodologies for expressive programming systems that effectively bridge the gap between human-level specification and high-performance implementation of complex reasoning tasks at scale.

AI Data Readiness Inspector

IDT-led, Funded

AIDRIN is a framework that assesses the readiness of data for AI applications across centralized and decentralized (e.g., federated learning) workflows, helping ensure that datasets meet quality and compliance standards.

Drishti: I/O Insights for All

IDT-led, Funded, Open Source

Drishti is a novel interactive web-based analysis framework that visualizes I/O traces, highlights bottlenecks, and helps users understand the I/O behavior of scientific applications.

FasTensor: Big Data Analytics on Arrays

IDT-led, Funded, Open Source

FasTensor, formerly known as ArrayUDF, is a generic parallel programming model for big data analyses with arbitrary user-defined functions (UDFs). These functions may express data analysis operations ranging from those of traditional database (DB) systems to advanced machine learning pipelines.

h5bench: a Parallel I/O Benchmark Suite for HDF5

IDT-led, Funded, Open Source

h5bench is a suite of parallel I/O benchmarks, or kernels, representing I/O patterns that are commonly used in HDF5 applications on high-performance computing systems. h5bench measures I/O performance from several aspects, including I/O overhead and observed I/O rate.

Proactive Data Containers

IDT-led, Funded, Open Source

This project covers the formulation of object-oriented Proactive Data Containers (PDCs) and their mapping to different levels of the exascale storage hierarchy; efficient strategies for moving data in deep storage hierarchies using PDCs; techniques for transforming and reorganizing data based on application requirements; and novel analysis paradigms for enabling data transformations and user-defined analysis on data in PDCs.

S2-D2: Securing Self-describing Data, Formats, and Libraries

IDT-led, Funded

This project will apply comprehensive testing, evaluation, issue identification, hardening, and validation to correct security deficiencies in self-describing file formats and libraries. The specific R&D tasks are: (1) assessing and fixing file format vulnerabilities, (2) protecting data access libraries, (3) exploring security solutions for metadata and data, and (4) constructing a security framework called S2-D2.

StoreHub: A Community Infrastructure for Shaping the Future of Data Storage Research

IDT-led, Funded

This project focuses on planning activities associated with the realization of the StoreHub research infrastructure, which aims to support next-generation data storage research. The infrastructure is envisioned to provide a secure, flexible, and collaborative platform for researchers to design, test, and improve data storage technologies.