Projects

Formulation of object-oriented Proactive Data Containers (PDCs) and their mapping to different levels of the exascale storage hierarchy; efficient strategies for moving data in deep storage hierarchies using PDCs; techniques for transforming and reorganizing data based on application requirements; and novel analysis paradigms for enabling data transformations and user-defined analysis on data in PDCs.

AIDRIN (AI Data Readiness Inspector) is a framework that assesses the readiness of data for AI applications across centralized and decentralized (e.g., federated learning) workflows, ensuring that datasets meet quality and compliance standards.

Developing a trustworthy predictive system that utilizes embedded near-ab-initio simulations to make predictions with quantified uncertainty in extreme regimes where physical experimental validation is unavailable.

h5bench is a suite of parallel I/O benchmarks, or kernels, representing I/O patterns that are commonly used in HDF5 applications on high-performance computing systems. h5bench measures several aspects of I/O performance, including I/O overhead and observed I/O rate.

This project will apply comprehensive testing, evaluation, issue identification, hardening, and validation to correct security deficiencies in self-describing file formats and libraries. The specific R&D tasks include: (1) assessing and fixing file format vulnerabilities, (2) protecting data access libraries, (3) exploring security solutions for metadata and data, and (4) constructing a security framework, called S2-D2.

The project investigates full-stack implementation methodologies for expressive programming systems that effectively bridge the gap between human-level specification and high-performance implementation of complex reasoning tasks at scale.

This project focuses on planning activities associated with the realization of the StoreHub research infrastructure, which aims to support next-generation data storage research. The infrastructure is envisioned to provide a secure, flexible, and collaborative platform for researchers to design, test, and improve data storage technologies.

Drishti is a novel interactive web-based analysis framework to visualize I/O traces, highlight bottlenecks, and help understand the I/O behavior of scientific applications.

FasTensor, formerly known as ArrayUDF, is a generic parallel programming model for big data analyses with arbitrary user-defined functions (UDFs). These functions may express data analysis operations ranging from those of traditional database (DB) systems to advanced machine learning pipelines.

Projects

Proactive Data Containers

AI Data Readiness Inspector

SAGEST Center

h5bench: a Parallel I/O Benchmark Suite for HDF5

S2-D2: Securing Self-describing Data, Formats, and Libraries

A Full-stack Approach to Declarative Analytics at Scale

StoreHub: A Community Infrastructure for Shaping the Future of Data Storage Research

Drishti: I/O Insights for All

FasTensor: Big Data Analytics on Arrays