Di4

Incremental Inverted Index
for Genomics Big Data

Indexing Genomics Big Data

Next-Generation Sequencing (NGS), also known as high-throughput sequencing, has made a comprehensive characterization of the genomic and epigenomic landscapes possible, giving answers to fundamental questions in biological and clinical research. In this context, our research focuses on discovering how heterogeneous DNA regions jointly determine particular biological processes or phenotypes.

Di4 (1D intervals incremental inverted index) is a multi-resolution, single-dimension data structure for interval-based data queries. It implements the characteristic operations on region data, which identify co-occurrences of regions from different biological tests and/or of distinct semantic types, possibly within a certain distance from each other and/or from DNA regions with known structural or functional properties.
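To make the idea of an inverted index over intervals more concrete, here is a minimal sketch in Python. It is not Di4's implementation or API: the class `ToyIntervalIndex`, its methods, and the half-open `[start, stop)` convention are all assumptions chosen for illustration. Each interval endpoint becomes a key that points back to the intervals starting or stopping there, and a left-to-right sweep over the keys reports the segments where at least `min_count` regions co-occur. The distance-based variant mentioned above is left out to keep the sketch short.

```python
from collections import defaultdict


class ToyIntervalIndex:
    """Illustrative endpoint-keyed index (hypothetical, not the Di4 API)."""

    def __init__(self):
        # coordinate -> list of (delta, interval_id); +1 = start, -1 = stop
        self._events = defaultdict(list)
        self._ids = set()

    def add(self, interval_id, start, stop):
        """Incrementally index one half-open interval [start, stop)."""
        if start >= stop:
            raise ValueError("start must be smaller than stop")
        if interval_id in self._ids:
            raise ValueError("interval_id already indexed")
        self._ids.add(interval_id)
        self._events[start].append((+1, interval_id))
        self._events[stop].append((-1, interval_id))

    def co_occurrences(self, min_count=2):
        """Return (start, stop, ids) for each segment between consecutive
        keys that is covered by at least min_count intervals."""
        active = set()
        segments = []
        keys = sorted(self._events)  # re-sorted per query; fine for a toy
        for left, right in zip(keys, keys[1:]):
            # Apply all starts and stops registered at the left key.
            for delta, interval_id in self._events[left]:
                if delta > 0:
                    active.add(interval_id)
                else:
                    active.discard(interval_id)
            if len(active) >= min_count:
                segments.append((left, right, frozenset(active)))
        return segments


index = ToyIntervalIndex()
index.add("a", 10, 30)
index.add("b", 20, 40)
index.add("c", 25, 50)
for segment in index.co_occurrences(min_count=2):
    print(segment)
```

Sorting the keys on every query keeps the sketch short; a persistent ordered structure would avoid that cost, and nothing here should be read as a statement about how Di4 stores or traverses its keys.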


AGILE BUILDING BLOCKS

Di4 defines a region calculus whose operations are efficiently implemented as building blocks, delivering a highly scalable retrieval framework.


KEY FEATURES

Layered Design

makes Di4 adaptable to a wide variety of application scenarios and persistence technologies

Better Performance

than all the commonly used toolsets for Genomics interval-based data manipulation

Building blocks

at different layers of abstraction can be combined and extended, providing a holistic framework

User-defined functions

can augment the internal logic of building blocks, enabling extensive compatibility