Indexing

Batch indexing intervals into Di4

Indexing First Resolution

Input intervals are organized in Di4's first resolution using a batch index algorithm, which can run in either single-pass or double-pass mode.


Set indexing mode

Use setim single or setim multi to set the indexing mode to single-pass or multi-pass, respectively. The getim command reports the current indexing mode. The following console output shows an example.

> setim multi
Indexing mode is set to <Multi-pass> indexing

> setim single
Indexing mode is set to <Single-pass> indexing

> getim
Indexing mode is set to <Single-pass> indexing

Batch index

Di4 can index data in batch: the user specifies the files to be indexed, then Di4BCLI and Di4B parse the files one at a time and index them in Di4.


Double-pass batch index: first pass

To run the first pass of the double-pass indexing algorithm, use the following command:

batchindex *.narrowPeak

This command parses and indexes all the files with the narrowPeak extension in WorkingDirectory. Once the operation finishes, it reports the runtime and the number of indexed intervals, as shown in the following example.

[44\45] wgEncodeAwgTfbsSydhK562Stat1Ifng30UniPk
 Loaded #i:      2,203     ET: 00:00:00.0172472     Speed:   127,731 #i\sec
Indexed #i:      2,203     ET: 00:00:00.0588744     Speed:    37,419 #i\sec
[45\45] wgEncodeAwgTfbsSydhK562Stat1Ifng6hUniPk
 Loaded #i:      2,333     ET: 00:00:00.0111938     Speed:   208,419 #i\sec
Indexed #i:      2,333     ET: 00:00:00.0554094     Speed:    42,105 #i\sec
           #indexed intervals: 456,385
                Load ET (sec): 2.9263167
               Index ET (sec): 8.8259436
              Commit ET (sec): 0.0255342
       Average indexing speed: 51523.98  #i\sec
-: Done  ...       Overall ET: 00:00:08.8577204
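
The per-file lines in the output above pair a count with an elapsed time (ET) and a speed, and in the sample the speed matches the count divided by ET in seconds. The following Python sketch parses one such line and recomputes the speed. It is not part of Di4: the function names are hypothetical, and the line layout is an assumption taken only from the sample output shown here, so other Di4 versions may format these lines differently.

```python
import re

# Assumed line layout, copied from the sample console output:
#   "Indexed #i:      2,203     ET: 00:00:00.0588744     Speed:    37,419 #i\sec"
PROGRESS_LINE = re.compile(
    r"^\s*(?P<label>\S+)\s+#(?P<unit>\w+):\s+(?P<count>[\d,]+)\s+"
    r"ET:\s+(?P<h>\d+):(?P<m>\d+):(?P<s>\d+(?:\.\d+)?)\s+"
    r"Speed:\s+(?P<speed>[\d,]+)"
)


def parse_progress_line(line):
    """Return the fields of a 'Loaded'/'Indexed'-style line, or None if it does not match."""
    match = PROGRESS_LINE.match(line)
    if match is None:
        return None
    # Convert hh:mm:ss.fffffff into seconds.
    seconds = int(match["h"]) * 3600 + int(match["m"]) * 60 + float(match["s"])
    return {
        "label": match["label"],
        "unit": match["unit"],
        "count": int(match["count"].replace(",", "")),
        "seconds": seconds,
        "reported_speed": int(match["speed"].replace(",", "")),
    }


def recomputed_speed(record):
    """Items per second, computed as count divided by elapsed seconds."""
    return record["count"] / record["seconds"]


if __name__ == "__main__":
    sample = r"Indexed #i:      2,203     ET: 00:00:00.0588744     Speed:    37,419 #i\sec"
    record = parse_progress_line(sample)
    print(record["label"], record["count"], round(recomputed_speed(record)))
```

In the sample summary, the average indexing speed (51523.98) likewise appears to be the number of indexed intervals (456,385) divided by the overall ET (8.8577204 seconds).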


Double-pass batch index: second pass

Run the 2pass command to execute the second pass of the double-pass indexing algorithm. This command is not required if the indexing mode is set to single-pass.

> 2pass
2ndPass #b:    746,190     ET: 00:00:01.0253235     Speed:   727,761 #b\sec
-: Done  ...       Overall ET: 00:00:01.0267024

Indexing Second Resolution

Run the 2ri command to execute the second-resolution indexing algorithm. For example, 2ri nuq 16 uses pdf-optimized scalar quantization with 16 quantization levels. The other quantization methods are zt (zero-thresholding) and uq (uniform quantization); zero-thresholding does not take a quantization-level parameter.

> 2ri nuq 16
2RIndex #B:    470,235     ET: 00:00:01.9138708     Speed:   245,698 #B\sec
-: Done  ...       Overall ET: 00:00:01.9160924

Other examples of calling second-resolution indexing:

> 2ri zt

> 2ri uq 8

> 2ri nuq 16
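
To make the "quantization levels" parameter concrete, the following Python sketch shows uniform scalar quantization in its generic textbook form: the value range is split into equal-width bins and each value is replaced by its bin index. This only illustrates the term. Which values Di4 quantizes and how it chooses bin edges for uq are not described here, and the function name is hypothetical.

```python
def uniform_quantize(values, levels):
    """Map each value to a bin index in [0, levels - 1] using equal-width bins.

    Generic illustration only; not Di4's implementation.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    step = (hi - lo) / levels
    # The maximum value would land on index == levels, so clamp it into the last bin.
    return [min(int((v - lo) / step), levels - 1) for v in values]


if __name__ == "__main__":
    # With 4 levels over the range 0..8, each bin is 2 units wide.
    print(uniform_quantize([0, 1, 2, 3, 4, 5, 6, 7, 8], 4))
```

With more levels, the bins become narrower and the quantized values track the originals more closely, at the cost of more distinct values to store.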