This file is indexed.

/usr/lib/R/site-library/IRanges/doc/IRangesOverview.Rnw is in r-bioc-iranges 2.4.1-1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
%\VignetteIndexEntry{An Introduction to IRanges}
%\VignetteDepends{}
%\VignetteKeywords{Ranges}
%\VignettePackage{IRanges}
\documentclass[10pt]{article}

\usepackage{times}
\usepackage{hyperref}

\textwidth=6.5in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=-.1in
\evensidemargin=-.1in
\headheight=-.3in

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}

\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\software{R}}
\newcommand{\IRanges}{\Rpackage{IRanges}}

\title{An Introduction to \IRanges{}}
\author{Patrick Aboyoun, Michael Lawrence, Herv\'e Pag\`es}
\date{\today}

\begin{document}

\maketitle

<<options,echo=FALSE>>=
options(width=72)
@

\section{Introduction}

The \IRanges{} package is designed to represent sequences, ranges
representing indices along those sequences, and data related to those
ranges. In this vignette, we will rely on simple, illustrative example
datasets, rather than large, real-world data, so that each data structure
and algorithm can be explained in an intuitive, graphical manner. We
expect that packages that apply \IRanges{} to a particular problem domain
will provide vignettes with relevant, realistic examples.

The \IRanges{} package is available at bioconductor.org and can be
downloaded via \Rfunction{biocLite}:

<<biocLite, eval=FALSE>>=
source("http://bioconductor.org/biocLite.R")
biocLite("IRanges")
@
<<initialize, results=hide>>=
library(IRanges)
@

\section{Vector objects}

In the context of the \IRanges{} package, a sequence is an ordered finite
collection of elements. The \IRanges{} packages represents two types of
objects as sequences: (1) atomic sequences and (2) lists (or non-atomic
sequences). The following subsections describe each in turn.
All \IRanges{}-derived sequences inherit from the \Rclass{Vector}
virtual class.

\subsection{Atomic Vectors}

In \R{}, atomic sequences are typically stored in atomic vectors. The
\IRanges{} package includes an additional atomic sequence object type,
\Rclass{Rle}, which compresses an atomic sequence through run-length
encoding.

We begin our discussion of atomic sequences using two \Rclass{Rle} vectors.
<<initialize>>=
set.seed(0)
lambda <- c(rep(0.001, 4500), seq(0.001, 10, length = 500), 
            seq(10, 0.001, length = 500))
xVector <- Rle(rpois(1e7, lambda))
yVector <- Rle(rpois(1e7, lambda[c(251:length(lambda), 1:250)]))
@

All atomic sequences in \R{} have three main properties: (1) a
notion of length or number of elements, (2) the ability to extract
elements to create new atomic sequences, and (3) the ability to be
combined with one or more atomic sequences to form larger
atomic sequences. The main functions for these three operations are
\Rfunction{length}, \Rfunction{[}, and \Rfunction{c}.
<<basic-ops>>=
length(xVector)
xVector[1]
zVector <- c(xVector, yVector)
@

While these three methods may seem trivial, they provide a great deal of
power and many atomic sequence manipulations can be constructed
using them.

\subsubsection{Vector Subsetting}

As with ordinary \R{} atomic vectors, it is often necessary to subset
one sequence from another. When this subsetting does not duplicate or
reorder the elements being extracted, the result is called a
\textit{subsequence}. In general, the \Rfunction{[} function can be used
to construct a new sequence or extract a subsequence, but its interface
is often inconvenient and not amenable to optimization. To compensate for
this, the \IRanges{} package supports seven additional functions for
sequence extraction:

\begin{enumerate}
\item \Rfunction{window} - Extracts a subsequence over a specified region.
\item \Rfunction{subset} - Extracts the subsequence specified by a logical
  vector.
\item \Rfunction{head} - Extracts a consecutive subsequence containing the
  first n elements.
\item \Rfunction{tail} - Extracts a consecutive subsequence containing the
  last n elements.
\item \Rfunction{rev} - Creates a new sequence with the elements in the
  reverse order.
\item \Rfunction{rep} - Creates a new sequence by repeating sequence
  elements.
\end{enumerate}

The following code illustrates how these functions are used on an
ordinary \R{} \Rclass{integer} vector:

<<seq-extraction>>=
xSnippet <- xVector[IRanges(4751, 4760)]
xSnippet
head(xSnippet)
tail(xSnippet)
rev(xSnippet)
rep(xSnippet, 2)
subset(xSnippet, xSnippet >= 5L)
@

\subsubsection{Combining Vectors}

The \IRanges{} package uses two generic functions, \Rfunction{c} and
\Rfunction{append}, for combining two \Rclass{Vector} objects. The methods
for \Rclass{Vector} objects follow the definition that these two functions
are given the the \Rpackage{base} package.

<<seq-combine>>=
c(xSnippet, rev(xSnippet))
append(xSnippet, xSnippet, after = 3)
@

\subsubsection{Looping over Vectors and Vector subsets}

In \R{}, \Rfunction{for} looping can be an expensive operation. To
compensate for this, \IRanges{} uses three generics, \Rfunction{endoapply},
\Rfunction{lapply}, and \Rfunction{sapply}, for looping over sequences and
two generics, \Rfunction{aggregate} and \Rfunction{shiftApply}, to perform
calculations over subsequences. The \Rfunction{lapply} and \Rfunction{sapply}
functions are familiar to many \R{} users since they are the standard
functions for looping over the elements of an \R{} \Rclass{list} object.
The \Rfunction{endoapply} function performs an endomorphism equivalent to
\Rfunction{lapply}, i.e. returns a \Rclass{Vector} object of the same class
as the input rather than a \Rclass{list} object. More will be given on these
three functions in the Lists subsection.

The \Rfunction{aggregate} function combines sequence extraction functionality
of the \Rfunction{window} function with looping capabilities of the
\Rfunction{sapply} function. For example, here is some code to compute medians
across a moving window of width 3 using the function \Rfunction{aggregate}:

<<aggregate>>=
xSnippet
aggregate(xSnippet, start = 1:8, width = 3, FUN = median)
@

The \Rfunction{shiftApply} function is a looping operation involving two
sequences whose elements are lined up via a positional shift operation.
For example, the elements of \Robject{xVector} and \Robject{yVector} were
simulated from Poisson distributions with the mean of element i from
\Robject{yVector} being equivalent to the mean of element i + 250 from
\Robject{xVector}. If we did not know the size of the shift, we could
estimate it by finding the shift that maximizes the correlation between
\Robject{xVector} and \Robject{yVector}.

<<shiftApply-cor>>=
cor(xVector, yVector)
shifts <- seq(235, 265, by=3)
corrs  <- shiftApply(shifts, yVector, xVector, FUN = cor)
@
%
<<figshiftcorrs, fig=TRUE, include=FALSE, eps=FALSE, width=5, height=5>>=
plot(shifts, corrs)
@

The result is shown in Fig.~\ref{figshiftcorrs}.
\begin{figure}[tb]
  \begin{center}
     \includegraphics[width=0.5\textwidth]{IRangesOverview-figshiftcorrs}  
     \caption{\label{figshiftcorrs}%
      Correlation between \Robject{xVector} and \Robject{yVector} for various
      shifts.}
  \end{center}
\end{figure}


\subsubsection{Run Length Encoding}

Up until this point we have used \R{} atomic vectors to represent
atomic sequences, but there are times when these object become too
large to manage in memory. When there are lots of consecutive repeats in
the sequence, the data can be compressed and managed in memory through a
run-length encoding where a data value is paired with a run length. For
example, the sequence \{1, 1, 1, 2, 3,  3\} can be represented as
values = \{1, 2, 3\}, run lengths = \{3, 1, 2\}.

The \Rclass{Rle} class in \IRanges{} is used to represent a run-length
encoded (compressed) sequence of \Rclass{logical}, \Rclass{integer},
\Rclass{numeric}, \Rclass{complex},  \Rclass{character}, or \Rclass{raw}
values. One way to construct an \Rclass{Rle} object is through the
\Rclass{Rle} constructor function:

<<Rle-construction>>=
xRle <- Rle(xVector)
yRle <- Rle(yVector)
xRle
yRle
@

When there are lots of consecutive repeats, the memory savings through
an RLE can be quite dramatic. For example, the \Robject{xRle} object
occupies less than one quarter of the space of the original \Robject{xVector}
object, while storing the same information:

<<Rle-vector-compare>>=
as.vector(object.size(xRle) / object.size(xVector))
identical(as.vector(xRle), xVector)
@

The functions \Rfunction{runValue} and \Rfunction{runLength} extract
the run values and run lengths from an \Rclass{Rle} object respectively:

<<Rle-accessors>>=
head(runValue(xRle))
head(runLength(xRle))
@

The \Rclass{Rle} class supports many of the basic methods associated with
\R{} atomic vectors including the Ops, Math, Math2, Summary, and Complex
group generics. Here is a example of manipulating \Rclass{Rle} objects
using methods from the Ops group:

<<Rle-ops>>=
xRle > 0
xRle + yRle
xRle > 0 | yRle > 0
@

Here are some from the Summary group:

<<Rle-summary>>=
range(xRle)
sum(xRle > 0 | yRle > 0)
@

And here is one from the Math group:

<<Rle-math>>=
log1p(xRle)
@

As with the atomic vectors, the \Rfunction{cor} and \Rfunction{shiftApply}
functions operate on \Rclass{Rle} objects:

<<Rle-cor>>=
cor(xRle, yRle)
shiftApply(249:251, yRle, xRle, FUN = function(x, y) var(x, y) / (sd(x) * sd(y)))
@

For more information on the methods supported by the \Rclass{Rle} class,
consult the \Rcode{Rle} man page.

\subsection{Lists}

In many data analysis situation there is a desire to organize and
manipulate multiple objects simultaneously. Typically this is done in
\R{} through the usage of a list. While a list serves as a generic
container, it does not confer any information about the specific class
of its elements, provides no infrastructure to ensure type safety, and
the S3 and S4 method dispatch mechanisms do not support method selection
for lists with homogeneous object types.
The \Rclass{List} virtual class defined in the \IRanges{} package addresses
these issues. \Rclass{List} is a direct extension of \Rclass{Vector}.

\subsubsection{Lists of Atomic Vectors}

The first type of lists we consider are those containing atomic sequences
such as \Rclass{integer} vectors or \Rclass{Rle} objects. We may wish to
define a method that retrieves the length of each atomic sequence element,
without special type checking. To enable this, we define collection
classes such as \Rclass{IntegerList} and \Rclass{RleList}, which inherit
from the \Rclass{List} virtual class, for representing lists of
\Rclass{integer} vectors and \Rclass{Rle} objects respectively.

<<list-intro>>=
getClassDef("RleList")
@

As the class definition above shows, the \Rclass{RleList} class is virtual
with subclasses \Rclass{SimpleRleList} and \Rclass{CompressedRleList}. A
\Rclass{SimpleRleList} class uses a regular \R{} list to store the
underlying elements and the \Rclass{CompressedRleList} class stores the
elements in an unlisted form and keeps track of where the element breaks
are. The former ``simple list" class is useful when the Rle elements are
long and the latter ``compressed list" class is useful when the list is
long and/or sparse (i.e. a number of the list elements have length 0).

In fact, all of the atomic vector types (raw, logical, integer, numeric,
complex, and character) have similar list classes that derive from the
\Rclass{List} virtual class. For example, there is an
\Rclass{IntegerList} virtual class with subclasses
\Rclass{SimpleIntegerList} and \Rclass{CompressedIntegerList}.

Each of the list classes for atomic sequences, be they stored as vectors
or \Rclass{Rle} objects, have a constructor function with a name of the
appropriate list virtual class, such as \Rclass{IntegerList}, and an
optional argument \Rfunarg{compress} that takes an argument to specify
whether or not to create the simple list object type or the compressed
list object type. The default is to create the compressed list object
type.

<<list-construct>>=
args(IntegerList)
cIntList1 <- IntegerList(x = xVector, y = yVector)
cIntList1
sIntList2 <- IntegerList(x = xVector, y = yVector, compress = FALSE)
sIntList2
## sparse integer list
xExploded <- lapply(xVector[1:5000], function(x) seq_len(x))
cIntList2 <- IntegerList(xExploded)
sIntList2 <- IntegerList(xExploded, compress = FALSE)
object.size(cIntList2)
object.size(sIntList2)
@

The \Rfunction{length} function returns the number of elements in a
\Rclass{Vector}-derived object and, for a \Rclass{List}-derived object
like ``simple list" or ``compressed list", the \Rfunction{elementLengths}
function returns an integer vector containing the lengths of each of the
elements:

<<list-length>>=
length(cIntList2)
Rle(elementLengths(cIntList2))
@

Just as with ordinary \R{} \Rclass{list} objects, \Rclass{List}-derived
object support the \Rfunction{[[} for element extraction, \Rfunction{c}
for combining, and \Rfunction{lapply}/\Rfunction{sapply} for looping. When
looping over sparse lists, the ``compressed list" classes can be much faster
during computations since only the non-empty elements are looped over during
the \Rfunction{lapply}/\Rfunction{sapply} computation and all the empty
elements are assigned the appropriate value based on their status.

<<list-lapply>>=
system.time(sapply(xExploded, mean))
system.time(sapply(sIntList2, mean))
system.time(sapply(cIntList2, mean))
identical(sapply(xExploded, mean), sapply(sIntList2, mean))
identical(sapply(xExploded, mean), sapply(cIntList2, mean))
@

Unlist ordinary \R{} \Rclass{list} objects, \Rclass{AtomicList} objects support
the \Rfunction{Ops} (e.g. \Rfunction{+}, \Rfunction{==}, \Rfunction{\&}),
\Rfunction{Math} (e.g. \Rfunction{log}, \Rfunction{sqrt}),
\Rfunction{Math2} (e.g. \Rfunction{round}, \Rfunction{signif}),
\Rfunction{Summary} (e.g. \Rfunction{min}, \Rfunction{max}, \Rfunction{sum}),
and \Rfunction{Complex} (e.g. \Rfunction{Re}, \Rfunction{Im}) group generics.

<<list-groupgenerics>>=
xRleList <- RleList(xRle, 2L * rev(xRle))
yRleList <- RleList(yRle, 2L * rev(yRle))
xRleList > 0
xRleList + yRleList
sum(xRleList > 0 | yRleList > 0)
@

Since these atomic lists inherit from \Rclass{List}, they can also use
the looping function \Rfunction{endoapply} to perform endomorphisms.

<<list-endoapply>>=
safe.max <- function(x) { if(length(x)) max(x) else integer(0) }
endoapply(sIntList2, safe.max)
endoapply(cIntList2, safe.max)
endoapply(sIntList2, safe.max)[[1]]
@

\section{Data Tables}

To Do:
\Rclass{DataTable},
\Rclass{DataFrame},
\Rclass{DataFrameList},
\Rclass{SplitDataFrameList}

\section{Vector Annotations}

Often when one has a collection of objects, there is a need to attach
metadata that describes the collection in some way. 
Two kinds of metadata can be attached to a \Rclass{Vector} object:
\begin{enumerate}
\item Metadata about the object as a whole: this metadata is accessed
      via the \Rfunction{metadata} accessor and is represented as an
      ordinary \Rclass{list};
\item Metadata about the individual elements of the object: this metadata
      is accessed via the \Rfunction{mcols} accessor (\Rfunction{mcols}
      stands for {\it metadata columns}) and is represented as a
      \Rclass{DataTable} object (i.e. as an instance of a concrete
      subclass of \Rclass{DataTable}, e.g. a \Rclass{DataFrame} object).
      This \Rclass{DataTable} object can be thought of as the result of
      binding together one or several vector-like objects (the metadata
      columns) of the same length as the \Rclass{Vector} object. Each
      row of the \Rclass{DataTable} object annotates the corresponding
      element of the \Rclass{Vector} object.
\end{enumerate}

\section{Vector Ranges}

When analyzing sequences, we are often interested in particular
consecutive subsequences. For example, the \Sexpr{letters} vector could
be considered a sequence of lower-case letters, in alphabetical order.
We would call the first five letters (\textit{a} to \textit{e}) a
consecutive subsequence, while the subsequence containing only the
vowels would not be consecutive. It is not uncommon for an analysis
task to focus only on the geometry of the regions, while
ignoring the underlying sequence values. A list of indices would be a
simple way to select a subsequence. However, a sparser representation
for consecutive subsequences would be a range, a pairing of a start
position and a width, as used when extracting sequences with
\Rfunction{window}.

When analyzing subsequences in \IRanges{}, each range is treated as an
observation. The virtual \Rclass{Ranges} class represents lists of
ranges, or, equivalently and as a derivative \Rclass{IntegerList},
sequences of consecutive integers. The most commonly used
implementation of \Rclass{Ranges} is \Rclass{IRanges}, which stores
the starts and widths as ordinary integer vectors. To construct an
\Rclass{IRanges} instance, we call the \Rfunction{IRanges}
constructor. Ranges are normally specified by passing two out of the
three parameters: start, end and width (see \Rcode{help(IRanges)} for
more information).
%
<<iranges-constructor>>=
ir1 <- IRanges(start = 1:10, width = 10:1)
ir2 <- IRanges(start = 1:10, end = 11)
ir3 <- IRanges(end = 11, width = 10:1)
identical(ir1, ir2) & identical(ir2, ir3)
ir <- IRanges(c(1, 8, 14, 15, 19, 34, 40),
  width = c(12, 6, 6, 15, 6, 2, 7))
@
%
All of the above calls construct an \Rclass{IRanges} instance with the
same ranges, using different combinations of the \Rfunarg{start},
\Rfunarg{end} and \Rfunarg{width} parameters.

Accessing the starts, widths and ends is supported by every
\Rclass{Ranges} implementation.
<<iranges-start>>=
start(ir)
@
<<iranges-end>>=
end(ir)
@
<<iranges-width>>=
width(ir)
@

For \Rclass{IRanges} and some other \Rclass{Ranges} derivatives,
subsetting is also supported, by numeric and logical indices.
<<iranges-subset-numeric>>=
ir[1:4]
@
<<iranges-subset-logical>>=
ir[start(ir) <= 15]
@

One may think of each range as a sequence of integer ranges, and
\Rclass{Ranges} is, in fact, derived from \Rclass{IntegerList}.
<<ranges-extraction>>=
ir[[1]]
@

In order to illustrate range operations, we'll create a function to plot ranges.

<<plotRanges>>=
plotRanges <- function(x, xlim = x, main = deparse(substitute(x)),
                       col = "black", sep = 0.5, ...) 
{
  height <- 1
  if (is(xlim, "Ranges"))
    xlim <- c(min(start(xlim)), max(end(xlim)))
  bins <- disjointBins(IRanges(start(x), end(x) + 1))
  plot.new()
  plot.window(xlim, c(0, max(bins)*(height + sep)))
  ybottom <- bins * (sep + height) - height
  rect(start(x)-0.5, ybottom, end(x)+0.5, ybottom + height, col = col, ...)
  title(main)
  axis(1)
}
@

<<ir-plotRanges, fig=TRUE, include=FALSE, eps=FALSE, height=2.25>>=
plotRanges(ir)
@

\begin{figure}[tb]
  \begin{center}
     \includegraphics[width=0.5\textwidth]{IRangesOverview-ir-plotRanges}  
     \caption{\label{fig-ir-plotRanges}%
      Plot of original ranges.}
  \end{center}
\end{figure}

\subsection{Normality}

Sometimes, it is necessary to formally represent a
subsequence, where no elements are repeated and order is
preserved. Also, it is occasionally useful to think of a
\Rclass{Ranges} object as a set, where no elements are repeated and
order does not matter. While every \Rclass{Ranges} object, as a
\Rclass{Vector} derivative, has an implicit ordering, one can
enforce the same ordering for all such objects, so that ordering
becomes inconsequential within that context. 

The \Rclass{NormalIRanges} class formally represents either a
subsequence encoding or a set of integers. By definition
a Ranges object is said to be \textit{normal} when its ranges are: (a)
not empty (i.e. they have a non-null width); (b) not overlapping; (c)
ordered from left to right; (d) not even adjacent (i.e. there must be
a non empty gap between 2 consecutive ranges).

There are three main advantages of using a \textit{normal}
\Rclass{Ranges} object: (1) it guarantees a subsequence
encoding or set of integers, (2) it is compact in terms of the number
of ranges, and (3) it uniquely identifies its information, which
simplifies comparisons.

The \Rfunction{reduce} function reduces any \Rclass{Ranges} object to
a \Rclass{NormalIRanges} by merging redundant ranges.
<<ranges-reduce, fig=TRUE, include=FALSE, eps=FALSE, height=2.25>>=
reduce(ir)
plotRanges(reduce(ir))
@

\begin{figure}[tb]
  \begin{center}
     \includegraphics[width=0.5\textwidth]{IRangesOverview-ranges-reduce}  
     \caption{\label{fig-ranges-reduce}%
      Plot of reduced ranges.}
  \end{center}
\end{figure}

\subsection{Lists of \Rclass{Ranges} objects}

It is common to manipulate collections of \Rclass{Ranges} objects during
an analysis. Thus, the \IRanges{} package defines some specific classes
for working with multiple \Rclass{Ranges} objects.

The \Rclass{RangesList} class asserts that each element is a
\Rclass{Ranges} object and provides convenience methods, such as
\Rfunction{start}, \Rfunction{end} and \Rfunction{width} accessors
that return \Rclass{IntegerList} objects, aligning with the
\Rclass{RangesList} object.
To explicitly construct a \Rclass{RangesList}, use the
\Rfunction{RangesList} function.
<<rangeslist-contructor>>=
rl <- RangesList(ir, rev(ir))
@
%
<<rangeslist-start>>=
start(rl)
@

\subsection{Vector Extraction}

As the elements of a \Rclass{Ranges} object encode consecutive
subsequences, they may be used directly in sequence extraction. Note
that when a \textit{normal} \Rclass{Ranges} is given as the index, the
result is a subsequence, as no elements are repeated or reordered.
If the sequence is a \Rclass{Vector} subclass (i.e. not an ordinary
\Rclass{vector}), the canonical \Rfunction{[} function accepts a
\Rclass{Ranges} instance.
%
<<bracket-ranges>>=
irextract <- IRanges(start = c(4501, 4901) , width = 100)
xRle[irextract]
@
%

\subsection{Finding Overlapping Ranges}

The function \Rfunction{findOverlaps} detects overlaps between two
\Rclass{Ranges} objects.

<<overlap-ranges>>=
ol <- findOverlaps(ir, reduce(ir))
as.matrix(ol)
@

\subsection{Counting Overlapping Ranges}

The function \Rfunction{coverage} counts the number of ranges over each
position.

<<ranges-coverage, fig=TRUE, include=FALSE, eps=FALSE, height=2.25>>=
cov <- coverage(ir)
plotRanges(ir)
cov <- as.vector(cov)
mat <- cbind(seq_along(cov)-0.5, cov)
d <- diff(cov) != 0
mat <- rbind(cbind(mat[d,1]+1, mat[d,2]), mat)
mat <- mat[order(mat[,1]),]
lines(mat, col="red", lwd=4)
axis(2)
@

\begin{figure}[tb]
  \begin{center}
     \includegraphics[width=0.5\textwidth]{IRangesOverview-ranges-coverage}  
     \caption{\label{fig-ranges-coverage}%
      Plot of ranges with accumulated coverage.}
  \end{center}
\end{figure}

\subsection{Finding Neighboring Ranges}

The \Rfunction{nearest} function finds the nearest neighbor ranges (overlapping
is zero distance). The \Rfunction{precede} and \Rfunction{follow} functions
find the non-overlapping nearest neighbors on a specific side.

\subsection{Transforming Ranges}

Utilities are available for transforming a \Rclass{Ranges} object in a
variety of ways. Some transformations, like \Rfunction{reduce}
introduced above, can be dramatic, while others are simple per-range
adjustments of the starts, ends or widths.

\subsubsection{Adjusting starts, ends and widths}

Perhaps the simplest transformation is to adjust the start values by a
scalar offset, as performed by the \Rfunction{shift} function. Below,
we shift all ranges forward 10 positions.
%
<<ranges-shift>>=
shift(ir, 10)
@

There are several other ways to transform ranges.  These include
\Rfunction{narrow}, \Rfunction{resize}, \Rfunction{flank},
\Rfunction{reflect}, \Rfunction{restrict}, and \Rfunction{threebands}.
For example \Rfunction{narrow} supports the adjustment of start, end
and width values, which should be relative to each range. These adjustments
are vectorized over the ranges. As its name suggests, the ranges can only be
narrowed.
% 
<<ranges-narrow>>=
narrow(ir, start=1:5, width=2)
@

The \Rfunction{restrict} function ensures every range falls within a
set of bounds. Ranges are contracted as necessary, and the ranges that
fall completely outside of but not adjacent to the bounds are dropped,
by default.
% 
<<ranges-restrict>>=
restrict(ir, start=2, end=3)
@
The \Rfunction{threebands} function extends \Rfunction{narrow} so that
the remaining left and right regions adjacent to the narrowed region
are also returned in separate \Rclass{Ranges} objects.
% 
<<ranges-threebands>>=
threebands(ir, start=1:5, width=2)
@

The arithmetic operators \Rfunction{+}, \Rfunction{-} and
\Rfunction{*} change both the start and the end/width by symmetrically
expanding or contracting each range.  Adding or subtracting a numeric
(integer) vector to a \Rclass{Ranges} causes each range to be expanded
or contracted on each side by the corresponding value in the numeric
vector.
<<ranges-plus>>=
ir + seq_len(length(ir))
@
%
The \Rfunction{*} operator symmetrically magnifies a \Rclass{Ranges}
object by a factor, where positive contracts (zooms in) and negative
expands (zooms out).
%
<<ranges-asterisk>>=
ir * -2 # double the width
@

WARNING: The semantic of these arithmetic operators might be revisited
at some point. Please restrict their use to the context of interactive
visualization (where they arguably provide some convenience) but avoid
to use them programmatically.

\subsubsection{Making ranges disjoint}

A more complex type of operation is making a set of ranges disjoint,
\textit{i.e.} non-overlapping. For example, \Rfunction{threebands}
returns a disjoint set of three ranges for each input range.

The \Rfunction{disjoin} function makes a \Rclass{Ranges} object
disjoint by fragmenting it into the widest ranges where the set of
overlapping ranges is the same.

<<ranges-disjoin, fig=TRUE, include=FALSE, eps=FALSE, height=2.25>>=
disjoin(ir)
plotRanges(disjoin(ir))
@

\begin{figure}[tb]
  \begin{center}
     \includegraphics[width=0.5\textwidth]{IRangesOverview-ranges-disjoin}  
     \caption{\label{fig-ranges-disjoin}%
      Plot of disjoined ranges.}
  \end{center}
\end{figure}

A variant of \Rfunction{disjoin} is \Rfunction{disjointBins}, which
divides the ranges into bins, such that the ranges in each bin are
disjoint. The return value is an integer vector of the bins.
<<ranges-disjointBins>>=
disjointBins(ir)
@

\subsubsection{Other transformations}

Other transformations include \Rfunction{reflect} and
\Rfunction{flank}. The former ``flips'' each range within a set of common
reference bounds.
<<ranges-reflect>>=
reflect(ir, IRanges(start(ir), width=width(ir)*2))
@
%
The \Rfunction{flank} returns ranges of a specified width that flank,
to the left (default) or right, each input range. One use case of this
is forming promoter regions for a set of genes.
<<ranges-flank>>=
flank(ir, width = seq_len(length(ir)))
@
%

\subsection{Set Operations}

Sometimes, it is useful to consider a \Rclass{Ranges} object as a set
of integers, although there is always an implicit ordering. This is
formalized by \Rclass{NormalIRanges}, above, and we now present
versions of the traditional mathematical set operations
\textit{complement}, \textit{union}, \textit{intersect}, and
\textit{difference} for \Rclass{Ranges} objects. There are two
variants for each operation. The first treats each \Rclass{Ranges}
object as a set and returns a \textit{normal} value, while the other
has a ``parallel'' semantic like \Rfunction{pmin}/\Rfunction{pmax} and
performs the operation for each range pairing separately.

The \textit{complement} operation is implemented by the
\Rfunction{gaps} and \Rfunction{pgap} functions. By default,
\Rfunction{gaps} will return the ranges that fall between the ranges
in the (normalized) input. It is possible to specify a set of bounds,
so that flanking ranges are included.
<<ranges-gaps, fig=TRUE, include=FALSE, eps=FALSE, height=2.25>>=
gaps(ir, start=1, end=50)
plotRanges(gaps(ir, start=1, end=50), c(1,50))
@

\begin{figure}[tb]
  \begin{center}
     \includegraphics[width=0.5\textwidth]{IRangesOverview-ranges-gaps}  
     \caption{\label{fig-ranges-gap}%
      Plot of gaps from ranges.}
  \end{center}
\end{figure}


\Rfunction{pgap} considers each parallel pairing between two
\Rclass{Ranges} objects and finds the range, if any, between
them. Note that the function name is singular, suggesting that only
one range is returned per range in the input.
<<ranges-pgap>>=
@

The remaining operations, \textit{union}, \textit{intersect} and
\textit{difference} are implemented by the \Rfunction{[p]union},
\Rfunction{[p]intersect} and \Rfunction{[p]setdiff} functions,
respectively. These are relatively self-explanatory.
<<ranges-union>>=
@
<<ranges-punion>>=
@
<<ranges-intersect>>=
@
<<ranges-pintersect>>=
@
<<ranges-setdiff>>=
@
<<ranges-psetdiff>>=
@

% \subsection{Mapping Ranges Between Vectors}

% \Rfunction{mapCoords}

\section{Vector Views}

The \IRanges{} package provides the virtual \Rclass{Views} class, which
stores a sequence together with an \Rclass{IRanges} object defining ranges
on the sequence. Each range is said to represent a \textit{view} onto
the sequence.

Here, we will demonstrate the \Rclass{RleViews} class, where the
sequence is of class \Rclass{Rle}. Other \Rclass{Views}
implementations exist, such as \Rclass{XStringViews} in the
\Rpackage{Biostrings} package.

\subsection{Creating Views}

There are two basic constructors for creating views: the \Rfunction{Views}
function based on indicators and the \Rfunction{slice} based on numeric
boundaries.

<<Views-constructors>>=
xViews <- Views(xRle, xRle >= 1)
xViews <- slice(xRle, 1)
xViewsList <- slice(xRleList, 1)
@

\subsection{Aggregating Views}

While \Rfunction{sapply} can be used to loop over each window, the native
functions \Rfunction{viewMaxs}, \Rfunction{viewMins}, \Rfunction{viewSums},
and \Rfunction{viewMeans} provide fast looping to calculate their respective
statistical summaries.

<<views-looping>>=
head(viewSums(xViews))
viewSums(xViewsList)
head(viewMaxs(xViews))
viewMaxs(xViewsList)
@

\section{Data on Ranges}

IMPORTANT NOTE: Starting with BioC 2.12, the use of \Rclass{RangedData} and
\Rclass{RangedDataList} objects is discouraged in favor of \Rclass{GRanges}
and \Rclass{GRangesList} objects (those classes are defined in the
\Rpackage{GenomicRanges} package).

When analyzing ranges, there are often additional variables of
interest, besides the geometry (starts, ends and widths). To formally
represent a dataset where the ranges are the observations, \IRanges{}
defines the \Rclass{RangedData} class.

%\subsection{Manipulating \Rclass{RangedData}}

To create a \Rclass{RangedData} instance, one needs to provide a
\Rclass{Ranges} object and, optionally, any number of variables on
those ranges. The variable objects need not be vectors, but they must
satisfy the contract of \Rclass{DataFrame}.
<<RangedData-construct>>=
values <- rnorm(length(ir))
rd <- RangedData(ir, name = letters[seq_len(length(ir))], values)
rd
@

One might notice the term ``sequence'' in the above output. This
refers to an important feature of \Rclass{RangedData}: the ability to
segregate ranges by their sequence (or space). For example, when
analyzing genomic data, one is often working with ranges on different
chromosomes. In many cases, such as when calculating overlap, one
needs to separately treat ranges from different spaces, and
\Rclass{RangedData} aims to facilitate this mode of operation. The
segregation may be performed at construction time.
<<RangedData-construct-space>>=
rd <- RangedData(ir, name = letters[seq_len(length(ir))], values, 
                 space = rep(c("chr1", "chr2"), c(3, length(ir) - 3)))
rd
@

With the knowledge that the data is split into spaces, it should not
be surprising that the \Rfunction{ranges} accessor returns a
\Rclass{RangesList} and \Rfunction{values} returns a
\Rclass{SplitDataFrameList}.
<<RangedData-ranges>>=
ranges(rd)
@
<<RangedData-values>>=
values(rd)
@

To obtain a \Rclass{RangedData} for a specific set of spaces, one should
use the \Rfunction{[} function, which accepts logical, numeric and
character indices.
<<RangedData-subspace>>=
rd["chr1"]
@
%
<<RangedData-subspace-2>>=
all(identical(rd["chr1"], rd[1]),
    identical(rd[1], rd[c(TRUE, FALSE)]))
@
The \Rfunction{names} and \Rfunction{length} functions return the
names and number of spaces, respectively.
<<RangedData-names>>=
names(rd)
@
<<RangedData-length>>=
length(rd)
@

The \Rfunction{lapply} function operates over the spaces. The object
passed to the user function is a subset \Rclass{RangedData}.
<<RangedData-lapply>>=
lapply(rd, names)
@

The above would suggest that \Rclass{RangedData} is a sequence of
spaces. However, \Rclass{RangedData} also inherits from
\Rclass{DataTable}, so it in some ways behaves like a sequence of
columns. For example, one can extract a column via \Rfunction{\$} or
\Rfunction{[[}.
<<RangedData-extract>>=
rd[[2]]
@
<<RangedData-dollar>>=
rd$values
@
Note that the extracted columns are ``unlisted'' over the spaces,
which is usually much more convenient than obtaining them as lists. It
is important to note that the elements have been sorted by
the space factor and thus may not have the same order as the
objects passed to the constructor.

The two dimensional matrix-style subsetting is also supported. The
rows are indexed globally, independent of space.
<<RangedData-subset-2d>>=
rd[1:3, "name"]
@

\section{IRanges in Biological Sequence Analysis}

The \IRanges{} packages was primarily designed with biological sequence
analysis in mind and Table \ref{table:bioseq} shows how some biological
sequence analysis concepts are represented in the \IRanges{} class system.

\begin{table}[ht]
\begin{center}
\begin{tabular}{l|l}
\hline
Biological Entity                        & \Rclass{Vector} Subclass \\
\hline
Genome browser track(s)                  & \Rclass{GRanges}/\Rclass{GRangesList} \\
Coverage across chromosomes/contigs      & \Rclass{RleList} \\
Mapped ranges to genome                  & \Rclass{CompressedIRangesList} \\
Data (sans ranges) across chroms/contigs & \Rclass{SplitDataFrameList} \\
\hline
\end{tabular}
\end{center}
\caption{\Rclass{Vector} subclasses for Biological Sequence Analysis}
\label{table:bioseq}
\end{table}

\pagebreak[4]

\section{Session Information}

\begin{table*}[tbp]
\begin{minipage}{\textwidth}
<<sessionInfo, results=tex, print=TRUE>>=
toLatex(sessionInfo())
@
\end{minipage}
\caption{\label{tab:sessioninfo}%
The output of \Rfunction{sessionInfo} on the build system 
after running this vignette.}
\end{table*}

\end{document}