org.bdgenomics.adam.rdd.read

AlignmentRecordRDDFunctions

class AlignmentRecordRDDFunctions extends ADAMSequenceDictionaryRDDAggregator[AlignmentRecord]

Linear Supertypes
ADAMSequenceDictionaryRDDAggregator[AlignmentRecord], Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new AlignmentRecordRDDFunctions(rdd: RDD[AlignmentRecord])

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def adamAlignedRecordSave(args: ADAMSaveAnyArgs): Boolean

  5. def adamBQSR(knownSnps: Broadcast[SnpTable], observationDumpFile: Option[String] = None, validationStringency: ValidationStringency = ValidationStringency.LENIENT): RDD[AlignmentRecord]

    Runs base quality score recalibration on a set of reads. Uses a table of known SNPs to mask true variation during the recalibration process.

    knownSnps

    A table of known SNPs to mask valid variants.

    observationDumpFile

    An optional local path to dump recalibration observations to.

    returns

    Returns an RDD of recalibrated reads.
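
The masking idea can be sketched in plain Scala: mismatches at known SNP sites are skipped so that real variation is not counted as sequencing error, and the remaining mismatch rate is converted to a Phred-scaled quality. This is only an illustration under simplifying assumptions, not ADAM's implementation; `Observation`, `empiricalQuality`, and the smoothing constants are hypothetical.

```scala
object BqsrMaskSketch {
  // Hypothetical stand-in for one aligned base: its reference position and
  // whether the read base disagrees with the reference.
  final case class Observation(position: Long, isMismatch: Boolean)

  // Phred-scaled empirical quality computed only from sites outside the known SNP set.
  // The +1 / +2 smoothing is an assumption of this sketch; it avoids log10(0).
  def empiricalQuality(obs: Seq[Observation], knownSnps: Set[Long]): Double = {
    val unmasked = obs.filterNot(o => knownSnps.contains(o.position))
    val errors = unmasked.count(_.isMismatch)
    val errorRate = (errors + 1.0) / (unmasked.size + 2.0)
    -10.0 * math.log10(errorRate)
  }
}
```

With the same observations, masking a mismatching site that is a known SNP yields a higher estimated quality than counting it as an error.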

  6. def adamCharacterizeTagValues(tag: String): Map[Any, Long]

    Calculates the set of unique attribute values that occur for the given tag, and the number of times each value occurs.

    tag

    The name of the optional field whose values are to be counted.

    returns

    A Map whose keys are the values of the tag, and whose values are the number of times each tag value occurs.
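
As a rough illustration of this counting, the sketch below tallies the values of one tag over plain Scala collections instead of an RDD. Modelling each record's optional fields as a `Map[String, Any]` is an assumption made for the example, not ADAM's representation, and `characterizeTagValues` is a hypothetical helper.

```scala
object TagValueSketch {
  // Counts how often each value of `tag` occurs; records without the tag are skipped.
  def characterizeTagValues(records: Seq[Map[String, Any]], tag: String): Map[Any, Long] =
    records
      .flatMap(_.get(tag))   // keep only the values of the requested tag
      .groupBy(identity)     // bucket equal values together
      .map { case (value, hits) => value -> hits.size.toLong }
}
```

The result has the same shape as the documented return type: one key per distinct tag value, mapped to its occurrence count.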

  7. def adamCharacterizeTags(): RDD[(String, Long)]

    Converts a set of records into an RDD of pairs, each holding a unique tag string found in the records and the number of records that carry that attribute.

    returns

    An RDD of attribute name / count pairs.

  8. def adamConvertToSAM(isSorted: Boolean = false): (RDD[SAMRecordWritable], SAMFileHeader)

    Converts an RDD of ADAM read records into SAM records.

    returns

    Returns a SAM/BAM formatted RDD of reads, as well as the file header.

  9. def adamCountKmers(kmerLength: Int): RDD[(String, Long)]

    Cuts reads into _k_-mers, and then counts the number of occurrences of each _k_-mer.

    kmerLength

    The value of _k_ to use for cutting _k_-mers.

    returns

    Returns an RDD containing k-mer/count pairs.

    See also

    adamCountQmers
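
The cut-and-count step can be sketched with plain Scala collections. This is a minimal illustration that treats each read as a bare sequence string; it is not ADAM's implementation, and `countKmers` is a hypothetical helper.

```scala
object KmerCountSketch {
  // Cuts each read into overlapping k-mers and counts occurrences across all reads.
  def countKmers(reads: Seq[String], kmerLength: Int): Map[String, Long] = {
    require(kmerLength > 0, "kmerLength must be positive")
    reads
      // sliding() yields one short window for reads shorter than k, so filter those out
      .flatMap(_.sliding(kmerLength).filter(_.length == kmerLength))
      .groupBy(identity)
      .map { case (kmer, hits) => kmer -> hits.size.toLong }
  }
}
```

For example, the reads `ACGT` and `CGT` with k = 2 contain `AC` once and `CG` and `GT` twice each.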

  10. def adamFilterRecordsWithTag(tagName: String): RDD[AlignmentRecord]

    Returns the subset of the AlignmentRecords that have an attribute with the given name.

    tagName

    The name of the attribute to filter on (should be length 2)

    returns

    An RDD[AlignmentRecord] containing the subset of records with a tag that matches the given name.
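
A minimal sketch of the filter over plain collections follows. `Record` is a hypothetical stand-in for a read with its optional fields, and the `require` only mirrors the documented expectation that tag names are two characters long; whether ADAM enforces this is not stated here.

```scala
object TagFilterSketch {
  // Hypothetical record: a read name plus its optional fields keyed by two-letter tag.
  final case class Record(name: String, tags: Map[String, Any])

  // Keeps only the records that carry the named tag, whatever its value.
  def filterRecordsWithTag(records: Seq[Record], tagName: String): Seq[Record] = {
    require(tagName.length == 2, "tag names are two characters long")
    records.filter(_.tags.contains(tagName))
  }
}
```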

  11. def adamFlagStat(): (FlagStatMetrics, FlagStatMetrics)

  12. def adamGetReadGroupDictionary(): RecordGroupDictionary

    Collects a dictionary summarizing the read groups in an RDD of AlignmentRecords.

    returns

    A dictionary describing the read groups in this RDD.

  13. def adamGetSequenceDictionary(performLexSort: Boolean = false): SequenceDictionary

    Aggregates a sequence dictionary from the individual reference sequences used in this dataset.

    returns

    A sequence dictionary describing the reference contigs in this dataset.

    Definition Classes
    ADAMSequenceDictionaryRDDAggregator
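
The aggregation can be pictured as collecting the distinct contigs referenced by each element. The sketch below uses plain collections; `Contig` is a hypothetical stand-in for a sequence record, and reading `performLexSort` as "sort the contigs by name" is an assumption, since the parameter is not described above.

```scala
object SequenceDictionarySketch {
  // Hypothetical stand-in for a sequence record: contig name and length.
  final case class Contig(name: String, length: Long)

  // Collects the distinct contigs referenced by the elements; optionally sorts them by name.
  def aggregate(perElementContigs: Seq[Set[Contig]], performLexSort: Boolean = false): Seq[Contig] = {
    val distinct = perElementContigs.flatten.distinct
    if (performLexSort) distinct.sortBy(_.name) else distinct
  }
}
```
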
  14. def adamMarkDuplicates(): RDD[AlignmentRecord]

  15. def adamRePairReads(secondPairRdd: RDD[AlignmentRecord], validationStringency: ValidationStringency = ValidationStringency.LENIENT): RDD[AlignmentRecord]

    Reassembles read pairs from two sets of unpaired reads. The assumption is that the two sets were _originally_ paired together.

    secondPairRdd

    The RDD containing the second read from each pair.

    validationStringency

    How stringently to validate the reads.

    returns

    Returns an RDD with the pair information recomputed.

    Note

    The RDD that this is called on should be the RDD with the first read from the pair.
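
One way to picture the re-pairing is a join on read name between the two sets, after which each mate records the other's position. The sketch below uses plain collections and a hypothetical `Read` type; dropping reads that have no partner is an assumption of the sketch, not documented behaviour.

```scala
object RePairSketch {
  // Hypothetical minimal read: name, start position, pairing flags, and mate position.
  final case class Read(name: String, start: Long, paired: Boolean = false,
                        firstOfPair: Boolean = false, mateStart: Option[Long] = None)

  // Joins the two sets on read name and fills in the pairing fields on both mates.
  // Reads without a partner in the other set are dropped in this sketch.
  def rePair(first: Seq[Read], second: Seq[Read]): Seq[Read] = {
    val secondByName = second.map(r => r.name -> r).toMap
    first.flatMap { r1 =>
      secondByName.get(r1.name).toSeq.flatMap { r2 =>
        Seq(
          r1.copy(paired = true, firstOfPair = true, mateStart = Some(r2.start)),
          r2.copy(paired = true, firstOfPair = false, mateStart = Some(r1.start)))
      }
    }
  }
}
```

As in the note above, the first argument plays the role of the RDD holding the first read of each pair.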

  16. def adamRealignIndels(consensusModel: ConsensusGenerator = new ConsensusGeneratorFromReads, isSorted: Boolean = false, maxIndelSize: Int = 500, maxConsensusNumber: Int = 30, lodThreshold: Double = 5.0, maxTargetSize: Int = 3000): RDD[AlignmentRecord]

    Realigns indels using a consensus-based heuristic.

    isSorted

    If the input data is sorted, setting this parameter to true avoids a second sort.

    maxIndelSize

    The size of the largest indel to use for realignment.

    maxConsensusNumber

    The maximum number of consensus sequences to realign against per target region.

    lodThreshold

    Log-odds threshold to use when realigning; realignments are only finalized if the log-odds threshold is exceeded.

    maxTargetSize

    The maximum width of a single target region for realignment.

    returns

    Returns an RDD of mapped reads which have been realigned.

    See also

    RealignIndels

  17. def adamSAMSave(filePath: String, asSam: Boolean = true, asSingleFile: Boolean = false, isSorted: Boolean = false): AnyVal

    Saves an RDD of ADAM read data into the SAM/BAM format.

    filePath

    Path to save files to.

    asSam

    Selects whether to save as SAM or BAM. The default value is true (save in SAM format).

    isSorted

    If the output is sorted, this will modify the header.

  18. def adamSAMString: String

  19. def adamSave(args: ADAMSaveAnyArgs, isSorted: Boolean = false): Boolean

  20. def adamSaveAsFastq(fileName: String, fileName2Opt: Option[String] = None, outputOriginalBaseQualities: Boolean = false, sort: Boolean = false, validationStringency: ValidationStringency = ValidationStringency.LENIENT, persistLevel: Option[StorageLevel] = None): Unit

    Saves reads in FASTQ format.

    fileName

    Path to save files at.

    outputOriginalBaseQualities

    Outputs the original base qualities (OQ), if available, instead of those produced by BQSR.

    sort

    Whether to sort the FASTQ files by read name or not. Defaults to false. Sorting the output will recover pair order, if desired.
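
For orientation, a FASTQ record is four lines: `@` plus the read name, the bases, a `+` separator, and the quality string. The sketch below renders records that way and shows why sorting by read name brings mates next to each other. `Read` and both helpers are hypothetical, and the sketch does not model file output or the OQ option.

```scala
object FastqSketch {
  // Hypothetical minimal read: name, bases, and an already-encoded quality string.
  final case class Read(name: String, sequence: String, qualities: String)

  // Renders one read as a four-line FASTQ record.
  def toFastq(read: Read): String =
    s"@${read.name}\n${read.sequence}\n+\n${read.qualities}\n"

  // Renders all reads, optionally sorted by read name so that mates sit together.
  def render(reads: Seq[Read], sort: Boolean = false): String = {
    val ordered = if (sort) reads.sortBy(_.name) else reads
    ordered.map(toFastq).mkString
  }
}
```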

  21. def adamSaveAsPairedFastq(fileName1: String, fileName2: String, outputOriginalBaseQualities: Boolean = false, validationStringency: ValidationStringency = ValidationStringency.LENIENT, persistLevel: Option[StorageLevel] = None): Unit

    Saves these AlignmentRecords to two FASTQ files: one for the first mate in each pair, and the other for the second.

    fileName1

    Path at which to save a FASTQ file containing the first mate of each pair.

    fileName2

    Path at which to save a FASTQ file containing the second mate of each pair.

    validationStringency

    Iff strict, throw an exception if any read in this RDD is not accompanied by its mate.
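
The split into first and second mates, together with the strict check, can be sketched over plain collections. `Read` and `splitMates` are hypothetical; silently dropping unmatched reads in the non-strict case is an assumption of this sketch rather than documented behaviour.

```scala
object PairedFastqSketch {
  // Hypothetical minimal read: name plus a flag marking the first mate of a pair.
  final case class Read(name: String, firstOfPair: Boolean)

  // Splits reads into (first mates, second mates), both ordered by read name.
  // When `strict` is true, a read whose mate is absent raises an exception;
  // otherwise such reads are dropped.
  def splitMates(reads: Seq[Read], strict: Boolean): (Seq[Read], Seq[Read]) = {
    val (complete, incomplete) = reads.groupBy(_.name).values.toSeq
      .partition(g => g.size == 2 && g.count(_.firstOfPair) == 1)
    if (strict && incomplete.nonEmpty)
      throw new IllegalArgumentException(
        s"reads without a mate: ${incomplete.flatten.map(_.name).distinct.mkString(", ")}")
    complete.sortBy(_.head.name).flatten.partition(_.firstOfPair)
  }
}
```

Because both halves keep the same name order, line N of the first output corresponds to line N of the second.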

  22. def adamSingleReadBuckets(): RDD[SingleReadBucket]

    Groups all reads by record group and read name

    returns

    SingleReadBuckets with primary, secondary and unmapped reads
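
The grouping can be sketched as a `groupBy` on the (record group, read name) key followed by a split on alignment status. `Read` and `Bucket` below are hypothetical simplifications, not ADAM's `AlignmentRecord` or `SingleReadBucket`.

```scala
object ReadBucketSketch {
  // Hypothetical minimal read with only the fields needed for bucketing.
  final case class Read(recordGroup: String, name: String, mapped: Boolean, primary: Boolean)

  // One bucket per (record group, read name), split by alignment status.
  final case class Bucket(primaryMapped: Seq[Read], secondaryMapped: Seq[Read], unmapped: Seq[Read])

  def bucket(reads: Seq[Read]): Map[(String, String), Bucket] =
    reads.groupBy(r => (r.recordGroup, r.name)).map { case (key, group) =>
      val (mapped, unmapped) = group.partition(_.mapped)
      val (primary, secondary) = mapped.partition(_.primary)
      key -> Bucket(primary, secondary, unmapped)
    }
}
```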

  23. def adamSortReadsByReferencePosition(): RDD[AlignmentRecord]

  24. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  25. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  27. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  28. def filterByOverlappingRegion(query: ReferenceRegion): RDD[AlignmentRecord]

    Calculates the subset of the RDD whose AlignmentRecords overlap the query ReferenceRegion. Reference sequences are matched by string equality of their names. AlignmentRecords whose 'getReadMapped' method returns 'false' are ignored.

    The end of the record against the reference sequence is calculated from the cigar string using the ADAMContext.referenceLengthFromCigar method.

    query

    The query region; only records that overlap this region are returned.

    returns

    The subset of AlignmentRecords (corresponding to either primary or secondary alignments) that overlap the query region.
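
The overlap test can be sketched as follows: the reference span of a read is its start plus the summed lengths of the CIGAR operators that consume reference bases (M, D, N, =, X), and a read is kept when that span intersects the query on the same contig. `Alignment`, `Region`, and the 0-based half-open coordinates are assumptions of this sketch; it does not call ADAMContext.referenceLengthFromCigar.

```scala
object OverlapSketch {
  // Hypothetical minimal alignment: contig name, 0-based start, CIGAR string, mapped flag.
  final case class Alignment(contig: String, start: Long, cigar: String, mapped: Boolean)
  // Half-open region [start, end) on a named contig.
  final case class Region(contig: String, start: Long, end: Long)

  private val CigarOp = """(\d+)([MIDNSHP=X])""".r

  // Sums the lengths of CIGAR operators that consume reference bases (M, D, N, =, X).
  def referenceLength(cigar: String): Long =
    CigarOp.findAllMatchIn(cigar).collect {
      case m if "MDN=X".contains(m.group(2)) => m.group(1).toLong
    }.sum

  // Keeps mapped alignments on the query contig whose reference span intersects the query.
  def filterByOverlappingRegion(reads: Seq[Alignment], query: Region): Seq[Alignment] =
    reads.filter { r =>
      r.mapped && r.contig == query.contig &&
        r.start < query.end && r.start + referenceLength(r.cigar) > query.start
    }
}
```

Soft clips and insertions do not extend the reference span, so `5S10M2I3D4M` covers 17 reference bases.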

  29. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  30. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  31. def getSequenceRecordsFromElement(elem: AlignmentRecord): Set[SequenceRecord]

    For a single RDD element, returns zero or more sequence records.

    elem

    Element from which to extract sequence records.

    returns

    A set of sequence records.

    Definition Classes
    AlignmentRecordRDDFunctions → ADAMSequenceDictionaryRDDAggregator
  32. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  33. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  34. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  35. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  36. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  37. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  38. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  39. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  40. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  41. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  42. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  43. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  44. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  45. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  46. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  47. def maybeSaveBam(args: ADAMSaveAnyArgs, isSorted: Boolean = false): Boolean

  48. def maybeSaveFastq(args: ADAMSaveAnyArgs): Boolean

  49. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  50. final def notify(): Unit

    Definition Classes
    AnyRef
  51. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  52. lazy val sc: SparkContext

  53. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  54. def toFragments: RDD[Fragment]

  55. def toString(): String

    Definition Classes
    AnyRef → Any
  56. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  57. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  58. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
