# sra

dms_tools2.sra.fastqFromSRA(samples, fastq_dump, fastqdir, aspera=None, overwrite=False, passonly=True, no_downloads=False, ncpus=1)

Currently only works for runs containing paired-end reads.

Args:
samples (pandas.DataFrame)

A dataframe that must have columns named run and name. The run column gives SRA run accessions (e.g., SRR5241726), and the name column gives the name for the run used in the final FASTQ files. Will be modified to include R1 and R2 columns.

fastq_dump (str)

Path to fastq-dump executable. Requires a version >= 2.8.

fastqdir (str)

Directory in which to place the FASTQ files. Created if it does not already exist.

aspera (None or 2-tuple)

If None, use fastq-dump for downloads, which is slower. Downloads are faster with aspera. To use aspera, specify the 2-tuple (ascp, asperakey), where ascp is the path to the ascp executable and asperakey is the key.

overwrite (bool)

Whether to overwrite a file that already exists rather than use the existing one. If False and all output files already exist, nothing is done and fastq_dump does not even need to be a valid path.

passonly (bool)

Upon completion, the directory fastqdir contains files of the form <name>_R1.fastq.gz and <name>_R2.fastq.gz for all names in samples. These file names have been added to samples as the columns R1 and R2. Only the file names are added, not the directory, so join them with fastqdir to get full paths.
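The input and output conventions described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper `expected_fastq_names`, the sample name `example_sample`, and the directory `./FASTQ_files` are hypothetical, and the call to `fastqFromSRA` is left commented out because it needs a working `fastq-dump` and network access.

```python
import os

import pandas as pd

# Samples table with the two required columns; the name value is hypothetical.
samples = pd.DataFrame({
    'run': ['SRR5241726'],
    'name': ['example_sample'],
})

fastqdir = './FASTQ_files'  # hypothetical output directory


def expected_fastq_names(samples):
    """Return a copy of `samples` with the R1 and R2 file names described above.

    Hypothetical helper for illustration only; it downloads nothing.
    """
    missing = {'run', 'name'} - set(samples.columns)
    if missing:
        raise ValueError(f"samples lacks required columns: {sorted(missing)}")
    out = samples.copy()
    out['R1'] = out['name'] + '_R1.fastq.gz'
    out['R2'] = out['name'] + '_R2.fastq.gz'
    return out


# The real call would look something like this (paths are placeholders):
# import dms_tools2.sra
# dms_tools2.sra.fastqFromSRA(samples=samples, fastq_dump='fastq-dump',
#                             fastqdir=fastqdir, aspera=None)

preview = expected_fastq_names(samples)

# R1 and R2 hold file names only, so join them with fastqdir to get usable paths.
r1_paths = [os.path.join(fastqdir, fname) for fname in preview['R1']]
r2_paths = [os.path.join(fastqdir, fname) for fname in preview['R2']]
print(r1_paths, r2_paths)
```

Unlike `fastqFromSRA`, which is documented as modifying `samples` in place, the helper here works on a copy so the preview leaves the original table untouched.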