From e58072336a8caae541903e26c7ca64fbd343202a Mon Sep 17 00:00:00 2001
From: Menglan Xiang
Date: Sat, 10 Apr 2021 14:13:06 -0700
Subject: [PATCH] Expanded negative word contractions in 3 md files

---
 _episodes/01-tidiness.md         | 10 +++++-----
 _episodes/02-project-planning.md |  4 ++--
 _episodes/03-ncbi-sra.md         |  6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/_episodes/01-tidiness.md b/_episodes/01-tidiness.md
index 7a7a9159..cf425bd7 100644
--- a/_episodes/01-tidiness.md
+++ b/_episodes/01-tidiness.md
@@ -45,7 +45,7 @@ Notes about your experiment, including how you prepared your samples for sequenc
 Including dates on your lab notebook pages, the samples themselves and in any records about those samples helps you associate everything with each other later. Using dates also helps create unique identifiers, because even
-if you process the same sample twice, you don't usually do it on the same
+if you process the same sample twice, you do not usually do it on the same
 day, or if you do, you're aware of it and give them names like A and B.
 
 > ## Unique identifiers
@@ -66,7 +66,7 @@ consistent and can be used across the field.
 >
 > The Digital Curation Center maintains [a list of metadata standards](http://www.dcc.ac.uk/resources/metadata-standards/list) and some that are particularly relevant for genomics data are available from the [Genomics Standards Consortium](http://gensc.org/projects/).
 >
-> If there aren't metadata standards already, you can think about what the minimum amount of information is that someone would need to know about your data to be able to work with it, without talking to you.
+> If there are not metadata standards already, you can think about what the minimum amount of information is that someone would need to know about your data to be able to work with it, without talking to you.
 >
 {: .callout}
 
@@ -76,11 +76,11 @@ Independent of the type of data you're collecting, there are standard ways to en
 
 The cardinal rules of using spreadsheet programs for data:
 
-- Leave the raw data raw - don’t change it!
+- Leave the raw data raw - do not change it!
 - Put each observation or sample in its own row.
 - Put all your variables in columns - the thing that vary between samples, like ‘strain’ or ‘DNA-concentration’.
 - Have column names be explanatory, but without spaces. Use '-', '_' or [camel case](https://en.wikipedia.org/wiki/Camel_case) instead of a space. For instance 'library-prep-method' or 'LibraryPrep'is better than 'library preparation method' or 'prep', because computers interpret spaces in particular ways.
-- Don’t combine multiple pieces of information in one cell. Sometimes it just seems like one thing, but think if that’s the only way
+- Do not combine multiple pieces of information in one cell. Sometimes it just seems like one thing, but think if that’s the only way
 you’ll want to be able to use or sort that data. For example, instead of having a column with species and strain name (e.g. *E. coli* K12) you would have one column with the species name (*E. coli*) and another with the strain name (K12). Depending on the type of
 analysis you want to do, you may even separate the genus and species names into distinct columns.
 
@@ -104,7 +104,7 @@ analysis you want to do, you may even separate the genus and species names into
 
 Data organization at this point of your experiment will help facilitate your analysis later, as well as prepare your data and notes for data deposition now often required by journals and funding agencies. If this is a collaborative project, as most projects are now, it's also information that collaborators will need to interpret your data and results and is very useful for communication and efficiency.
 
-Fear not! If you have already started your project, and it's not set up this way, there are still opportunities to make updates. One of the biggest challenges is tabular data that isn't formatted so computers can use it, or has inconsistencies that make it hard to analyze.
+Fear not! If you have already started your project, and it's not set up this way, there are still opportunities to make updates. One of the biggest challenges is tabular data that is not formatted so computers can use it, or has inconsistencies that make it hard to analyze.
 
 More practice on how to structure data is outlined in our [Data Carpentry Ecology spreadsheet lesson](http://www.datacarpentry.org/spreadsheet-ecology-lesson/02-common-mistakes/)
 
diff --git a/_episodes/02-project-planning.md b/_episodes/02-project-planning.md
index b40e045a..0fc21377 100644
--- a/_episodes/02-project-planning.md
+++ b/_episodes/02-project-planning.md
@@ -110,13 +110,13 @@ The raw data you get back from the sequencing center is the foundation of your s
 
 - Store the data in a place that is accessible by you and other members of your lab. At a minimum, you and the head of your lab should have access.
 - Store the data in a place that is redundantly backed up. It should be backed up in two locations that are in different physical areas.
-- Leave the raw data raw. You will be working with this data, but you don't want to modify this stored copy of the original data. If you modify the data, you'll never be able to access those original files. We will cover how to avoid accidentally changing files in a later lesson in this workshop [(see File Permissions)](https://datacarpentry.org/shell-genomics/03-working-with-files/#file-permissions).
+- Leave the raw data raw. You will be working with this data, but you do not want to modify this stored copy of the original data. If you modify the data, you'll never be able to access those original files. We will cover how to avoid accidentally changing files in a later lesson in this workshop [(see File Permissions)](https://datacarpentry.org/shell-genomics/03-working-with-files/#file-permissions).
 
 #### Some data storage solutions
 
 If you have a local high performance computing center or data storage facility on your campus or with your organization, those are ideal locations. Get in touch with the people who support those facilities to ask for information.
 
-If you don't have access to these resources, you can back up on hard drives. Have two backups, and keep the hard drives in different physical locations.
+If you do not have access to these resources, you can back up on hard drives. Have two backups, and keep the hard drives in different physical locations.
 
 You can also use resources like [Amazon S3](https://aws.amazon.com/s3/), [Microsoft Azure](https://azure.microsoft.com/en-us/pricing/details/storage/blobs/), [Google Cloud](https://cloud.google.com/storage/) or others for cloud storage. The [open science framework](https://osf.io) is a free option for storing files up to 5 GB. See more in the lesson ["Introduction to Cloud Computing for Genomics"](http://www.datacarpentry.org/cloud-genomics/04-which-cloud/).
diff --git a/_episodes/03-ncbi-sra.md b/_episodes/03-ncbi-sra.md
index 2636e8af..d0ccfc23 100644
--- a/_episodes/03-ncbi-sra.md
+++ b/_episodes/03-ncbi-sra.md
@@ -111,7 +111,7 @@ Now you know that comma-separated and tab-separated files are both "text" files
 > {: .solution}
 {: .challenge}
 
-After answering the questions, you should avoid saving any changes you might have made to this file. We don't want to make any changes. If you were to save this file, make sure you save it as a plain `.txt` file.
+After answering the questions, you should avoid saving any changes you might have made to this file. We do not want to make any changes. If you were to save this file, make sure you save it as a plain `.txt` file.
 
 ## Downloading a few sequencing files: EMBL-EBI
 
@@ -121,9 +121,9 @@ The SRA does not support direct download of fastq files from its webpage. Howeve
 
 2. In the search bar, type in `SRR2589044`. Make sure there are no spaces after the accession number, and press search.
 
-3. You will see a table with information about the sample. In the table, there is a header "FASTQ files (FTP)". If you wanted to download the files to your computer, you could click on the links to download the files. Alternatively, right click and copy the URL to save it for later. We don't need to download these files right now, and because they are large we won't put them on our computers now.
+3. You will see a table with information about the sample. In the table, there is a header "FASTQ files (FTP)". If you wanted to download the files to your computer, you could click on the links to download the files. Alternatively, right click and copy the URL to save it for later. We do not need to download these files right now, and because they are large we will not put them on our computers now.
 
-We don't recommend downloading large numbers of sequencing files this way. For that, the NCBI has made a software package called the `sra-toolkit`. However, for a couple files, it's often easier to go through the ENA.
+We do not recommend downloading large numbers of sequencing files this way. For that, the NCBI has made a software package called the `sra-toolkit`. However, for a couple files, it's often easier to go through the ENA.
 
 ## Where to learn more