Merge pull request #136 from mxiang128/pr/expanding_word_contractions
Expanded negative word contractions in 3 md files
hoytpr authored May 20, 2021
2 parents 90b565e + 1781de2 commit 8096a8b
Showing 3 changed files with 10 additions and 10 deletions.
10 changes: 5 additions & 5 deletions episodes/01-tidiness.md
@@ -45,7 +45,7 @@ Notes about your experiment, including how you prepared your samples for sequenc
Including dates on your lab notebook pages, the samples themselves and in
any records about those samples helps you associate everything with each
other later. Using dates also helps create unique identifiers, because even
if you process the same sample twice, you don't usually do it on the same
if you process the same sample twice, you do not usually do it on the same
day, or if you do, you're aware of it and give them names like A and B.

> ## Unique identifiers
@@ -66,7 +66,7 @@ consistent and can be used across the field.
>
> The Digital Curation Center maintains [a list of metadata standards](http://www.dcc.ac.uk/resources/metadata-standards/list) and some that are particularly relevant for genomics data are available from the [Genomics Standards Consortium](http://gensc.org/projects/).
>
> If there aren't metadata standards already, you can think about what the minimum amount of information is that someone would need to know about your data to be able to work with it, without talking to you.
> If there are not metadata standards already, you can think about what the minimum amount of information is that someone would need to know about your data to be able to work with it, without talking to you.
>
{: .callout}

@@ -76,11 +76,11 @@ Independent of the type of data you're collecting, there are standard ways to en

The cardinal rules of using spreadsheet programs for data:

- Leave the raw data raw - don’t change it!
- Leave the raw data raw - do not change it!
- Put each observation or sample in its own row.
- Put all your variables in columns - the things that vary between samples, like ‘strain’ or ‘DNA-concentration’.
- Have column names be explanatory, but without spaces. Use '-', '_' or [camel case](https://en.wikipedia.org/wiki/Camel_case) instead of a space. For instance 'library-prep-method' or 'LibraryPrep' is better than 'library preparation method' or 'prep', because computers interpret spaces in particular ways.
- Don’t combine multiple pieces of information in one cell. Sometimes it just seems like one thing, but think if that’s the only way
- Do not combine multiple pieces of information in one cell. Sometimes it just seems like one thing, but think if that’s the only way
you’ll want to be able to use or sort that data. For example, instead of having a column with species and strain name (e.g. *E. coli*
K12) you would have one column with the species name (*E. coli*) and another with the strain name (K12). Depending on the type of
analysis you want to do, you may even separate the genus and species names into distinct columns.
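
The column-naming and one-value-per-cell rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `tidy_column_name` and `split_species_strain`, the `organism` column, and the sample row are hypothetical, and the split assumes the strain is always the last whitespace-separated token in the cell.

```python
import re


def tidy_column_name(name):
    """Replace each run of whitespace in a column name with a hyphen."""
    return re.sub(r"\s+", "-", name.strip())


def split_species_strain(cell):
    """Split a combined cell such as 'E. coli K12' into (species, strain).

    Assumption: the strain is the last whitespace-separated token.
    """
    species, strain = cell.rsplit(maxsplit=1)
    return species, strain


# Hypothetical row with a spaced column name and a combined cell.
rows = [{"library preparation method": "paired-end", "organism": "E. coli K12"}]

tidy_rows = []
for row in rows:
    species, strain = split_species_strain(row["organism"])
    # Rename the remaining columns and drop the combined one.
    tidy = {tidy_column_name(k): v for k, v in row.items() if k != "organism"}
    tidy["species"] = species
    tidy["strain"] = strain
    tidy_rows.append(tidy)

print(tidy_rows)
```

Note that the sketch builds a new list and leaves `rows` untouched, in keeping with the first rule: the raw data stays raw.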
@@ -104,7 +104,7 @@ analysis you want to do, you may even separate the genus and species names into

Data organization at this point of your experiment will help facilitate your analysis later, as well as prepare your data and notes for data deposition now often required by journals and funding agencies. If this is a collaborative project, as most projects are now, it's also information that collaborators will need to interpret your data and results and is very useful for communication and efficiency.

Fear not! If you have already started your project, and it's not set up this way, there are still opportunities to make updates. One of the biggest challenges is tabular data that isn't formatted so computers can use it, or has inconsistencies that make it hard to analyze.
Fear not! If you have already started your project, and it's not set up this way, there are still opportunities to make updates. One of the biggest challenges is tabular data that is not formatted so computers can use it, or has inconsistencies that make it hard to analyze.

More practice on how to structure data is outlined in our [Data Carpentry Ecology spreadsheet lesson](http://www.datacarpentry.org/spreadsheet-ecology-lesson/02-common-mistakes/)

4 changes: 2 additions & 2 deletions episodes/02-project-planning.md
@@ -110,13 +110,13 @@ The raw data you get back from the sequencing center is the foundation of your s

- Store the data in a place that is accessible by you and other members of your lab. At a minimum, you and the head of your lab should have access.
- Store the data in a place that is redundantly backed up. It should be backed up in two locations that are in different physical areas.
- Leave the raw data raw. You will be working with this data, but you don't want to modify this stored copy of the original data. If you modify the data, you'll never be able to access those original files. We will cover how to avoid accidentally changing files in a later lesson in this workshop [(see File Permissions)](https://datacarpentry.org/shell-genomics/03-working-with-files/#file-permissions).
- Leave the raw data raw. You will be working with this data, but you do not want to modify this stored copy of the original data. If you modify the data, you'll never be able to access those original files. We will cover how to avoid accidentally changing files in a later lesson in this workshop [(see File Permissions)](https://datacarpentry.org/shell-genomics/03-working-with-files/#file-permissions).
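
One way to guard a stored copy against accidental edits is to remove its write permission. The sketch below is a rough illustration rather than the method the linked File Permissions lesson teaches: the function name `make_read_only` and the file name `example_raw.fastq` are hypothetical, and the example works on a throwaway temporary file instead of real sequencing data.

```python
import os
import stat
import tempfile


def make_read_only(path):
    """Clear the write bits for user, group, and others; keep the other bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a raw sequencing file.
    raw = os.path.join(tmp, "example_raw.fastq")
    with open(raw, "w") as fh:
        fh.write("@read1\nACGT\n+\nIIII\n")

    make_read_only(raw)

    # Check whether the owner's write bit is still set on the file.
    writable = bool(os.stat(raw).st_mode & stat.S_IWUSR)
    print("owner can write:", writable)
```

Read-only permissions only guard against accidental changes. They do not replace the redundant backups described above.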

#### Some data storage solutions

If you have a local high performance computing center or data storage facility on your campus or with your organization, those are ideal locations. Get in touch with the people who support those facilities to ask for information.

If you don't have access to these resources, you can back up on hard drives. Have two backups, and keep the hard drives in different physical locations.
If you do not have access to these resources, you can back up on hard drives. Have two backups, and keep the hard drives in different physical locations.

You can also use resources like [Amazon S3](https://aws.amazon.com/s3/), [Microsoft Azure](https://azure.microsoft.com/en-us/pricing/details/storage/blobs/), [Google Cloud](https://cloud.google.com/storage/) or others for cloud storage. The [open science framework](https://osf.io) is a free option for storing files up to 5 GB. See more in the lesson ["Introduction to Cloud Computing for Genomics"](http://www.datacarpentry.org/cloud-genomics/04-which-cloud/).

6 changes: 3 additions & 3 deletions episodes/03-ncbi-sra.md
@@ -111,7 +111,7 @@ Now you know that comma-separated and tab-separated files are both "text" files
> {: .solution}
{: .challenge}

After answering the questions, you should avoid saving any changes you might have made to this file. We don't want to make any changes. If you were to save this file, make sure you save it as a plain `.txt` file.
After answering the questions, you should avoid saving any changes you might have made to this file. We do not want to make any changes. If you were to save this file, make sure you save it as a plain `.txt` file.
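
To illustrate that comma-separated and tab-separated files are both plain text holding the same kind of table, here is a minimal sketch using Python's standard `csv` module. The column names and the value `sample_a` are invented for the example; only the accession `SRR2589044` comes from this lesson. The text is parsed in memory, so nothing is written to disk.

```python
import csv
import io

# The same two-row table written with two different delimiters.
comma_text = "run,sample\nSRR2589044,sample_a\n"
tab_text = "run\tsample\nSRR2589044\tsample_a\n"


def read_rows(text, delimiter):
    """Parse delimited text into a list of rows without touching any file."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_rows = read_rows(comma_text, ",")
tab_rows = read_rows(tab_text, "\t")

# Both parse to the same rows; only the separator character differs.
print(comma_rows == tab_rows)
```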

## Downloading a few sequencing files: EMBL-EBI

@@ -121,9 +121,9 @@ The SRA does not support direct download of fastq files from its webpage. Howeve

2. In the search bar, type in `SRR2589044`. Make sure there are no spaces after the accession number, and press search.

3. You will see a table with information about the sample. In the table, there is a header "FASTQ files (FTP)". If you wanted to download the files to your computer, you could click on the links to download the files. Alternatively, right click and copy the URL to save it for later. We don't need to download these files right now, and because they are large we won't put them on our computers now.
3. You will see a table with information about the sample. In the table, there is a header "FASTQ files (FTP)". If you wanted to download the files to your computer, you could click on the links to download the files. Alternatively, right click and copy the URL to save it for later. We do not need to download these files right now, and because they are large we will not put them on our computers now.

We don't recommend downloading large numbers of sequencing files this way. For that, the NCBI has made a software package called the `sra-toolkit`. However, for a couple files, it's often easier to go through the ENA.
We do not recommend downloading large numbers of sequencing files this way. For that, the NCBI has made a software package called the `sra-toolkit`. However, for a couple files, it's often easier to go through the ENA.

## Where to learn more
