Information Generation - Design of Sample Collection, Sample Analysis, and Data Interpretation Methodologies
Why You Should Read this Chapter: Most citizen science projects that you join or initiate will require generating information that was previously uncollected, unknown, unreported, or unestablished in the realm of public knowledge. Because most projects will involve this type of "information generation," it is important, and often critical, to your long-term success to think about how you will perform: (i) sample collection (i.e., how will you gather samples of air, water, soil, etc.?); (ii) sample analysis (i.e., how will you examine the samples you collect?); and (iii) data interpretation (i.e., how will you interpret the results of your sample analyses?).
The focus of this chapter is to help you generate high quality information. For some, this may seem like a daunting process. We emphasize that even if it is currently too difficult or expensive for you to comply with the most stringent state or federal quality assurance requirements, any information that you generate can have some use (discussed in Chapter 2). Indeed, in some instances this information could – and perhaps should – still suffice to trigger agency action. In this way, you can play the critical role of alerting the agency to potential environmental problems and enabling the agency to follow up using appropriate information collection protocols. Nonetheless, understanding how the design and performance of your project affect information quality will help assure that your project ultimately meets your goals.
As discussed in Chapter 2, the use of citizen scientist-generated information can be limited by the information’s quality. At one extreme, state and federal agency regulations require that only high quality information be used to form the underpinnings of their actions (see Appendices 1 and 2). For example, the Minnesota Pollution Control Agency requires that citizen monitoring data meet the credibility requirements established in its “Volunteer Surface Monitoring Guide” when implementing the state clean water act. Likewise, many federal regulations include specific requirements to assure information quality. Although these requirements vary in different contexts, EPA-funded programs generally require the preparation of an EPA-approved Quality Assurance Project Plan (QAPP) before people begin collecting samples.
Ultimately, high quality information has the highest utility or usefulness. Therefore, this discussion explains several technical suggestions that can increase the quality of the information you generate. In particular, we distill general suggestions that the EPA has established to promote information credibility and provide you with supplemental resources for additional information. We draw upon public EPA documents including “The Volunteer Monitor’s Guide to Quality Assurance Project Plans,” “The Citizen Science QAPP Guidance,” and “Guidance for Choosing a Sampling Design for Environmental Data Collection.” Other resources, such as the Federal Crowdsourcing and Citizen Science Toolkit, are available to aid citizen scientists in the design of sample collection, sample analysis, and data interpretation methodologies.
Assessing Information Quality
Indicators of Quality Data
When you present information that you have collected or generated (e.g., a summary of your tests of the water quality in a stream) to a decision maker, he or she must assess the quality of the information without having a chance to perform his or her own data collection or testing. Instead, decision makers often look for “indicators” of high quality data. Examples include: precision, accuracy, representativeness, completeness, comparability and instrumentation. Therefore, by considering these elements as you design and conduct your project, you will increase both your confidence in the information that the project generates and the ability of a decision maker to consider and rely on your findings. The indicators of quality data are each discussed below.
Precision relates to the degree of agreement (i.e., similarity) between (i) multiple measurements taken from a single sample or (ii) measurements taken from multiple samples collected as close together in time and place as possible. Collecting multiple independent samples from a single site at roughly the same time and in the same manner (i.e., “replica” samples), and then analyzing them at the same time and in the same manner, allows for robust statistical calculations of precision (e.g., standard deviation, standard error, or relative percent difference). A high level of precision suggests that your sampling and testing methods are consistent and can be reproduced; this is an indication of high quality information.
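The precision statistics named above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names and the nitrate values are hypothetical, and your QAPP or the applicable protocol may specify a different formula or acceptance limit.

```python
import math
import statistics


def precision_summary(values):
    """Return mean, sample standard deviation, and standard error of replica results."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replica measurements")
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    std_error = std_dev / math.sqrt(n)
    return {"mean": mean, "std_dev": std_dev, "std_error": std_error}


def relative_percent_difference(a, b):
    """RPD for a pair of results: |a - b| divided by the pair's mean, times 100."""
    pair_mean = (a + b) / 2
    if pair_mean == 0:
        raise ValueError("RPD is undefined when the mean of the pair is zero")
    return abs(a - b) / pair_mean * 100


# Hypothetical nitrate results (mg/L) from three replica samples at one site
replicas = [4.0, 4.2, 4.4]
summary = precision_summary(replicas)
rpd = relative_percent_difference(4.0, 4.4)
print(summary)
print(f"RPD between first and last replica: {rpd:.1f}%")
```

Smaller standard deviations and RPD values indicate closer agreement among replicas, and therefore higher precision.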
Accuracy describes how closely your data reflect reality. You can facilitate the measurement of accuracy by collecting quality control samples that have known values. Examples of various quality control samples are discussed in greater detail in the next section of this chapter. Quality control samples should be collected along with your field samples, in ways that mimic how the field samples are collected, and they should be analyzed using the same instrumentation. When the values reported from the control samples consistently reflect their known values, it suggests that the accuracy of your field samples is high; this is an indication of high quality information.
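One common way to express accuracy is to compare the reported value of a quality control sample with its known value. The sketch below assumes a hypothetical control sample with a known concentration of 10.0 mg/L and a hypothetical acceptance window of plus or minus 10 percent; neither figure comes from this chapter, and your project or the relevant protocol should set the actual tolerance.

```python
def percent_recovery(measured, known):
    """Measured value expressed as a percentage of the known value."""
    if known <= 0:
        raise ValueError("known value must be positive")
    return measured * 100 / known


def percent_bias(measured, known):
    """Signed difference from the known value, as a percentage of the known value."""
    if known <= 0:
        raise ValueError("known value must be positive")
    return (measured - known) * 100 / known


def within_tolerance(measured, known, tolerance_pct):
    """True if the measured value is within +/- tolerance_pct of the known value."""
    return abs(percent_bias(measured, known)) <= tolerance_pct


# Hypothetical control sample: known 10.0 mg/L, instrument reported 9.5 mg/L
recovery = percent_recovery(9.5, 10.0)
bias = percent_bias(9.5, 10.0)
acceptable = within_tolerance(9.5, 10.0, tolerance_pct=10.0)  # assumed window
print(recovery, bias, acceptable)
```

A recovery near 100 percent (bias near zero) across repeated control samples suggests that the field results analyzed alongside them are accurate.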
Representativeness relates to whether a sample collected from a site is actually representative of that site. Here, the central concern is to avoid biases in the generated information. How, when, where, and by whom samples are collected will influence the representativeness of your information. For example, if you are collecting samples to determine the typical concentration of a pollutant in a stream, the following factors could bias your results:
How: the samples were collected with unclean tools. This creates a risk of bias because any pollutant detected in the analysis of the samples may have actually arisen from the unclean tools.
When: the samples were collected just after heavy rainfalls. This may create a risk of bias because various pollutants that are not normally in the river might be washed there from various sources due to the rain. Note: this risk of bias would not be present if rain is typical of the location studied or, alternatively, if you were interested in determining the concentration of a pollutant in a stream following heavy rainfalls.
Where: the samples were collected just below a pipe outfall that is entering the stream. This creates a risk of bias because the concentration of pollutant just below the pipe will be higher than the concentration of pollutant in the stream generally. Note: this risk of bias would not be present if you were interested in determining the concentration of a pollutant just below the pipe or, alternatively, if you were interested in determining the abundance of pollution entering the stream from the pipe.
By Whom: samples were collected by a person untrained in proper sampling technique. This creates a risk of bias because it will be less certain that the samples were collected properly (i.e., in a way that is representative).
As demonstrated in these examples, what constitutes a bias that impacts representativeness may be different in each situation.
Completeness involves a comparison of the number of measurements you originally planned to collect (i.e., the number that you anticipated would be necessary for the information to be useful) and the number that you actually collected. Collecting more samples than you think will be necessary can help assure information completeness; this is an indication of high quality information.
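Completeness is often reported as a percentage of the planned measurements. The following sketch assumes a hypothetical plan of 40 measurements, of which 36 produced usable results; the function name is illustrative only.

```python
def percent_completeness(collected, planned):
    """Usable measurements collected, as a percentage of those planned."""
    if planned <= 0:
        raise ValueError("planned count must be positive")
    if collected < 0:
        raise ValueError("collected count cannot be negative")
    return collected * 100 / planned


# Hypothetical project: 40 measurements planned, 36 usable results obtained
completeness = percent_completeness(collected=36, planned=40)
print(f"Completeness: {completeness:.0f}%")
```

A result above 100 percent simply means that more samples were collected than originally planned, which is the buffer the paragraph above recommends.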
Comparability refers to the relationship between results of multiple studies or a single study over time. Multiple studies that report similar conclusions suggest that data quality is high. Moreover, information reported from a single study that presents realistic results over time (e.g., consistent, gradual changes, or explainable rapid changes) is of higher quality than information reported from a single study that presents sporadic, unexplained fluctuations in values.
Information Quality Needs Can Change Over Time: Your anticipated use of the information can change over the lifetime of your project, causing its information quality requirements to increase or decrease (see Chapter 2). For example, your project might originally be directed at monitoring a currently unthreatened natural resource to facilitate a rapid response to any potential increases in pollution. The information quality that you seek may change if a pollution increase is detected. Likewise, you might perform a general preliminary site survey to verify the identity of a potential pollutant or pollutant source before performing a detailed site evaluation. A preliminary site evaluation can include documentation of evidence of: the scent of air at the site of interest; oil slicks on the surface of water; stained soil or pavement; stressed vegetation on land or in water; solid waste (e.g., mounds or depressions suggesting solid waste disposal); wastewater entering a stream; or unmaintained septic systems. In some instances, you might collect and analyze a few field samples from the site to identify pollutants on the site. Perhaps, in this instance, the information quality that you seek will increase after the pollutant or pollutant source has been verified. Ultimately, information generation is, in many instances, an iterative process, so the type of information that you seek to generate can change over time.
Instrumentation used to analyze the samples you collect can also impact the quality of the generated information. Each analytical instrument has a range of values, such as the amount of a pollutant in a sample, that it can detect reliably. If the amount of a pollutant in a sample (sometimes referred to as the analyte abundance) is below the instrument’s lower detection limit (i.e., limit of blank, limit of detection, or limit of quantitation), the pollutant will be reported with a value of zero or less than zero. If the amount is greater than the instrument’s highest quantifiable limit, the reported value will be no greater than the instrument’s maximum reportable value. As readings approach either limit, they become less reliable. In short, if reported values fall within an instrument’s measurement range, it suggests that the values are reliable, which is an indication of high quality information.
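A simple screening step can flag readings that fall outside an instrument’s reliable range before they are interpreted. The sketch below is illustrative only: the limits of 0.05 and 20.0 mg/L are hypothetical, and the real values should come from the instrument’s documentation or the analytical method you are following.

```python
def classify_reading(value, lower_limit, upper_limit):
    """Label a reading relative to the instrument's reliable measurement range."""
    if lower_limit >= upper_limit:
        raise ValueError("lower_limit must be less than upper_limit")
    if value < lower_limit:
        return "below_lower_limit"
    if value > upper_limit:
        return "above_upper_limit"
    return "within_range"


# Hypothetical instrument limits (mg/L); take real values from the instrument manual
LOWER_LIMIT = 0.05
UPPER_LIMIT = 20.0

readings = [0.02, 3.2, 25.0]
labels = [classify_reading(v, LOWER_LIMIT, UPPER_LIMIT) for v in readings]
for value, label in zip(readings, labels):
    print(value, label)
```

Readings labeled as outside the range are best reported as such (for example, “below detection limit”) rather than treated as exact values.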
Quality Assurance Project Plan Guidelines
Prepare or review a project’s QAPP before collecting samples or information. Put your QAPP into a written format that can be shared with volunteers and decision makers.
A Quality Assurance Project Plan (QAPP) is a formal document that describes how a project will achieve its information quality requirements. In other words, a QAPP lists the quality assurance mechanisms that will be used to assure that the information generated by the project meets the quality criteria discussed above. Importantly, this document is prepared prior to any sample collection. Ultimately, the QAPP is a project feature that decision makers will use to assess the overall quality of the generated information. Preparing a QAPP is part of a project’s quality assurance (QA) activities. (Another term you may see is quality control (QC), which refers to the overall system of technical activities that are designed to measure the quality of information.)
Key elements of QAPPs
- Management description
- Sampling design
- Sample collection
- Sample handling & custody
- Sample analysis
- Quality controls
- Data interpretation
Although the EPA lists twenty-four distinct issues that can be addressed in a QAPP, we focus here on various themes that we deem especially important and useful in the context of citizen science projects: (i) management description, (ii) sampling design, (iii) sample collection methodology, (iv) sample handling and custody, (v) sample analysis, (vi) quality controls, and (vii) data interpretation. We stress that the nature or type of pollutant and the pollutant source heavily dictate the content of the QAPP. The EPA has issued a vast number of very specific and detailed protocols for the measurement of pollutants in various contexts (i.e., “EPA Reference Methods” or “EPA Standard Protocols”). A collection of these methods and protocols can be found on EPA’s website. They delineate detailed descriptions of accepted sampling methodologies, quality controls, instrumentation functionalities, etc. Including this level of detail here is impractical. Instead, we offer broad, generalizable suggestions and provide additional resources for those who seek greater detail for their individual project needs.
While some projects are small enough that a single person can successfully complete them, many will require the coordinated efforts of many individuals. Indeed, the most successful projects may involve a “community” of individuals. When projects involve groups of individuals, establishing and describing management roles at the onset of the project is important for ensuring project consistency and cohesiveness.
Project managers must (among many other things): identify funding resources and control expenditures of funds; establish what, when, how, and by whom samples will be collected, analyzed, and interpreted; ensure that volunteers understand how to clean and calibrate instrumentation; and assure, if needed, the proper training of those involved in the project (e.g., in proper sample collection) and otherwise ensure information quality.
Project managers should also seek to maximize the use of community expertise. For example, even if you lack the training or expertise to design or complete a project, your community may include individuals with technical or scientific training who are willing and eager to participate (e.g., teachers or professors, scientists and engineers, or even members of environmental agencies).
Sampling design includes considering the types of samples that will be collected and when and where they will be collected. Sampling design decisions implicate multiple factors that impact information quality, but they are primarily concerned with the representativeness of the information. A well-developed sampling design plays a central role in ensuring that conclusions are adequately supported by data. Thinking about your sampling design at the beginning of a project can help avoid introducing bias at the onset of information generation. Avoiding bias is important; as the saying goes, “garbage in, garbage out.”
In some aspects, your sampling design will be dependent on the type of sample you are collecting. For example, the placement of air monitors depends on the sampling objective: ground level monitoring, air mass (i.e., circulating air) monitoring, or source-oriented monitoring (e.g., as the air exits a smokestack). It is important for air flow around the monitor to be representative of the general air flow in the area to prevent sampling bias. Likewise, water and soil sampling designs can include details concerning the location and depth at which samples will be collected. When contemplating the types of samples that will be collected, you should consider the chemical/physical properties of the pollutant and the potential source of the pollutant (discussed in Chapter 3).
The sampling design should include documentation of when and where samples will be collected, including, for example, the following types of information:
- The number of times that a sample will be collected per week, month or year;
- The duration of the sampling program (e.g., the period of time during which samples will be collected);
- At what time of the day or night the samples will be taken (e.g., during or after an industrial facility’s hours of operation);
- How weather will impact sample collection (e.g., will samples be collected during rain, wind, or unusual temperature events); and
- Where samples will be collected. The chemical/physical properties of the pollutant and the source of the pollutant, along with potential sources of liability (discussed in Chapter 3), should be central to determinations of where to collect samples.
Addressing these issues will help reduce potential bias in the ultimate conclusions and promote the quality of the information generated in a project.
Selecting sampling locations typically involves one of two approaches: (i) random or probabilistic sampling and (ii) judgmental sampling. While each approach has advantages and disadvantages that can be discussed at length, this discussion merely serves to introduce the topics. In random sampling, as its name implies, sampling locations are chosen randomly. It is most useful when the pollutant of interest is relatively homogeneous in the sampling medium (i.e., it is uniformly distributed, and thus, there are no expected “hot spots”). Because citizen science projects concerned with environmental problems often focus on a pollutant source, random sampling may be less commonly used relative to judgmental sampling. Judgmental sampling, as its name implies, involves the selection of sampling locations based on judgment. Judgmental sampling is most useful when there is historical or physical knowledge of the feature or condition under investigation: for example, when the impact of the pollutant can be visually discerned or when the location of pollutant release is known.
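To make random sampling concrete, one simple approach is to overlay a grid on the study area and draw grid cells at random. The sketch below assumes a hypothetical 5-by-4 grid and six sampling locations; it is only one of several probabilistic designs, and the EPA guidance on choosing a sampling design describes others. Recording the seed lets someone else reproduce the same selection later.

```python
import random


def choose_random_cells(n_rows, n_cols, n_samples, seed=None):
    """Pick n_samples distinct grid cells (row, column) uniformly at random."""
    cells = [(row, col) for row in range(n_rows) for col in range(n_cols)]
    if n_samples > len(cells):
        raise ValueError("cannot choose more cells than the grid contains")
    rng = random.Random(seed)  # a recorded seed makes the selection reproducible
    return rng.sample(cells, n_samples)


# Hypothetical study area divided into a 5 x 4 grid; six cells will be sampled
picks = choose_random_cells(n_rows=5, n_cols=4, n_samples=6, seed=2017)
for row, col in picks:
    print(f"sample grid cell row={row}, col={col}")
```

Judgmental sampling, by contrast, would replace the random draw with locations chosen from knowledge of the site, such as the area around a known release point.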
Ultimately, the sampling design should match the needs of the project with the resources available (e.g., recognizing constraints of resources related to finances, time, expertise, and geographic access).
A well-designed sample collection methodology helps ensure the precision and accuracy of the information that is ultimately generated. The primary question addressed by a sample collection methodology is: how will samples be collected during each sampling event (e.g., site visit)? The answer to this question may include, among other things, a description of: (i) the number of samples to be collected during each sampling event (i.e., the number of “replica” samples that will be collected); (ii) how samples will be taken; (iii) the equipment and containers used to collect the samples (e.g., their composition and procedures for their decontamination); and (iv) holding time length (i.e., the time between taking samples and analyzing them).
Some aspects of sample collection methodologies are highly generalizable across projects. For example:
- Sample collection should be documented (e.g., time, place, name of collector, equipment used, etc.).
- The collector should wear “a clean pair of new, non-powdered, disposable gloves each time a different location is sampled and the gloves should be donned immediately prior to sampling. The gloves should not come in contact with the media being sampled and should be changed any time during sample collection when their cleanliness is compromised.”
- The collection equipment should be clean and sterilized.
- “Sample collection activities shall proceed progressively from the least suspected contaminated area to the most suspected contaminated area.” Samples that are expected to contain high levels of contaminated media should be kept separate from samples thought to contain low levels of contaminated media.
- “All . . . control samples shall be collected and placed in separate ice chests or shipping containers.”
- “During sample collection, if transferring the sample from a collection device, make sure that the device does not come in contact with the sample containers.”
- “All samples requiring preservation must be preserved as soon as practically possible, ideally immediately at the time of sample collection.”
Other aspects of a project’s sample collection methodology may be specific to the medium being sampled or type of instrument being used. For example, air sample collection methodologies are generally highly specific to the instrumentation used. Water and soil sampling designs, however, have various aspects that are more generalizable.
Water samples should be collected with as little agitation to the water as possible. Wading or streamside sampling increases the probability of agitation. In instances when agitation is a concern, samples should be collected while facing upstream. Moreover, water sample containers should be filled to their capacity (i.e., no bubbles or headspace should be present after the container is capped). Unpreserved and preserved samples have holding times of one week and two weeks, respectively. (Holding times indicate the period during which the samples should be tested.)
Soil samples must be “thoroughly mixed to ensure that the sample is as representative as possible of the sample media;” this rule does not apply if the soil sample will be analyzed for the presence of volatile organic compounds (VOCs). Moreover, the collector should “place the sample into an appropriate, labeled container(s) by using the alternate shoveling method and secure the cap(s) tightly. The alternate shoveling method involves placing a spoonful of soil in each container in sequence and repeating until the containers are full or the sample volume has been exhausted.” Unpreserved samples have a forty-eight-hour holding time.
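Holding times can be checked automatically once collection and analysis times are recorded. The sketch below uses only the holding times stated in this section (one week for unpreserved water, two weeks for preserved water, and forty-eight hours for unpreserved soil); the table and function names are hypothetical, and the actual limit for your project may differ by analyte and method.

```python
from datetime import datetime, timedelta

# Holding times as stated in this section; keyed by (medium, preserved)
HOLDING_TIMES = {
    ("water", False): timedelta(weeks=1),
    ("water", True): timedelta(weeks=2),
    ("soil", False): timedelta(hours=48),
}


def within_holding_time(medium, preserved, collected_at, analyzed_at):
    """True if the sample was analyzed no later than its holding time allows."""
    key = (medium, preserved)
    if key not in HOLDING_TIMES:
        raise KeyError(f"no holding time recorded for {key}")
    if analyzed_at < collected_at:
        raise ValueError("analysis time precedes collection time")
    return analyzed_at - collected_at <= HOLDING_TIMES[key]


# Hypothetical soil sample collected at 9:00 and analyzed 47 hours later
collected = datetime(2017, 5, 1, 9, 0)
analyzed = datetime(2017, 5, 3, 8, 0)
print(within_holding_time("soil", False, collected, analyzed))
```

Samples that fail this check can still be recorded, but the exceedance should be noted so that anyone relying on the result can weigh it accordingly.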
Sample collection methodologies may also contemplate other ways of documenting sample collection. For example, a methodology could direct volunteers to photograph, videotape, or otherwise record the actual sample collection to demonstrate that the activity complies with the sample collection methodology. Typically, notes of visual and olfactory observations should be recorded in a log book to describe, for example, the depth of each sample, its color and texture, and any odors. The log can also be used for demonstrating sample handling and custody and any field analyses of the samples.
Precision and accuracy are the main information quality concerns addressed by the establishment of sample handling procedures. These procedures apply to projects that do not perform sample analysis in the field. In these instances, the samples must be transported to an alternative site, such as a laboratory. All samples should be properly labeled including: (i) the sample location; (ii) the date and time of collection; (iii) the sampler’s name; and (iv) whether the sample was preserved, and if so, how. Chain-of-custody procedures should be established to keep track of all samples that will be shipped or transported to a laboratory for analysis (i.e., documentation requirements for any changes in the handler of the sample or the sample’s storage location). This information is important for authentication of any information generated by analysis of the samples (discussed in Chapter 2).
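The labeling and chain-of-custody items listed above map naturally onto a small record structure. The following is a sketch under stated assumptions: the class names, field names, and example values are hypothetical, and a paper form that captures the same fields serves the same purpose.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class CustodyEvent:
    """One change in who holds the sample or where it is stored."""
    timestamp: datetime
    handler: str
    storage_location: str


@dataclass
class SampleRecord:
    """Label information plus a running chain-of-custody log."""
    sample_id: str                # hypothetical identifier scheme
    location: str                 # (i) the sample location
    collected_at: datetime        # (ii) date and time of collection
    sampler: str                  # (iii) the sampler's name
    preservation: Optional[str]   # (iv) preservation method; None if unpreserved
    custody_log: List[CustodyEvent] = field(default_factory=list)

    def transfer(self, timestamp, handler, storage_location):
        """Append a custody event, refusing entries that go backward in time."""
        last_time = self.custody_log[-1].timestamp if self.custody_log else self.collected_at
        if timestamp < last_time:
            raise ValueError("custody events must be recorded in chronological order")
        self.custody_log.append(CustodyEvent(timestamp, handler, storage_location))

    def current_handler(self):
        """The most recent handler, or the sampler if no transfer has occurred."""
        return self.custody_log[-1].handler if self.custody_log else self.sampler


# Hypothetical sample handed from the collector to a courier
record = SampleRecord("S-001", "Stream site A", datetime(2017, 5, 1, 9, 0),
                      "J. Doe", preservation=None)
record.transfer(datetime(2017, 5, 1, 12, 0), "Courier", "ice chest 2")
print(record.current_handler())
```

Keeping every transfer in one ordered log makes it straightforward to show, later, who held each sample and where it was stored at any point.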
Analysis of samples may occur in the field or in a laboratory. In either case, the analytical methods and equipment used in the analysis should be documented. For example, if an EPA Reference Method or approved protocol is used, the method/protocol number should be listed; if the methodology differs from the Reference Method or approved protocol, list the ways in which it differs. In addition, documentation of instrumental calibration, inspection and maintenance should be provided. These procedures promote precision and accuracy of the data.
Generally, analytical tools that are EPA-approved are documented in the Federal Register. In some instances, the EPA provides lists of analytical tools that are EPA-approved when used in specific contexts. Other EPA-approved devices can be found in EPA-approved operating procedures or reference methods (see Appendix 5).
The design of a project should include methods for collecting and testing quality control samples; examples include field controls, equipment controls, split samples, replica samples, and spiked samples.
- A field control is a sample “collected” in the field that lacks a detectable quantity of the analyte of interest (i.e., the pollutant). While regular sample containers are filled with air, water, or soil from the field, a field control is filled in the same way but with air, water, or soil with a known composition that is brought to the site. If preservation steps are performed on the field samples, they should likewise be performed on the field control sample.
- Equipment controls are samples used to verify the cleanliness of sample collection or analysis equipment. Generally, distilled water is used to test equipment’s cleanliness.
- A split sample is one that is divided into two or more sample containers and subsequently analyzed independently.
- Replica samples or duplicate samples are samples that are collected and analyzed at the same time and in tandem (i.e., they are representative of the same environmental condition).
- Spiked samples are samples to which a known amount of the analyte has been added.
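Because the abundance of the analyte (i.e., pollutant) is either known in advance (field controls, equipment controls, and spiked samples) or expected to agree between paired results (split and replica samples), these samples are useful in assessing the precision and accuracy of the data that is ultimately generated.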
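Two of these checks reduce to short calculations: confirming that a field or equipment control shows no detectable analyte, and computing how much of a spike was recovered. The sketch below uses a commonly used spike recovery formula (spiked result minus unspiked result, divided by the amount added); the function names, the detection limit, and the example concentrations are hypothetical.

```python
def control_is_clean(result, detection_limit):
    """True if a field or equipment control shows no detectable analyte."""
    return result < detection_limit


def spike_recovery_percent(spiked_result, unspiked_result, spike_added):
    """Share of the added analyte that the analysis recovered, as a percentage."""
    if spike_added <= 0:
        raise ValueError("spike_added must be positive")
    return (spiked_result - unspiked_result) * 100 / spike_added


# Hypothetical values (mg/L): sample measured 2.0 before spiking, 5.0 was added,
# and the spiked sample measured 6.5
recovery = spike_recovery_percent(spiked_result=6.5, unspiked_result=2.0, spike_added=5.0)
blank_ok = control_is_clean(result=0.01, detection_limit=0.05)  # assumed limit
print(recovery, blank_ok)
```

A recovery well below or above 100 percent, or a control that shows detectable analyte, points to a problem in collection, handling, or analysis that should be investigated before the field results are relied upon.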
The project design should include considerations of how the data generated from sample analysis will be interpreted. It is from this interpretation that conclusions will be drawn. In some instances, you, the citizen scientist, may be able to interpret the data. However, as mentioned in Chapter 2, some uses of information generated from your project will require expert interpretation. When data is interpreted by a qualified expert, the quality of the information is enhanced. There are likely to be qualified experts in your community who are willing to assist you. Think about universities, community colleges, high schools, and locally-based environmental engineering companies.
 Minn. Stat. Ann. §114D.
 See CIO §2105.0
 See Environmental Protection Agency, Citizen Science QAPP Template (April 2013); Environmental Protection Agency, Guidance for Choosing a Sampling Design for Environmental Data Collection, EPA/240/R-02/005 (December 2002); Environmental Protection Agency, The Volunteer Monitor’s Guide to Quality Assurance Project Plans, EPA 841-B-96-003 (September 1996).
 Environmental Protection Agency, Collection of Methods, https://www.epa.gov/measurements/collection-methods (last visited May 1, 2017).
 See, e.g., Environmental Protection Agency, SESD Operating Procedure: Soil Sampling, SESDPROC-300-R3 (August 2014); Environmental Protection Agency, SESD Operating Procedure: Surface Water Sampling, SESDPROC-201-E3 (February 2013); Environmental Protection Agency, SESD Operating Procedure: Pore Water Sampling, SESDPROC-513-R2 (February 2013); Environmental Protection Agency, SESD Operating Procedure: Groundwater Sampling, SESDPROC-301-R3 (March 2013).
 See, e.g., Environmental Protection Agency, List of Designated Reference and Equivalent Methods (June 2016).
 See Operating Procedure: Soil Sampling supra, note 7.
 See, e.g., Environmental Protection Agency, List of Designated Reference and Equivalent Methods (June 2016).