Study Guide for Data Analysis Fundamentals
This guide provides a review of key concepts in data analysis, including foundational terms, processes, tools, and career paths. It is designed to test and reinforce understanding of the core material.
Quiz: Short-Answer Questions
Instructions: Answer the following ten questions in two to three sentences each, based on the provided source material.
- What is the core difference between data analysis and data analytics?
- List the six stages of the common data life cycle and briefly describe the "Manage" stage.
- According to the provided text, what is a "technical mindset," and how does it apply to data analysis?
- Identify two primary functions of spreadsheets in data analysis and name one popular spreadsheet application.
- What is a query language, and what is its primary purpose in relation to databases?
- Explain the concept of a "data ecosystem."
- How does the data life cycle used by the U.S. Fish and Wildlife Service differ from the one used by Harvard Business School (HBS)?
- Describe the key distinction between a data analyst and a data scientist in terms of their approach to problem-solving.
- What are two popular data visualization tools mentioned in the text, and what is a unique feature of each?
- When creating a data visualization, what are the first two steps a data analyst should follow?
--------------------------------------------------------------------------------
Answer Key
- What is the core difference between data analysis and data analytics? Data analysis is the practical process of collecting, transforming, and organizing data to draw conclusions, make predictions, and drive decision-making. In contrast, data analytics is a broader field, defined as the science of data.
- List the six stages of the common data life cycle and briefly describe the "Manage" stage. The six stages are Plan, Capture, Manage, Analyze, Archive, and Destroy. The "Manage" stage involves caring for and maintaining the data, which includes determining how and where it is stored and what tools are used for its storage.
- According to the provided text, what is a "technical mindset," and how does it apply to data analysis? A technical mindset is the ability to break things down into smaller steps or pieces and work with them in an orderly and logical way. This is a crucial skill in data analysis, which involves a step-by-step process of identifying a problem and solving it using organized data.
- Identify two primary functions of spreadsheets in data analysis and name one popular spreadsheet application. Spreadsheets let data analysts (1) collect, store, organize, and sort information and (2) create data visualizations such as graphs and charts. Microsoft Excel and Google Sheets are two popular spreadsheet applications used by analysts.
- What is a query language, and what is its primary purpose in relation to databases? A query language is a computer programming language, such as SQL, used to communicate with a database. Its primary purpose is to allow analysts to make requests (queries) to isolate, select, create, add, or download specific information from a database for analysis.
- Explain the concept of a "data ecosystem." A data ecosystem comprises the various elements that interact with one another to produce, manage, store, organize, analyze, and share data. These elements can include people, processes, and tools involved in the data's journey.
- How does the data life cycle used by the U.S. Fish and Wildlife Service differ from the one used by Harvard Business School (HBS)? The U.S. Fish and Wildlife Service's life cycle (Plan, Acquire, Maintain, Access, Evaluate, Archive) emphasizes long-term storage by ending with an "Archive" stage. The HBS life cycle (Generation, Collection, Processing, Storage, Management, Analysis, Visualization, Interpretation) is oriented more toward research and teaching: it breaks the work into finer steps and adds stages for "Visualization" and "Interpretation." Neither cycle, as listed here, includes an explicit stage for destroying data.
- Describe the key distinction between a data analyst and a data scientist in terms of their approach to problem-solving. A data analyst typically uses existing tools and methods to solve problems with existing types of data. In contrast, a data scientist often invents new tools and models, asks open-ended questions, and collects new types of data to make business predictions.
- What are two popular data visualization tools mentioned in the text, and what is a unique feature of each? Tableau and Looker are two popular visualization tools. Tableau features a simple drag-and-drop interface that allows users to create interactive graphs in dashboards. Looker is notable for communicating directly with a database, which lets analysts connect their data straight to the visual they choose.
- When creating a data visualization, what are the first two steps a data analyst should follow? The first step is to explore the data for patterns, which involves reviewing data sources like sales records and analytics reports to find interesting trends. The second step is to plan the visuals by refining the data and deciding what insights to present to the target audience, such as showing sales numbers over time or connecting sales to location.
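
To make the query-language answer above more concrete, here is a minimal sketch of sending SQL queries to a database from Python's built-in `sqlite3` module. The `sales` table, its columns, and the sample rows are hypothetical and exist only for illustration; they do not come from the source material. The first helper selects and summarizes data, and the second isolates the rows for a single region.

```python
import sqlite3

# Hypothetical in-memory database; the table, columns, and rows are
# illustrative assumptions, not data from the source material.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 45.5), ("West", 200.0)],
)


def total_sales_by_region(connection):
    """Query that selects and aggregates rows into (region, total) pairs."""
    cursor = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


def sales_for_region(connection, region):
    """Query that isolates the amounts for one region, smallest first."""
    cursor = connection.execute(
        "SELECT amount FROM sales WHERE region = ? ORDER BY amount",
        (region,),
    )
    return [row[0] for row in cursor.fetchall()]


totals = total_sales_by_region(conn)
north_sales = sales_for_region(conn, "North")
print(totals)
print(north_sales)
```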
--------------------------------------------------------------------------------
Essay Questions
Instructions: Consider the following questions. Formulate a comprehensive response that synthesizes information from across the source material.
- The source material describes several variations of the data life cycle (e.g., U.S. Fish and Wildlife, USGS, Financial Institutions, HBS). Analyze these variations and explain how the specific stages in each cycle reflect the unique priorities and goals of that organization or industry.
- A company is deciding whether to use spreadsheets or a database for a new data project. Based on the "RIGHT TOOL FOR THE JOB" table, construct an argument outlining the conditions under which spreadsheets would be the superior choice and the conditions under which a database would be necessary.
- The text distinguishes between several roles: data analyst, data scientist, and data specialist. Using the information provided, create a scenario for a business problem and explain how individuals in each of these three roles would contribute differently to its solution.
- Imagine being tasked with analyzing website and sales data for an e-commerce company to guide a website redesign. Outline a complete plan, from initial data exploration to the final presentation of visuals, incorporating the steps for planning a data visualization and the different types of tools (spreadsheets, Tableau, RStudio) that could be used at each stage.
- The source provides a list of assessment-taking strategies. Explain how concepts like "analytical thinking" and a "technical mindset" can be applied to these strategies to improve performance on a data analytics assessment.
--------------------------------------------------------------------------------
Glossary of Key Terms
| Term | Definition |
| --- | --- |
| Analytical skills | Qualities and characteristics associated with using facts to solve problems. |
| Analytical thinking | The process of identifying and defining a problem, then solving it by using data in an organized, step-by-step manner. |
| Context | The condition in which something exists or happens. |
| Data | A collection of facts. |
| Data analysis | The collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making. |
| Data analyst | Someone who collects, transforms, and organizes data in order to draw conclusions, make predictions, and drive informed decision-making. |
| Data analytics | The science of data. |
| Data design | How information is organized. |
| Data ecosystem | The various elements that interact with one another in order to produce, manage, store, organize, analyze, and share data. |
| Data science | A field of study that uses raw data to create new ways of modeling and understanding the unknown. |
| Data strategy | The management of the people, processes, and tools used in data analysis. |
| Data visualization | The graphical representation of data. |
| Data-driven decision-making | Using facts to guide business strategy. |
| Database | A collection of data stored in a computer system. |
| Dataset | A collection of data that can be manipulated or analyzed as one unit. |
| Formula | A set of instructions used to perform a calculation using the data in a spreadsheet. |
| Function | A preset command that automatically performs a specified process or task using the data in a spreadsheet. |
| Gap analysis | A method for examining and evaluating the current state of a process in order to identify opportunities for improvement in the future. |
| Query | A request for data or information from a database. |
| Query language | A computer programming language used to communicate with a database. |
| Root cause | The reason why a problem occurs. |
| Spreadsheet | A digital worksheet. |
| SQL | (Refer to Structured Query Language). |
| Stakeholders | People who invest time and resources into a project and are interested in its outcome. |
| Structured Query Language | A computer programming language used to communicate with a database. |
| Technical mindset | The ability to break things down into smaller steps or pieces and work with them in an orderly and logical way. |
| Visualization | (Refer to data visualization). |
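
The glossary's distinction between a formula and a function can be illustrated with a rough Python analogy. This is only a sketch: the `monthly_sales` list is a hypothetical stand-in for spreadsheet cells A1:A4, and the spreadsheet expressions in the comments are common Excel and Google Sheets syntax rather than examples taken from the source material.

```python
from statistics import mean

# Hypothetical column of monthly sales figures, standing in for cells A1:A4.
monthly_sales = [100, 150, 125, 175]

# "Formula": the analyst spells out each step of the calculation,
# comparable to typing =(A1+A2+A3+A4)/4 into a cell.
formula_average = (
    monthly_sales[0] + monthly_sales[1] + monthly_sales[2] + monthly_sales[3]
) / 4

# "Function": a preset command performs the same task,
# comparable to =AVERAGE(A1:A4).
function_average = mean(monthly_sales)

# Sorting is another preset operation, comparable to a spreadsheet's
# sort feature; here the values are ordered from largest to smallest.
sorted_sales = sorted(monthly_sales, reverse=True)

print(formula_average, function_average, sorted_sales)
```

Both approaches give the same average; the function is simply shorter to write and less error-prone when the range of cells grows.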
