Chapter 3 - Research Method

Research Design and Rationale

In order to address the purpose of the study, I developed the following three main research questions:

RQ1: Is there a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers?
RQ2: Is there a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers?
RQ3: Is there a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses?

In this study, I used a quantitative method with a comparative design. Quantitative methods are used to measure variables or data numerically and objectively and make use of statistical techniques to analyze relationships between and among variables or differences between groups based on some variable (Mustafa, 2011). Quantitative methods are also used to deduce insights from numerically measured and statistically tested data in the hope of generalizing the findings to a larger population (Allwood, 2012). Therefore, I used a quantitative methodology to enable the determination of differences in the number of affected individuals between the types of information security breaches (i.e., digital and nondigital) for each of the three types of health care entities in the U.S. health care system (i.e., health care providers, health plan providers, and health care clearinghouses).

I used a comparative research design in this study. A comparative research design is used to attempt to determine the differences that already exist between or among groups of individuals (Gall et al., 2010). This study involved the analysis of individual patient records affected for two types of breaches and three types of health care entities, which made a comparative design appropriate.

A comparative design has been used to help to advance knowledge and contribute to the body of literature in other similar studies. Rice et al. (2018) examined the gender differences that existed in privacy concerns individuals had about unmanned aerial systems (colloquially called drones). Like the current study, Rice et al. used quantitative data to examine the differences that existed between two or more groups. While Rice et al. examined the difference in the Likert score for drone mistrust in men and women, in the current study I examined the differences in the number of individuals affected by data breaches by type of breach or type of organization breached.

Başaran and Hama (2018) used a comparative design to compare and contrast faculty members’ views towards cloud computing adoption in higher education. They used descriptive statistics and an independent t test to demonstrate that regional differences existed in the adoption of cloud computing. Similar to the current study, Başaran and Hama examined quantitative data to ascertain differences across groups. While the current study differs from both Rice et al.’s (2018) and Başaran and Hama’s studies in that it examined the number of affected individuals rather than Likert scores, the methodology employed was similar in all three studies.

Methodology

This study involved the use of historical data available in the public database of HIPAA breach and violations (breach portal) maintained by the U.S. Department of Health and Human Services (HHS). I began the data collection upon receiving approval from the Walden University Institutional Review Board.

Population

The archival section of the breach portal contains a total of 2,441 data breach reports from 2009 to February 14, 2018, which is 24 months prior to the present day. Reports less than 24 months old are housed in a separate section of the portal because they are considered still under investigation. The total population of closed data breach reports at the time of this study was 2,441 reports.

Sampling and Sampling Procedures

This study involved the use of reports beginning in 2010 to the most recent report available in the archival section of the breach portal, which was about 2,601 reports. I did not consider newer data because the breaches were still under investigation and the reports were subject to change. Older data would not work due to the rapid pace of technological advancement. Data older than 2015 might not be relevant to the modern day due to improvements in technology and data security. I used 100% of the total population within the stated timeframes in the current study, so sampling was unnecessary.

Archival Data

HHS’s breach portal is a publicly available database that can be accessed and downloaded online. No permission was needed to access the data because they are freely available to anyone who wants to use them. HHS investigates the reports submitted by individual entities and posts its findings in the breach portal. The portal includes the following information: name of the breached entity, state the entity resides in, entity type, individuals affected, breach submission date, type of breach, location of breached information, whether a business associate is present, and a text description of the incident. For this study, I obtained data from the HHS breach portal in the form of a Microsoft Excel document and removed the fields not needed for the analysis (i.e., name of the entity, date of the breach, state the entity resides in, and whether a business associate is present).
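The column-removal step described above can be illustrated with a minimal sketch. It assumes the portal data have been exported to CSV rather than kept in Excel, and the column names and the two sample rows are hypothetical stand-ins modeled on the fields listed above, not real portal records.

```python
import csv
import io

# Column names are assumptions modeled on the fields listed in the text;
# the actual export may label them differently.
DROPPED = {
    "Name of Covered Entity",
    "State",
    "Breach Submission Date",
    "Business Associate Present",
}


def drop_unneeded_columns(rows):
    """Return copies of the rows without the columns named in DROPPED."""
    return [{k: v for k, v in row.items() if k not in DROPPED} for row in rows]


# Two made-up rows for illustration only; they are not real breach reports.
SAMPLE_CSV = (
    "Name of Covered Entity,State,Covered Entity Type,Individuals Affected,"
    "Breach Submission Date,Type of Breach,Location of Breached Information,"
    "Business Associate Present,Web Description\n"
    "Example Clinic,XX,Healthcare Provider,1200,01/01/2016,Theft,Laptop,No,"
    "Illustrative text\n"
    "Example Plan,YY,Health Plan,650,02/01/2016,Loss,Paper/Films,Yes,"
    "Illustrative text\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = drop_unneeded_columns(rows)
print(sorted(cleaned[0].keys()))
```

Only the entity type, the number of individuals affected, the type of breach, the location of the breached information, and the text description remain in each row after this step.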

Operationalization of Variables

I used the Statistical Package for the Social Sciences (SPSS) to analyze the study data and address the primary research questions. To test the hypotheses for each research question, I ran a two-sample t test for the difference between the means of two independent groups (i.e., digital and nondigital breaches). If the t test indicated no statistically significant difference between the two means, I failed to reject the null hypothesis. If there was a statistically significant difference between the two means, I rejected the null hypothesis in favor of the alternative hypothesis.
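To make the test concrete, the following is a minimal sketch of a two-sample t statistic written in Python rather than SPSS. It assumes the pooled-variance (Student) form, which the text does not specify, and the record counts are made up for illustration. The function name `two_sample_t` is hypothetical.

```python
import math
from statistics import mean, variance


def two_sample_t(a, b):
    """Return the pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


# Made-up counts of affected records, for illustration only.
digital = [1500, 2200, 900, 4100, 1300]
nondigital = [600, 750, 1100, 520, 980]

t_stat, df = two_sample_t(digital, nondigital)
print(f"t = {t_stat:.3f}, df = {df}")
```

The statistic would then be compared against a t distribution with the returned degrees of freedom to obtain a p value. A statistical package performs that step directly.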

Threats to Validity

In this section, I consider threats to validity that may have affected the current study. These threats include those related to external validity, internal validity, and construct validity as well as ethical considerations. I addressed these issues before data collection and throughout the study process.

External Validity

I conducted this study in the United States using data from the HHS breach portal. While data breaches in the United States may be similar in nature to data breaches in other countries, it is safer to assume that the findings presented in the current study relate only to the United States. Because data from the data breach portal are from all 50 states, the findings of this study should apply to all states within the United States. However, while many cybercrimes are federal offenses, states also have corresponding or supplementary state laws related to cybercrime (Jarrett et al., 2009). This may mean that some states are more or less susceptible to cybercrime. I have not taken this into account in the current study.

Internal Validity

While I used the entire breach data set available from the HHS breach portal, there is still a possibility of selection bias. The breach portal only includes incidents that affected 500 or more individuals. It is possible and likely that there are many more breaches that either occurred and were ignored or affected fewer than 500 individuals. Therefore, the study results only apply to breaches that are both eventually reported and affect 500 or more individuals.

Additionally, in a few instances, I needed to decide whether to include or exclude a breach report. I excluded a breach report when I could not determine whether the breach was a digital or a paper (nondigital) breach. In these cases, personal bias may have affected the choices I made despite my efforts to avoid such bias.

In this study, I did not consider the date of the breach in the analysis of the results. It is possible that as technology shifts, types of data breaches may become more or less relevant. Therefore, this study is most applicable to the present day and may not remain applicable as technology and general practices change.

Because this research involved the use of an open-access database, I had no control over how the data were collected or reported. While the database was from a reliable source (i.e., HHS), I was not able to account for any errors or bias that may have occurred while the data were being collected or reported.

Construct Validity

I used a t test to evaluate the study hypotheses for each research question. In order to correctly use a t test, the data must be random and normally distributed (Frost, 2019). Where this was not the case, I identified the most appropriate statistical test to use.
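One rough way to screen a group for departure from normality is to compute its skewness. The sketch below is an illustration only: the text does not state which check or alternative test was used, the function name `sample_skewness` is hypothetical, and the values are made up.

```python
from statistics import mean, pstdev


def sample_skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("skewness is undefined for constant data")
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Made-up values for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 1, 10]

print(sample_skewness(symmetric))
print(sample_skewness(right_skewed))
```

A value near zero is consistent with a symmetric distribution, while a large positive value indicates a long right tail. A formal normality test would still be needed before choosing between a t test and an alternative.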

Ethical Procedures

I did not use any human subjects in this study, meaning that an extensive institutional review board review was not necessary. Furthermore, in this study I only used information that was freely available online; therefore, the need for the confidentiality of participants did not apply to this research. I did not need to ensure the informed consent of participants because the study used archival data. However, despite not needing to protect the privacy of participants due to the public nature of the data, I did not publish the name of any company that experienced a data breach as identified in the breach portal. The names of the organizations were not pertinent to the study, so while the types of organizations that experienced a data breach were an integral part of the study, the name or identifying information of any particular company or organization was not included.

Summary

The purpose of this quantitative, comparative study was to determine if there is a significant difference in the number of individual patient records affected between digital and nondigital breaches for each of the three types of U.S. health care entities. In this study, I used a quantitative method with a comparative design. Quantitative methods are used to measure variables or data numerically and objectively and make use of statistical techniques to analyze the underlying difference between groups based on some variable (Mustafa, 2011). Quantitative methods are also used to deduce insights from numerically measured and statistically tested data in the hope of generalizing the findings to a larger population (Allwood, 2012). Therefore, I used a quantitative methodology to enable the determination of differences in the number of affected individuals between the types of information security breaches (i.e., digital and nondigital) for each of the three types of health care entities in the U.S. health care system (i.e., health care providers, health plan providers, and health care clearinghouses).

The major threat to validity in this study was that the breach portal only included incidents that affected 500 or more individuals. It is possible and likely that there are many more breaches that either occurred and were ignored or affected fewer than 500 individuals. Therefore, the study results only apply to breaches that are both eventually reported and affect 500 or more individuals. They are not generalizable to smaller breaches.