May 8, 2024
The illusion of moral decline – Nature

Study 1

In study 1, we conducted keyword-term searches of the Roper Center for Public Opinion iPoll Database, and manually searched the databases of the General Social Survey, Pew Research Center, Gallup, the American National Election Studies, the World Values Survey, the European Social Survey and the European Values Survey to locate survey items that asked participants if and how they thought other people’s morality had changed over time. In our analyses, we included all surveys that (1) used a representative sample of US American participants, and (2) explicitly asked participants about their perceptions of changes in values, traits and behaviours that have traditionally been taken as indicators of morality by a wide range of US Americans (for example, kindness, honesty, respect). We excluded from our analyses items that asked participants about their perceptions of special topics whose moral relevance either changed considerably over time (for example, men holding doors for women) or differed substantially across members of the population (for example, attending church). We also excluded items that asked participants about the morality of special subpopulations (for example, ‘Evangelicals’ or ‘the Wisconsin legislature’) rather than about all US Americans or about people in general. Further information, including search terms and all survey items included in study 1, can be found in the Supplementary Information. We also sampled our database for survey items administered to participants who lived outside the United States. Because there were fewer such surveys, we did not exclude surveys with non-representative samples, as we did with our US sample.

Study 2a

All original data collection in this and subsequent studies followed all ethical regulations and was approved by the Institutional Review Board of Harvard University.

Participants

We recruited a nationally representative sample of US American adults using Prolific, an online sample provider. This sample was constructed to represent the US American adult population in terms of gender, race and age. Because we did not know the size of the effect we were studying, we sought to make our sample comparable in size to the samples in study 1 by recruiting 1,000 participants. Nine hundred and ninety-nine people (507 female, 487 male, 5 other, Mage = 45.74 years, 73% white, 13% Black, 7% Asian, 4% Hispanic, 1% American Indian or Alaska Native, 1% other, 2% ‘more than one of the above’) were paid US$0.75 each for their participation.

Procedure

Study 2a was conducted in 2020. After providing informed consent, participants confirmed their Prolific ID, per the site’s usage policy. They then read the following instructions: “Thanks! In this study, we’ll ask you how kind, honest, nice, and good people were at various points in time. If you’re not sure, that’s okay, just give your best guess”. Participants then rated how “kind, honest, nice, and good” people are today, were 10 years ago and were 20 years ago, using seven-point Likert scales with endpoints labelled ‘not very’ and ‘very’. As a consistency check, participants were then asked to recall whether they had given higher, equal or lower ratings to people today compared to people 20 years ago. Participants then answered some open-ended exploratory questions that asked them to explain the thinking behind their answers. Participants then answered some demographic questions (Supplementary Table 6). Embedded among these demographic questions was an ‘attention check question’ that instructed participants to select the option ‘other’ and to type the word ‘sky’. Finally, participants were compensated and dismissed.

Exclusions

One hundred and eighty-one participants failed the attention check embedded in the demographics and were excluded from all analyses. Another 120 participants gave answers to the consistency check question that were inconsistent with their previous answers; they were also excluded. This left 698 participants in all analyses (372 female, 322 male, four other, Mage = 46.37, 74% white, 12% Black, 6% Asian, 4% Hispanic, 1% American Indian or Alaska Native, 2% more than one of the above). These exclusions do not meaningfully affect the results.
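The sample size after exclusions can be checked with a short sketch. The function name and structure are illustrative assumptions, not the authors' code; only the counts come from the text above.

```python
def remaining_after_exclusions(recruited, exclusions):
    """Subtract each exclusion count in order and return the final sample size."""
    remaining = recruited
    for reason, count in exclusions:
        if count > remaining:
            raise ValueError(f"cannot exclude {count} for {reason!r}: only {remaining} left")
        remaining -= count
    return remaining


# Counts reported for study 2a: 999 recruited, two sequential exclusion steps.
study_2a_final = remaining_after_exclusions(
    999,
    [("failed attention check", 181), ("failed consistency check", 120)],
)
print(study_2a_final)  # 698
```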

Analysis

To analyse the data, we fit a linear mixed effects model using the lme4 package in R (ref. 30), extracted P values using the lmerTest package (ref. 31) and calculated planned contrasts using the emmeans package (ref. 32), with a Holm–Bonferroni correction for multiple comparisons. The outcome was participants’ ratings and the predictor was the year of those ratings (one factor with three levels: 2020, 2010 and 2000). The model included a fixed effect of the year of each rating and a random intercept for each participant. For this and all models, we checked model assumptions by plotting the outcome variable, residuals and fitted values. All tests we report are two-tailed.
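The Holm–Bonferroni correction mentioned above can be illustrated with a minimal sketch. It shows only the adjustment step, not the mixed model or the contrasts themselves, and it is not the authors' R code. The function name and the three P values are hypothetical.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted P values in the same order as the input."""
    m = len(p_values)
    # Indices of the P values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values may never decrease as the raw P values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw P values for three contrasts (not taken from the study).
example_p = [0.010, 0.040, 0.030]
adjusted_p = holm_bonferroni(example_p)
print(adjusted_p)  # approximately [0.03, 0.06, 0.06]
```

A contrast is then treated as significant when its adjusted P value falls below the chosen alpha level.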

Study 2b

Participants

We powered study 2b to detect an effect of d = 0.30 or larger, reasoning that this would be sufficient to detect effects similar to the effect we detected in study 2a. Two hundred and thirty-six people responded to an advertisement for a study on Amazon Mechanical Turk. To participate, respondents had to pass a three-item test that required them to know that (1) children in kindergarten are 3 or 4 years old, (2) a US American ZIP code is a series of five digits and (3) eating turkey is not associated with Halloween. Thirty-six respondents answered at least one of these three questions incorrectly and were not allowed to participate. The remaining 200 respondents (81 female, 119 male, Mage = 35.81 years, 72% white, 12% Black, 9% Hispanic, 6% Asian, 3% more than one of the above) were allowed to participate in the study in exchange for US$0.75.

Procedure

After providing informed consent, participants followed study 2a’s procedure except they were asked about different years. Specifically, participants were first asked, “How kind, honest, nice, and good are people today?” and were then asked the same question for “two years ago”, “four years ago”, “six years ago”, “eight years ago” and “ten years ago”, in that order. All questions were answered using a seven-point Likert scale with endpoints labelled ‘not very’ and ‘very’. As a consistency check, participants then answered the following question: “When it comes to being kind, honest, nice, and good—are people more so today compared to ten years ago, less so today compared to ten years ago, or the same?” Participants were then asked to explain their answer in an open-ended question. Finally, participants were asked some demographic questions, as well as an attention check question that required them to select the option ‘other’ and to type the word ‘day’. Participants were compensated and dismissed.

Exclusions

Fifteen participants failed the attention check, and a further 37 participants failed the consistency check by giving an answer that was inconsistent with their scale ratings. The data from these participants were excluded from all analyses, leaving 148 participants (59 female, 89 male, Mage = 36.59 years, 75% white, 9% Black, 7% Hispanic, 5% Asian, 1% Hawaiian or Pacific Islander, 3% more than one of the above). These exclusions meaningfully affect the results in only one case: when all participants are included, the difference between 2020 and 2016 is not significant.

Analysis

We fit the same model we fit in study 2a except that in this case the factor in the model had six levels (2020, 2018, 2016, 2014, 2012 and 2010).

Study 2c

Participants

We sought to recruit a sample of people who varied widely in terms of age. As such, we created a survey with a quota of 50 participants in each of the following age groups: 18–24, 25–29, 30–34, 35–39, 40–44, 45–49, 50–54, 55–59, 60–64 and 65–69 years. This sample size gave us sufficient power to detect the effects we had detected in studies 2a and 2b. Respondents selected their age group on accessing the study, and once the quota for a group was reached, further respondents from that group were not allowed to participate. Respondents younger than 18 or older than 69 were not allowed to participate.
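The quota rule described above can be sketched as a small counter. This is an illustrative assumption about how such a gate might work, not the survey platform's actual logic. The class and method names are hypothetical.

```python
# Age groups and quota as described for study 2c.
AGE_GROUPS = [(18, 24), (25, 29), (30, 34), (35, 39), (40, 44),
              (45, 49), (50, 54), (55, 59), (60, 64), (65, 69)]
QUOTA_PER_GROUP = 50


class QuotaTracker:
    """Tracks how many respondents have been admitted per age group."""

    def __init__(self, groups=AGE_GROUPS, quota=QUOTA_PER_GROUP):
        self.quota = quota
        self.counts = {group: 0 for group in groups}

    def admit(self, group):
        """Return True and count the respondent if their group still has room."""
        if group not in self.counts:
            return False  # younger than 18 or older than 69
        if self.counts[group] >= self.quota:
            return False  # quota for this group already reached
        self.counts[group] += 1
        return True


tracker = QuotaTracker()
admitted = [tracker.admit((30, 34)) for _ in range(51)]
print(admitted.count(True))  # 50
```

With ten groups of 50, the target sample under this rule is 500 respondents before any knowledge test or exclusion is applied.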

People responded to an advertisement for a study on Amazon Mechanical Turk. Respondents who accessed the survey before the quota for their age group was reached were asked to complete a three-item test of English proficiency and knowledge of US American culture. Specifically, they were required to demonstrate that they knew that (1) bell bottoms are not a type of footwear, (2) an RSVP is a required response to a wedding invitation and (3) a sign reading ‘out of order’ is best paired with an elevator. Three hundred and one respondents answered one or more of these questions incorrectly and were not allowed to participate. The remaining 484 respondents (225 female, 257 male, two other, Mage = 41.27 years, 72% white, 15% Black, 7% Asian, 4% Hispanic, 1% American Indian or Alaska Native, 2% more than one of the above) were allowed to participate in the study in exchange for US$0.75.

Procedure

Study 2c was conducted in 2020. Participants responded to an advertisement for a study on Amazon Mechanical Turk. After providing informed consent, participants reported how “kind, honest, nice and good” people are today. They then reported how “kind, honest, nice and good” people were when they (the participants) were about 20 years old, and at about the time they (the participants) were born. This was done by adjusting the wording of the subsequent questions on the basis of the participant’s age. For example, if the participant was between 30 and 34 years old, they were asked “How kind, honest, nice, and good were people about ten years ago?” and then “How kind, honest, nice, and good were people about 30 years ago?” If participants were under 25 years, they answered only the questions for today and when they were born. All questions were answered using a seven-point Likert scale with endpoints labelled ‘not very’ and ‘very’. As in previous studies, participants were then given a consistency check that required them to remember whether they had rated people today as more, equally or less moral compared to people in the year they were born. Participants then answered some further exploratory and demographic questions. Embedded among them was an attention check that required participants to select the option ‘other’ and type the word ‘apple’. Finally, participants were compensated and dismissed.
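The age-dependent wording can be sketched as follows. The sketch assumes that the number of years in each question is derived from the lower bound of the participant's age group, which matches the 30–34 example above but is not confirmed for the other groups. The function name is hypothetical, and numbers are written as digits for simplicity.

```python
QUESTION = "How kind, honest, nice, and good were people about {years} years ago?"


def past_questions(age_group_lower_bound):
    """Build the past-tense questions for a participant's age group.

    Assumption: offsets are computed from the lower bound of the age group.
    """
    years_since_birth = age_group_lower_bound
    years_since_turning_20 = age_group_lower_bound - 20
    questions = []
    # Participants under 25 skip the "when you were about 20" question.
    if age_group_lower_bound >= 25:
        questions.append(QUESTION.format(years=years_since_turning_20))
    questions.append(QUESTION.format(years=years_since_birth))
    return questions


for line in past_questions(30):
    print(line)
```

For the 30–34 group this yields the "about 10 years ago" question followed by the "about 30 years ago" question, in the order given in the text.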

Exclusions

Twenty-eight participants failed the attention check and their data were excluded from all analyses. Seventy-three more participants reported an age at the end of the study that was inconsistent with the age group they selected at the beginning of the study and the data from these participants were also excluded from all analyses. An extra 64 participants failed the consistency check and data from these participants were also excluded from all analyses. The data from the remaining 347 participants (174 female, 172 male, one other, Mage = 42.57 years, 78% white, 9% Black, 7% Asian, 4% Hispanic, 2% ‘more than one of the above’) were included in all analyses. These exclusions do not meaningfully change the results.

Analysis

We fit the same model we fit in study 2b except that in this case the factor in the model had three levels (today, the year the participant turned 20, the year the participant was born).

Study 3

Participants

People responded to an advertisement for a study on Amazon Mechanical Turk. As in study 2c, we sought to recruit a sample of people who varied widely in terms of age and that was large enough to provide sufficient power to detect the effects we had detected in studies 2a and 2b. We created a survey with a quota of 150 for each of three age groups: 20–34, 35–49 and 50–64. Anyone younger than 20 or older than 64 was not allowed to participate. Respondents were asked to complete the same test of English language and US American culture as in study 2c. Four hundred and forty-four respondents (202 female, 242 male, Mage = 40.42 years, 77% white, 9% Black, 7% Asian, 5% Hispanic, 1% ‘more than one of the above’) provided informed consent and became participants in the study in exchange for US$0.75.

Procedure

Study 3 was conducted in 2020. After providing informed consent, participants reported how “kind, honest, nice, and good” people are in the present (2020) and also “about 15 years ago” (about 2005) on seven-point Likert scales with endpoints labelled ‘not very’ and ‘very’ and then completed a consistency check that asked them to recall the answers they had just given. The difference between these two ratings was used as a measure of participants’ perception of moral decline between 2005 and 2020. Participants then answered the following questions using the same seven-point Likert scales: “How kind, honest, nice, and good are people who are currently between the ages of 35 and 95?”; “How kind, honest, nice, and good are people who are currently between the ages of 20 and 35?”; “Thinking again of people who are currently between the ages of 35 and 95, how kind, honest, nice, and good were they about 15 years ago?” and “About 15 years ago, how kind, honest, nice, and good were people who were then between the ages of 80 and 95?” Participants then answered some demographics questions, among which was embedded an ‘attention check question’ that instructed participants to select the option ‘other’ and to type the word ‘cloud’. Finally, participants were compensated and dismissed.

Exclusions

Forty-eight participants failed the attention check, and a further 15 participants reported an age at the end of the study that was inconsistent with the age group they reported at the beginning of the study. An extra 77 participants failed the consistency check. The data from all of these participants were excluded from all analyses, leaving 319 participants (154 female, 165 male, Mage = 41.02, 77% white, 8% Black, 8% Asian, 5% Hispanic, 1% more than one of the above). These exclusions do not meaningfully affect the results.

Calculating personal change and interpersonal replacement

We created a personal change score by subtracting ratings of 20–80-year-olds about 15 years ago (in 2005) from ratings of 35–95-year-olds in the present (2020). We created an interpersonal replacement score by subtracting ratings of 80–95-year-olds about 15 years ago (in 2005) from ratings of 20–35-year-olds in the present (2020). The descriptive statistics for people in general and each of the subgroups about which participants were asked are shown in Extended Data Fig. 1.
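The two scores can be written out as a minimal sketch. The field and function names are hypothetical, and the example ratings are invented for illustration. Only the subtraction rules come from the text.

```python
from dataclasses import dataclass


@dataclass
class Ratings:
    """One participant's seven-point ratings of the four subgroups."""
    older_now: float    # people currently aged 35-95, rated in the present (2020)
    older_then: float   # the same people (then aged 20-80), about 15 years ago
    younger_now: float  # people currently aged 20-35, rated in the present
    oldest_then: float  # people who were aged 80-95 about 15 years ago


def personal_change(r):
    """Present rating of a group minus the rating of that same group in 2005."""
    return r.older_now - r.older_then


def interpersonal_replacement(r):
    """Rating of the group that entered minus the group that left the population."""
    return r.younger_now - r.oldest_then


# Invented ratings for a single hypothetical participant.
example = Ratings(older_now=4, older_then=5, younger_now=3, oldest_then=6)
print(personal_change(example), interpersonal_replacement(example))  # -1 -3
```

Negative values on either score indicate perceived decline: the first within the same individuals over time, the second through younger people replacing older ones.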

Analysis

Using a standard linear model, we entered participants’ personal change and interpersonal replacement scores as predictors, and the outcome was participants’ overall perception of moral decline between 2005 and 2020.
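A linear model with two predictors of this kind can be sketched with ordinary least squares. The sketch below solves the normal equations in plain Python and is only an illustration of the model form, not the authors' R analysis. The function name and the toy data are hypothetical.

```python
def fit_two_predictor_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or constant")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, 3):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, 4):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        partial = sum(aug[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (aug[i][3] - partial) / aug[i][i]
    return tuple(beta)


# Toy data generated exactly from decline = 0.5 + 1.0*change + 2.0*replacement.
change = [0.0, 1.0, 0.0, 1.0, 2.0]
replacement = [0.0, 0.0, 1.0, 1.0, -1.0]
decline = [0.5, 1.5, 2.5, 3.5, 0.5]
coefficients = fit_two_predictor_ols(change, replacement, decline)
```

Here `coefficients` recovers the intercept and the two slopes used to generate the toy data, up to floating-point error.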

Study 4

In study 4, we conducted keyword-term searches of the Roper Center for Public Opinion iPoll Database (using search terms shown in the Supplementary Information), and manually searched the databases of the General Social Survey, Pew Research Center, Gallup, the American National Election Studies, the World Values Survey, the European Social Survey and the European Values Survey to locate survey items that asked participants questions about their own and other people’s morality. As in study 1, questions were considered relevant to morality if they asked about values, attitudes, traits and behaviours that we thought would be considered relevant to kindness, honesty, niceness and goodness by a wide range of US Americans. We included US samples only if they were nationally representative, but also collected non-representative samples if they were collected outside the United States to maximize non-US representation. The latter were analysed separately. To be included, each survey had to be administered at least twice, and the most recent administration could not be earlier than 2010. Further information, including search terms and all survey items included in study 4, can be found in the Supplementary Information.

Analysis

We fit a linear model for each survey. The year of each survey was always entered as a predictor and the outcome was always the average perception of current morality. We used R² values as a measure of effect size. We fit Bayesian models using the rstanarm package in R (ref. 33) and extracted the percentage of the 89% highest density interval (HDI) that was contained in the region of practical equivalence (ROPE), which was by default defined as ±0.1 standard deviations. We used the package’s default Markov Chain Monte Carlo and prior settings (M = 0, scale of 2.5).
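The HDI-in-ROPE quantity can be illustrated with a minimal sketch that works on a list of posterior draws. It is not the rstanarm workflow the authors used. The function names and the example draws are hypothetical, and the ROPE bounds are supplied by the caller rather than derived from the data.

```python
import math


def hdi(draws, mass=0.89):
    """Shortest interval that contains `mass` of the posterior draws."""
    ordered = sorted(draws)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))
    # Slide a window of fixed count over the sorted draws and keep the narrowest.
    best_start = min(range(n - window + 1),
                     key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[best_start], ordered[best_start + window - 1]


def share_of_hdi_in_rope(draws, rope_low, rope_high, mass=0.89):
    """Fraction of the draws inside the HDI that also fall inside the ROPE."""
    low, high = hdi(draws, mass)
    inside_hdi = [d for d in draws if low <= d <= high]
    inside_rope = [d for d in inside_hdi if rope_low <= d <= rope_high]
    return len(inside_rope) / len(inside_hdi)


# Invented draws: most of the posterior mass sits at zero, with a far tail.
example_draws = [0.0] * 90 + [5.0] * 10
share = share_of_hdi_in_rope(example_draws, -0.1, 0.1)
print(share)  # 1.0
```

A share close to 1 means the credible values of the slope are practically equivalent to zero, which is the pattern expected if perceived current morality did not change across survey years.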

Study 5a

Participants

As in study 2c, we sought to recruit a sample of people who varied widely in terms of age and that was large enough to provide sufficient power to detect the effects we had detected in previous studies. We created a survey with a quota of 50 participants in each of three age groups: 20–34, 35–49 and 50–64 years. Anyone who was either younger than 20 years or older than 64 years was not allowed to participate.

One thousand and twenty-one people responded to an advertisement for a study on Amazon Mechanical Turk. They completed the same test of English language and US American culture as in study 2c. Five hundred and twenty-one respondents answered at least one of the questions incorrectly and were not allowed to participate. The remaining 500 respondents (204 female, 293 male, three other, Mage = 37.74 years, 65% white, 24% Black, 7% Asian, 2% Hispanic, 1% American Indian or Alaska Native, 1% more than one of the above) provided informed consent and became participants in the study in exchange for US$0.75.

Procedure

Study 5a was conducted in 2021. After providing informed consent, participants completed the same procedure as was used in study 2c, with two more questions. Specifically, participants rated how “kind, honest, nice, and good” people in general were 20 years before the participant was born and also 40 years before the participant was born. These years were adjusted on the basis of the age of the participant.

Exclusions

One hundred and seventy-nine participants failed the first attention check, and another 21 failed the second attention check. Another 15 participants reported an age at the end of the study that was inconsistent with the birth year they reported at the beginning. The data from all these participants were excluded from all analyses. The remaining 283 participants (139 female, 143 male, one other, Mage = 38.77 years, 78% white, 11% Black, 8% Asian, 2% Hispanic, 1% more than one of the above) were included in all analyses. These exclusions affect the results in a few cases. Specifically, when excluded participants are included, the overall perception of moral decline and personal change for people in general are not significant. All other effects remain significant.

Analysis

We fit the same model we fit in study 2c except that the factor in the model had five levels (2020, the year the participant turned 20, the year the participant was born, 20 years before the participant was born and 40 years before the participant was born).

Study 5b

Participants

Because this study was a replication and extension of study 2c, we sought to collect a similar sample size to have the power to detect similar effects, and we used the same age quotas as in study 2c. One thousand and eighty-two people responded to an advertisement for a study on Amazon Mechanical Turk. Twenty-one of these opened the study but did not complete it. Five hundred and sixty people responded after the quota for their age group had been reached and were not allowed to participate in the study. Respondents who responded before the quota for their age group was reached completed the same three-item test of US American culture and English language used in study 2c. Twenty-three respondents answered one or more of these questions incorrectly and were not allowed to participate in the study. The remaining 499 respondents (225 female, 241 male, three other, Mage = 43.96 years, 78% white, 10% Asian, 5% Black, 4% Hispanic, 3% more than one of the above) were allowed to participate in the study in exchange for US$0.75.

Procedure

Study 5b was conducted in 2021. After providing informed consent, participants completed the same procedure used in study 2c. They further rated people’s morality 20 and 40 years before the year that they were born.

Exclusions

Forty-four participants failed the attention check and their data were excluded from all analyses. Seven more participants reported an age at the end of the study that was inconsistent with the age group they selected at the beginning of the study and their data were also excluded from all analyses. Sixty-one more participants failed the consistency check and their data were also excluded from all analyses. The data from the remaining 387 participants (206 female, 178 male, three other, Mage = 44.04 years, 79% white, 11% Asian, 4% Black, 3% Hispanic, 2% more than one of the above) were included in all analyses. These exclusions affect the results in one case: when excluded participants are included, participants perceived moral improvement from 40 years before birth to 20 years before birth. All other effects remain the same.

Analysis

We fit the same model we fit in study 2c except that in this case the factor in the model had five levels (the year 2020, the year the participant turned 20 years old, the year the participant was born, 20 years before the participant was born and 40 years before the participant was born).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
