Psychopharmacology and Pregnancy: Treatment Efficacy, Risks, and Guidelines 2014

3. Critical Evaluation of the Literature: Understanding the Complexities of Observational Research

Adrienne Einarson 


The Hospital for Sick Children, Toronto, ON, Canada, M5G 1X8



3.1 Introduction

3.2 Types of Studies

3.2.1 The Case Report

3.2.2 Case Series

3.2.3 Prospective Comparative Cohort Studies

3.2.4 Case–Control Studies

3.2.5 Meta-analysis

3.3 Administrative Databases

3.3.1 National Birth Registries

3.4 Evaluating Studies

3.5 Understanding the Results of the Statistical Analysis

3.5.1 Statistical Tests

3.6 Evaluation of the Abstract

3.7 Questioning the Validity of the Results

3.8 Selective Bias in Reviews of the Literature

3.9 Does Research Design Affect Results?

3.10 Conclusion



Studying the safety of drugs used in pregnancy, especially psychotropics, is a complicated process, and there is currently no ideal model for conducting such studies. Because of the ethical issues surrounding pregnancy, it is highly unlikely that randomized controlled trials (RCTs) will ever be conducted. Consequently, observational studies are used, and all of the models have limitations, such as small sample size, retrospective bias, uncertainty as to whether the women actually took their medication in pregnancy, and other missing data. However, this does not mean that the data collected are not valuable or useful for providing evidence-based information. Psychopharmacology in pregnancy is a highly sensitive topic for healthcare providers, and the media are very interested in and eager to report findings, especially when adverse effects are reported. Consequently, when evaluating these studies it is of great importance to point out the limitations of each study and how they may affect the results. For best evidence, a combination of these different types of observational studies will assist women and their healthcare providers in making informed decisions as to whether or not to take a particular drug during pregnancy.


Keywords: Observational studies, Models, Examination, Knowledge

3.1 Introduction

Since the advent of evidence-based medicine, students in all medical disciplines are expected to learn the skills necessary to allow them to keep up to date with the continuous changes in the field and to practice state-of-the-art medicine. However, this is not as easy as it may appear, as critical evaluation of the literature is no simple task and most clinicians do not have the background in epidemiology and statistics to fully understand the complex nature of what they have read. Consequently, information on treatments, such as an antidepressant medication, may or may not be given appropriately to a patient, perhaps causing harm to the individual, which is definitely not the anticipated outcome (Einarson 2010).

It is possible that clinicians can critically evaluate the literature appropriately if they have enough basic knowledge to be able to know, for example, the difference between statistical significance and clinical relevance of the results. In fact, this is one of the most important pieces of information to understand, as significant (or positive) results are easier to get published, whether they have clinical importance or not (Easterbrook et al. 1991; Gluud 2006).

3.2 Types of Studies

Before critically evaluating the literature, the reader must have comprehensive knowledge regarding the different types of studies that are used in observational research, which is the methodology used in the published literature involving pregnant women. The randomized controlled trial (RCT) is considered the highest form of evidence, but it is not ethical to conduct such trials in pregnant women due to the unknown risk to the developing fetus. The following are different types of studies ranging from the lowest to the highest evidence in observational studies.

3.2.1 The Case Report

A case report is a signal generator, which may identify a potential problem and can prompt a more formal investigation. However, the main limitation of case reports is that they cannot determine causation, unless many other cases describe the same defect with the same exposure. The classic example in pregnant women was the case report regarding thalidomide, in which Dr W McBride reported in the Lancet that he had seen several cases of exposure to thalidomide that had resulted in polydactyly, syndactyly, and failure of development of long bones (abnormally short femora and radii). He then asked the question, “have any of your readers seen similar abnormalities after taking this drug?” (McBride 1961). We now know that many other case reports followed, and the study of teratology began.

In the more than 40 years since, only one other drug, Accutane® (isotretinoin), has caused major abnormalities in a large percentage (50 %) of infants exposed during pregnancy. However, because of the heightened awareness following the thalidomide tragedy, it was recognized as a teratogen much more quickly than thalidomide had been, owing to the many published case reports of infants exposed in utero who exhibited the same pattern of malformations. Consequently, guidelines were put in place very quickly to prevent women from taking this drug during pregnancy (Pochi et al. 1988).

3.2.2 Case Series

A case series comprises more than one case and may include hundreds or, occasionally, thousands of cases, as in some drug company pregnancy registries. The cases can be presented as cases of exposure or cases of outcome. However, the main limitation of these studies is that there is no comparison group with which to examine variables that may affect outcomes. On the other hand, these reports can be useful if the drug is new on the market and there is no information at all in the literature regarding its safety in pregnancy (Yaris et al. 2004).

3.2.3 Prospective Comparative Cohort Studies

Studies of this type are the most commonly used when examining the safety of drug exposures in pregnancy and are considered a relatively high level of evidence, mainly because there is a comparison group. It is usual to match the women on maternal characteristics, such as smoking and alcohol consumption, to produce groups that are as similar as possible in all respects apart from exposure to the drug being studied. For example, in Teratology Information Services studies, pregnancy outcomes of interest are collected and compared with those of one or two groups of other women who are either (1) exposed to different antidepressants (in an attempt to control for depression) or (2) neither depressed nor taking an antidepressant; the latter, called the non-teratogen group, consists of women who called the service regarding an innocuous exposure. The outcomes of interest are then compared between groups, the primary one often being major malformations. These studies are relatively simple and require no comprehensive knowledge of statistics, as the results are usually reported as p values and odds ratios (ORs), with occasional logistic regression. They are conducted by an individual service or in collaboration with other services around the world. The main strength of these studies is the personal interview with each woman, which includes detailed history taking and establishes whether she actually took the drug, at what dose, and in which trimester of pregnancy. The approach is also prospective: a woman is enrolled in early pregnancy, when the outcome is still unknown, which prevents recall bias.

The major limitation of these types of studies is the sample size. For example, a sample of 100 patients is small for statistical purposes, as it has only 80 % power to detect a fourfold increase in major malformations over the baseline rate, given that major malformations occur in 1–3 % of all pregnancies, whether the women took any medications or not.

Approximately 800 cases in each group would be required to detect a twofold increase in the risk of relatively common malformations, and with most study sample sizes around 200–250 cases, thousands more cases would be required to detect rare malformations. Other limitations are that the samples are not randomly selected and that women who call a teratogen information service generally have a higher socioeconomic status (SES) and may not reflect the general population (Chambers et al. 1996; Einarson et al. 2012a; Sivojelezova et al. 2005).
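The sample size figures quoted in this section can be approximated with the standard normal-approximation formula for comparing two proportions. The sketch below is illustrative only: the function name is hypothetical, and the 3 % baseline (the upper end of the 1–3 % range), the two-sided alpha of 0.05, and the 80 % power are assumptions rather than the settings used in any cited study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_baseline, risk_ratio, alpha=0.05, power=0.80):
    """Approximate cases needed per group to detect a given relative increase
    in risk (two-sided test of two independent proportions)."""
    p_exposed = p_baseline * risk_ratio
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p_baseline * (1 - p_baseline) + p_exposed * (1 - p_exposed)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_exposed - p_baseline) ** 2)


# Assumed baseline malformation rate of 3 %
n_twofold = n_per_group(0.03, 2)   # cases per group to detect a doubling
n_fourfold = n_per_group(0.03, 4)  # cases per group to detect a fourfold increase
print(n_twofold, n_fourfold)
```

Under these assumptions a doubling of risk needs about 750 cases per group, close to the figure of approximately 800, whereas a fourfold increase needs far fewer. A lower assumed baseline rate would raise both numbers.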

3.2.4 Case–Control Studies

These are retrospective studies where the outcome is known and the group with a given outcome (e.g., major malformation) is compared to another group who did not have that outcome with respect to the exposure of interest. This methodology is often used in teratology studies because far fewer cases are required to examine rare birth defects, compared to prospective comparative cohorts. The main limitation is the retrospective bias and the inability to match for all confounders. Some important studies have been published regarding use of antidepressants during pregnancy using this model which will be discussed later.

3.2.5 Meta-analysis

This can be a very useful method when studying drug use in pregnancy because, as discussed previously, most observational pregnancy outcome studies have small sample sizes. Meta-analysis is a way of combining results across different studies, enlarging the sample size, so as to make a more definitive statement regarding the safety or risk of the drug. Usually at least two reviewers independently search all available databases for studies published in the literature on the drug of interest. Both case–control and cohort studies can be accepted for analysis, as well as abstracts presented at scientific meetings, as long as the subjects were similar. The inclusion and exclusion process is then carried out by the two reviewers, who independently evaluate the articles for acceptance into the study. If necessary, a third reviewer may act as an adjudicator for any unresolved disputes. The reviewers then extract the data from the included studies into 2 × 2 tables and the data are analyzed. The first meta-analysis carried out using pregnancy outcome data and birth defects concerned Bendectin® (doxylamine/pyridoxine), a drug used for the treatment of nausea and vomiting of pregnancy (NVP). This study was prompted by several lawsuits against the company alleging that the drug was a teratogen. It was estimated that at one time 30 million women had been exposed to this drug in the USA, and 25 epidemiologic studies had been performed regarding its safety during pregnancy. When all of these studies were combined, with more than 17,000 women, there was no evidence of an increased risk for major malformations above the population baseline of 1–3 % (Einarson et al. 1988).
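As an illustration of how data extracted into 2 × 2 tables can be combined, the sketch below pools odds ratios with the Mantel-Haenszel fixed-effect estimator. This is one common pooling method and is not necessarily the one used in the studies cited above. The function name and all counts are hypothetical.

```python
def mantel_haenszel_or(tables):
    """Pooled odds ratio across studies (Mantel-Haenszel estimator).

    Each table is (a, b, c, d):
      a = exposed with outcome,   b = exposed without outcome,
      c = unexposed with outcome, d = unexposed without outcome.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical counts from three small cohort studies (not real data)
studies = [
    (3, 97, 3, 97),
    (5, 195, 4, 196),
    (8, 242, 7, 243),
]
pooled = mantel_haenszel_or(studies)
print(round(pooled, 2))
```

Each small study contributes in proportion to its size, which is how pooling enlarges the effective sample.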

3.3 Administrative Databases

Administrative databases are not typically set up for pharmacoepidemiologic research, as they are primarily developed for the payment of administrative claims. For this reason, important data are often missing, especially for studies of drug use and pregnancy outcomes. However, they often contain important information on large numbers of individuals and so have been increasingly used in research, most frequently to conduct post-marketing surveillance. Many studies that have produced positive results are from prescription databases, with samples compiled from data on prescriptions that have been filled by the patient. The main strength of this method is the very large sample size. However, there are several limitations, the main one being that it is known only that a person filled the prescription, not that the drug was actually taken (Källén et al. 2011). With extremely large samples there is also a potential for spurious positive findings, because statistical significance is partly a function of sample size: the larger the sample, the more likely it is that very small differences will reach statistical significance while having no clinical relevance. In addition, results are usually given as an odds ratio (OR) and the baseline rate is rarely documented. For example, a statistically significant OR from a comparison of two groups is not very meaningful if the risk in both groups is below the baseline rate expected in the population. In one publication, researchers examined the quality of 100 abstracts from published studies that had examined the safety of various drugs in pregnancy. In 94 % of the results, when a significant OR was reported, baseline rates were not documented (Einarson et al. 2006), and without knowing what rates are expected in the population, it is difficult to evaluate whether there really is an increased risk.
For example, a study using a prescription database examined rates of spontaneous abortion (SA) in two groups of women, one taking a nonsteroidal anti-inflammatory drug (NSAID) and one not, and reported an OR of 2, which was taken to mean that women taking an NSAID had twice the risk of having an SA. However, the paper did not record the actual rates of SA, and the rate in the unexposed group was identified only after contacting the author. Considering that rates of SA are believed to be approximately 10–15 %, it was evident that the rates in both groups were considerably below what is expected in the population, so the clinical relevance of this study was definitely questionable (Nielsen et al. 2001). Recently, a study in which women were enrolled prospectively and outcomes were verified with medical records did not find that use of NSAIDs was associated with an increased risk for SA (Edwards et al. 2012).

3.3.1 National Birth Registries

Some European countries operate government-supported registries in which data on mother and child pairs are entered after birth and followed up prospectively. For example, many studies have been published with data from the Swedish Birth Registry (Källén and Otterblad 2006). The Hungarian Case-Control Surveillance System of Congenital Abnormalities is another source in this category, from which many studies of pregnancy outcomes following exposure during pregnancy have been published (Kazy et al. 2007).

When practicing evidence-based medicine, all of these methodologies loosely fit into the category Level 2 “Evidence obtained from well-designed cohort or case–control analytic studies, preferably from more than one center or research group” (Elstein 2004).

3.4 Evaluating Studies

Critical appraisal is not difficult if one knows what needs careful examination in each individual study. A relatively new guideline, the STROBE statement, was developed to strengthen the reporting of observational studies in epidemiology (von Elm et al. 2007); it helps researchers design, conduct, and report observational studies and is currently required by some high-impact journals. In journals in which this is a requirement, a checklist has to be completed and submitted to the journal together with the manuscript. Although this checklist is primarily for authors, it can be an extremely useful tool for the reader, as it lists the items that authors are required to include in the study report. The STROBE checklist covers each section of the study and is available online at:

3.5 Understanding the Results of the Statistical Analysis

Following a close examination of the study design using the STROBE guidelines, it is time to examine how the statistical analysis was conducted and how it was reported. The analysis of studies can vary greatly and the results even more so, most often depending on the sample sizes. The larger the sample size, the easier it is to conduct more statistical tests to find significance, simply because it can be done, which does not necessarily make the results of a particular study more important than those of a smaller one. A large sample size is considered able to produce more robust results; however, the sample can sometimes be so large that even the smallest differences become statistically significant while having no clinical relevance. For example, women should not smoke cigarettes during pregnancy, and many studies have been published reporting significant differences in birth weight among infants of women who smoke. Studies with smaller sample sizes, such as the Motherisk studies with about 200 cases, did not find any differences in birth weight (Einarson et al. 2001; Gallo et al. 2000); however, studies with huge sample sizes of thousands of women did find statistically significant differences, albeit often with less than a 100-g difference in birth weight between the infants of smokers and nonsmokers (Conter et al. 1995). Considering that a full-term infant weighs on average 3,500 g at birth, one wonders whether this has any real clinical relevance. However, in this example, because women are rightly advised to stop smoking in pregnancy, much is made of this clinically insignificant result, in the hope that a woman who believes she is harming her baby will have more incentive to quit.
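The dependence of statistical significance on sample size can be shown with a simple two-sample z test. In the sketch below the same 100-g difference in mean birth weight is tested in a small and a large study. The common standard deviation of 500 g and the group sizes are assumptions chosen for illustration, and the function name is hypothetical.

```python
from math import erfc, sqrt


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p value for a difference in means between two equal-sized
    groups with a common standard deviation (normal approximation)."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = abs(mean_diff) / standard_error
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))


# Same 100-g difference, assumed SD of 500 g
p_small = two_sided_p(100, 500, 100)    # about 200 women in total
p_large = two_sided_p(100, 500, 5000)   # 10,000 women in total
print(p_small, p_large)
```

With 100 women per group the difference is not statistically significant (p is about 0.16), whereas with 5,000 per group the identical difference is highly significant, although its clinical meaning has not changed.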

3.5.1 Statistical Tests

An odds ratio (OR) is a measure of association between an exposure and an outcome. The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure. Odds ratios are most commonly used in case–control studies; however, they can also be used in cohort study designs. The 95 % confidence interval (CI) is used to estimate the precision of the OR. A wide CI indicates a low level of precision of the OR, whereas a narrow CI indicates a higher precision. It is important to note, however, that unlike the p value, the 95 % CI does not directly report a measure’s statistical significance, but it does provide comparable information. In practice, the 95 % CI is often used as a proxy for statistical significance: a result is taken to be significant if the interval does not contain the null value (e.g., OR = 1).
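A minimal sketch of how an OR and its 95 % CI can be computed from a 2 × 2 table is given below, using the standard log (Woolf) method. The function name and the counts are hypothetical; the two tables differ only in size so that the effect of sample size on precision is visible.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95 % CI (log method).

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: same proportions, 100 versus 10,000 women per group
or_small, lo_small, hi_small = odds_ratio_ci(6, 94, 3, 97)
or_large, lo_large, hi_large = odds_ratio_ci(600, 9400, 300, 9700)
print(or_small, lo_small, hi_small)
print(or_large, lo_large, hi_large)
```

Both tables give an OR of about 2.06. In the smaller table the interval runs from roughly 0.5 to 8.5 and includes 1, so the result is not significant; in the larger table it runs from roughly 1.8 to 2.4 and excludes 1.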

Relative risk (RR) is the risk of an adverse event (or of developing a disease) in an exposed group relative to a nonexposed group. It is used in cohort studies; however, the study must be large enough to accurately measure the outcome in the control group (i.e., the baseline risk). A RR of 2 means that the probability of the outcome is twice as high in the exposed as in the nonexposed group. An OR of 2 means that the odds, rather than the probability, are doubled; the two measures are close only when the outcome is rare.
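The relationship between RR and OR can be checked numerically. The sketch below uses two hypothetical 2 × 2 tables, one with a rare outcome and one with a common outcome, both constructed to have a RR of 2; the function names and counts are illustrative assumptions.

```python
def relative_risk(a, b, c, d):
    """a, b = exposed with / without the outcome;
    c, d = unexposed with / without the outcome."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


rare = (6, 94, 3, 97)      # hypothetical: 6 % vs 3 % outcome rate
common = (60, 40, 30, 70)  # hypothetical: 60 % vs 30 % outcome rate

rr_rare, or_rare = relative_risk(*rare), odds_ratio(*rare)
rr_common, or_common = relative_risk(*common), odds_ratio(*common)
print(rr_rare, or_rare, rr_common, or_common)
```

The RR is 2 in both tables, while the OR is about 2.06 when the outcome is rare and 3.5 when it is common. For outcomes as infrequent as major malformations the two measures are therefore close, but for common outcomes the OR overstates the RR.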

Logistic regression (LR) is used when one wants to produce odds ratios while controlling for the possible influence of other factors. For example, one could examine whether smoking status during pregnancy was related to the development of depression while controlling for maternal age, body mass index, and previous episodes of depression. The outcome variable must have two categories. Observational cohort studies, such as the Motherisk studies, rarely use LR because the women are already matched on these variables, which is an alternative way of controlling for them (Einarson et al. 2012a; Sivojelezova et al. 2005).

Understanding these basic statistical tests is a good foundation to be able to evaluate most studies examining safety/risk of drugs in pregnancy.

3.6 Evaluation of the Abstract

It is likely, for a variety of reasons, that many clinicians read only the abstract of a paper in a scientific journal. Therefore, it is very important that abstracts contain as much information about the study as possible, especially the results and conclusions. Most journals have reduced the maximum number of words in their abstracts from 300–350 to 200–250, and some do not include an introduction, simply an objective, the study design, results, and conclusions. In a previously discussed study (Einarson et al. 2006), details frequently absent were baseline risk (94 %), drug dose (91 %), nonsignificant p values (72 %), significant p values (57 %), confounders (69 %), and risk difference (48 %). Two examples of abstracts that were misunderstood illustrate why one should not read only the abstract. The first study (case–control) examined whether taking an antidepressant in pregnancy was associated with an increased risk of persistent pulmonary hypertension of the newborn (PPHN). Fourteen infants with PPHN had been exposed to an SSRI, as compared with 6 control infants, giving an OR of 6.1 (95 % confidence interval, 2.2–16.8), which appears to be a very large risk. However, in the conclusion of the main text, these results were clearly put into perspective: “on the assumption that the relative risk of 6.1 is true and that the relationship is causal, the absolute risk for PPHN in their infants among women who use SSRIs in late pregnancy is relatively low (about 6–12 per 1,000), put in other terms, about 99 % of these women will deliver a baby unaffected by PPHN” (Chambers et al. 2006). This study caused a great deal of angst among both pregnant women and their healthcare providers, especially because it was published in the New England Journal of Medicine.
The second study (an observational cohort) set out to determine the association of maternal psychotropic medication use during pregnancy with preterm delivery and other adverse perinatal outcomes, using a cohort of 2,793 pregnant women. In the abstract, the authors reported that maternal use of benzodiazepines during pregnancy was associated with an increased risk of preterm delivery (adjusted odds ratio, 6.79; 95 % confidence interval, 4.01–11.5) and with increased risks of low birth weight, low Apgar score, neonatal intensive care unit admission, and respiratory distress syndrome. The authors’ conclusion was that benzodiazepine use in pregnancy was highly associated with preterm delivery and other adverse perinatal outcomes. However, on reading the complete paper, it was clear that their conclusions simply did not match the results, for several reasons. The reporting in the abstract suggested that the entire cohort were psychotropic medication users; in the text, psychotropic medication users made up 10.7 % (300/2,793) of the cohort. The authors reported that benzodiazepines are highly associated with an increased risk of preterm delivery, despite the fact that the sample size was only N = 85. In addition, hydroxyzine, an antihistamine, was listed as a psychotropic drug (n = 107), which reduces the sample further, so the real sample of psychotropic drug exposures was only N = 193, or 6.9 % of the 2,793 women. Consequently, this sample was too small to support the definitive conclusion that the authors had stated (Calderon-Margalit et al. 2009). That paper was also published in a highly respected obstetrics and gynecology journal, which is a reminder that readers should not allow the high impact of a journal to affect their critical examination of the manuscript.
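The step from a large OR to a small absolute risk in the PPHN example can be reproduced with a short calculation. The baseline risk of 1–2 per 1,000 births used below is an assumption inferred from the quoted figures (6–12 per 1,000 divided by 6.1); it is not stated in the text above, and the function name is hypothetical.

```python
def risk_in_exposed(odds_ratio, baseline_risk):
    """Convert an odds ratio and a baseline (unexposed) risk into the
    absolute risk in the exposed group."""
    exposed_odds = odds_ratio * baseline_risk / (1 - baseline_risk)
    return exposed_odds / (1 + exposed_odds)


# Assumed baseline PPHN risk of 1 and 2 per 1,000 births
risk_low = risk_in_exposed(6.1, 0.001)
risk_high = risk_in_exposed(6.1, 0.002)
print(round(risk_low * 1000, 1), round(risk_high * 1000, 1))  # per 1,000 births
print(round((1 - risk_high) * 100, 1))  # % of exposed infants unaffected
```

Under these assumptions the absolute risk among exposed infants is about 6 to 12 per 1,000, so roughly 99 % would be unaffected, which is the perspective that the abstract alone does not convey.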

3.7 Questioning the Validity of the Results

When evaluating the literature, one should always examine whether the results appear to be valid and support the authors’ conclusions. In one example, the authors reported an increased risk for major malformations (RR 1.84) in women taking antidepressants. The validity of these results was questionable for the following reasons: (1) there was no pattern of specific defects, (2) there was no separation of major versus minor malformations, (3) as this was a prescription events monitoring study, it was not known whether the medications were actually taken, and (4) psychiatrically ill patients frequently use other psychotropic medications, alcohol, and illicit drugs, and these potential confounders were not addressed (Wogelius et al. 2006). In another study, the authors conducted a large number of tests but made no adjustment for multiple testing and did not acknowledge that their results could all be due to random error. They also attempted to identify depressed, untreated pregnant women, but provided no solid evidence that they had actually succeeded in doing so. They also found two very trivial differences in birth weight (a 30-g difference between groups) and stated that they had found an increased risk for low birth weight (Oberlander et al. 2006). In another study (meta-analysis), the authors pooled the results of 12 studies that examined poor neonatal adaptation syndrome (PNAS). There was a significant association between exposure to antidepressants during pregnancy and the overall occurrence of PNAS (OR = 5.07; 95 % CI, 3.25–7.90; p < 0.0001). It is well known that infants can suffer from PNAS (up to 30 %, a figure derived from different published reports, as stated by the authors). However, neither the abstract nor the main text gave a baseline frequency of occurrence, so the reader still has no idea how often this syndrome actually occurs (Grigoriadis et al. 2013).
Finally, in another study (meta-analysis), the authors concluded that the summary estimate indicated an increased prevalence of combined cardiac defects with first trimester paroxetine use. However, the authors opted to exclude, for trivial reasons, the Motherisk study (n = 1,174 cases), which reported no increased risk for cardiovascular defects and would likely have lowered the OR (Einarson and Koren 2010; Wurst et al. 2010).
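The concern about unadjusted multiple testing raised in this section can be quantified. The sketch below computes the chance of at least one false-positive result across a set of tests, together with the Bonferroni-adjusted threshold. It assumes the tests are independent, and the figure of 20 tests is a hypothetical choice, not the number performed in any cited study.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false-positive result among n_tests
    independent tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the overall error near alpha."""
    return alpha / n_tests


fwer_20 = familywise_error(20)          # hypothetical: 20 unadjusted tests
threshold_20 = bonferroni_threshold(20)
print(round(fwer_20, 2), threshold_20)
```

With 20 unadjusted tests there is about a 64 % chance of at least one spuriously significant finding, which is why a handful of significant results among many tests should be interpreted with caution.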

3.8 Selective Bias in Reviews of the Literature

Bias is inherent in all research, whether or not the researchers are aware of it, and good research is conducted so as to control for any obvious bias in the data. Recently, a review was published regarding the use of antidepressants in pregnancy, and the authors’ conclusion was “Antidepressant use during pregnancy is associated with increased risks of miscarriage, birth defects, preterm birth, newborn behavioral syndrome, persistent pulmonary hypertension of the newborn and possible longer term neurobehavioral effects. There is no evidence of improved pregnancy outcomes with antidepressant use.” Unfortunately for readers, the authors selected only papers that reported adverse outcomes following exposure to antidepressants and did not include any that found no association with adverse outcomes. This was not a systematic review, no ORs were stated in the abstract, and it is a most blatant example of selection bias (Domar et al. 2013). In this case, the authors had no basis for their conclusions because they did not examine all of the literature, and their conclusions should be disregarded. This study, too, was published in a respected journal and received a great deal of media attention, which caused a great deal of anxiety for pregnant women taking antidepressants and for their healthcare providers.

In contrast, and with no apparent bias, a national pregnancy registry recently published a study in which 10,511 infants of women who had used SSRI drugs but no other central nervous system (CNS)-active drug, 1,000 infants of women who had used benzodiazepines and no other CNS-active drug, and 406 infants whose mothers had used both SSRIs and benzodiazepines but no other CNS-active drug were compared with a nonexposed group. None of the three groups showed a higher risk for any relatively severe malformation or any cardiac defect when comparison was made with the general population risk (adjusted RR for the combination of SSRI and benzodiazepines and a relatively severe malformation = 1.17; 95 % CI, 0.70–1.73). Similar results were obtained for the combination of SSRIs with other sedative/hypnotic drugs. The authors concluded: “The previously stated increased risk associated with the combined use of these drug categories, notably for a cardiac defect, could not be replicated” (Reis and Källén 2013). Therefore, all available studies should be examined before drawing a conclusion.

3.9 Does Research Design Affect Results?

To date, more than 30,000 pregnancy outcomes of infants exposed to antidepressants during pregnancy have been published, in studies whose researchers used different methodologies and reported different conclusions, although odds ratios were rarely >2.

A study (meta-analysis) was conducted to examine the possibility that the seemingly conflicting results may differ because of the variety of models that were used.

Designs that were compared included prospective cohort, retrospective cohort, and case–control studies. Rates of major malformations and cardiac malformations were combined by study type using random effects meta-analytic models. Overall ORs ranged from 1.03 to 1.24 for major malformations and from 0.81 to 1.32 for cardiac malformations. The authors found that diverse observational models with differing strengths and weaknesses produced remarkably similar nonsignificant results. Perceived conflicting results may be due to the subsequent dissemination of results, with attention given to small statistical differences of negligible clinical relevance. The authors concluded that a combination of methods is appropriate to produce evidence-based information on the safety of drugs during pregnancy, provided the study design is rigorous and the limitations are stated clearly (Einarson et al. 2012b).
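For readers unfamiliar with random effects pooling, the sketch below implements the widely used DerSimonian-Laird estimator on the log odds ratio scale. It is offered as an illustration of the general technique, not as the exact procedure of the cited meta-analysis. The function name, the odds ratios, and the variances are all hypothetical.

```python
from math import exp, log, sqrt


def dersimonian_laird(log_ors, variances):
    """Random effects pooled log odds ratio (DerSimonian-Laird).

    Returns (pooled log OR, its standard error, between-study variance tau^2).
    """
    weights = [1 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_ors)) / total_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    k = len(log_ors)
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, log_ors)) / sum(re_weights)
    return pooled, sqrt(1 / sum(re_weights)), tau2


# Hypothetical study-level odds ratios and variances of their logs
ors = [1.05, 1.20, 0.95]
variances = [0.02, 0.03, 0.05]
pooled_log, se, tau2 = dersimonian_laird([log(x) for x in ors], variances)
pooled_or = exp(pooled_log)
ci = (exp(pooled_log - 1.96 * se), exp(pooled_log + 1.96 * se))
print(pooled_or, ci, tau2)
```

When the studies agree closely, the between-study variance is estimated as zero and the result coincides with a fixed-effect analysis; greater disagreement widens the confidence interval.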

3.10 Conclusion

Understanding simple statistics and evaluating published studies are critical, as all study designs have limitations and authors do not always fully disclose details. It should not be assumed that high-impact journals, renowned authors, and prestigious institutions automatically publish high-quality research. Application of results requires careful interpretation; most importantly, when confronted with marginally increased ORs, one should examine whether the results have any real clinical significance.

This is extremely important in the knowledge transfer and translation (KT) process. This is observational research, and consequently, there are some deficiencies in study design and analysis among all of the studies. However, this does not mean that the information provided by the results of these studies is not valuable, as long as the methodology and analyses are critically evaluated. It is unlikely that pregnant women will be included in randomized controlled trials in the near future, which reinforces the need to improve the rigor of the available study methods.

The apparently conflicting evidence from the antidepressant studies is not conflicting after all; the apparent conflict is likely due to the way selected results were interpreted and disseminated. Finally, and of great importance, improved knowledge transfer and translation will ensure that pregnant women with mental health disorders and their healthcare providers receive the most accurate evidence-based information for decision making regarding the use of psychotropic drugs during pregnancy.


Calderon-Margalit R, Qiu C, Ornoy A, Siscovick DS, Williams MA. Risk of preterm delivery and other adverse perinatal outcomes in relation to maternal use of psychotropic medications during pregnancy. Am J Obstet Gynecol. 2009;201(6):579.PubMedCentralPubMed

Chambers CD, Johnson KA, Dick LM, Felix RJ, Jones KL. Birth outcomes in pregnant women taking fluoxetine. N Engl J Med. 1996;335(14):1010–5.PubMedCrossRef

Chambers CD, Hernandez-Diaz S, Van Marter LJ, Werler MM, Louik C, Jones KL, et al. Selective serotonin-reuptake inhibitors and risk of persistent pulmonary hypertension of the newborn. N Engl J Med. 2006;354(6):579–87.PubMedCrossRef

Conter V, Cortinovis I, Rogari P, Riva L. Weight growth in infants born to mothers who smoked during pregnancy. BMJ. 1995;310(6982):768.PubMedCentralPubMedCrossRef

Domar AD, Moragianni VA, Ryley DA, Urato AC. The risks of selective serotonin reuptake inhibitor use in infertile women: a review of the impact on fertility, pregnancy, neonatal health and beyond. Hum Reprod. 2013;28(1):160–71.PubMedCrossRef

Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet. 1991;337(8746):867–72.PubMedCrossRef

Edwards DRV, Aldridge T, Baird DD, Funk MJ, Savitz DA, Hartmann KE. Periconceptional over-the-counter nonsteroidal anti-inflammatory drug exposure and risk for spontaneous abortion. Obstet Gynecol. 2012;120(1):113–22.PubMedCrossRef

Einarson A. Antidepressants and pregnancy: complexities of producing evidence-based information. CMAJ. 2010;182(10):1017–8.PubMedCentralPubMedCrossRef

Einarson A, Koren G. First trimester exposure to paroxetine and prevalence of cardiac defects: meta‐analysis of the literature: unfortunately incomplete. Birth Defects Res A Clin Mol Teratol. 2010;88(7):588.PubMedCrossRef

Einarson TR, Leeder JS, Koren G. A method for meta-analysis of epidemiological studies. Drug Intell Clin Pharm. 1988;22(10):813–24.PubMed

Einarson A, Fatoye B, Sarkar M, Lavigne SV, Brochu J, Chambers C, et al. Pregnancy outcome following gestational exposure to venlafaxine: a multicenter prospective controlled study. Am J Psychiatry. 2001;158(10):1728–30.PubMedCrossRef

Einarson TR, Lee C, Smith R, Manley J, Perstin J, Loniewska M, et al. Quality and content of abstracts in papers reporting about drug exposures during pregnancy. Birth Defects Res A Clin Mol Teratol. 2006;76(8):621–8.PubMedCrossRef

Einarson A, Smart K, Vial T, Diav-Citrin O, Yates L, Stephens S, et al. Rates of major malformations in infants following exposure to duloxetine during pregnancy: a preliminary report. J Clin Psychiatry. 2012a;73(11):1471.PubMedCrossRef

Einarson TR, Kennedy D, Einarson A. Do findings differ across research design? The case of antidepressant use in pregnancy and malformations. J Popul Ther Clin Pharmacol. 2012b;19(2):e334–48.PubMed

Elstein AS. On the origins and development of evidence-based medicine and medical decision making. Inflamm Res. 2004;53(S2):S184–9.PubMedCrossRef

Gallo M, Sarkar M, Au W, Pietrzak K, Comas B, Smith M, et al. Pregnancy outcome following gestational exposure to echinacea: a prospective controlled study. Arch Intern Med. 2000;160(20):3141.PubMedCrossRef

Gluud LL. Bias in clinical intervention research. Am J Epidemiol. 2006;163(6):493–501.PubMedCrossRef

Grigoriadis S, Vonderporten EH, Mamisashvili L, Eady A, Tomlinson G, Dennis CL, et al. The effect of prenatal antidepressant exposure on neonatal adaptation: a systematic review and meta-analysis. J Clin Psychiatry. 2013;74(4):e309–20.PubMedCrossRef

Källén BA, Otterblad OP. Use of oral decongestants during pregnancy and delivery outcome. Am J Obstet Gynecol. 2006;194(2):480–5.PubMedCrossRef

Källén B, Nilsson E, Olausson PO. Antidepressant use during pregnancy: comparison of data obtained from a prescription register and from antenatal care records. Eur J Clin Pharmacol. 2011;67(8):839–45.PubMedCrossRef

Kazy Z, Puhó EH, Czeizel AE. Effect of doxycycline treatment during pregnancy for birth outcomes. Reprod Toxicol. 2007;24(3):279–80.PubMedCrossRef

McBride WG. Letters to the editor: thalidomide and congenital abnormalities. Lancet. 1961;278(7216):1358.CrossRef

Nielsen GL, Sørensen HT, Larsen H, Pedersen L. Risk of adverse birth outcome and miscarriage in pregnant users of non-steroidal anti-inflammatory drugs: population based observational study and case-control study. BMJ. 2001;322(7281):266–70.PubMedCentralPubMedCrossRef

Oberlander TF, Warburton W, Misri S, Aghajanian J, Hertzman C. Neonatal outcomes after prenatal exposure to selective serotonin reuptake inhibitor antidepressants and maternal depression using population-based linked health data. Arch Gen Psychiatry. 2006;63(8):898–906.PubMedCrossRef

Pochi PE, Ceilley RI, Coskey RJ, Drake LA, Jansen GT, Rodman OG, et al. Guidelines for prescribing isotretinoin (Accutane) in the treatment of female acne patients of childbearing potential. Acne Subgroup, Task Force on Standards of Care. J Am Acad Dermatol. 1988;19(5):920.PubMedCrossRef

Reis M, Källén B. Combined use of selective serotonin reuptake inhibitors and sedatives/hypnotics during pregnancy: risk of relatively severe congenital malformations or cardiac defects. A register study. BMJ Open. 2013;3(2):e002166.PubMedCentralPubMedCrossRef

Sivojelezova A, Shuhaiber S, Sarkissian L, Einarson A, Koren G. Citalopram use in pregnancy: prospective comparative evaluation of pregnancy and fetal outcome. Am J Obstet Gynecol. 2005;193(6):2004–9.PubMedCrossRef

von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP, et al. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370(9596):1453–7.CrossRef

Wogelius P, Nørgaard M, Gislum M, Pedersen L, Munk E, Mortensen PB, et al. Maternal use of selective serotonin reuptake inhibitors and risk of congenital malformations. Epidemiology. 2006;17(6):701–4.PubMedCrossRef

Wurst KE, Poole C, Ephross SA, Olshan AF. First trimester paroxetine use and the prevalence of congenital, specifically cardiac, defects: a meta‐analysis of epidemiological studies. Birth Defects Res A Clin Mol Teratol. 2010;88(3):159–70.PubMedCrossRef

Yaris F, Kadioglu M, Kesim M, Ulku C, Yaris E, Kalyoncu NI, et al. Newer antidepressants in pregnancy: prospective outcome of a case series. Reprod Toxicol. 2004;19(2):235–8.PubMedCrossRef
