Abstract
This is the first report of a full genome scan of sexual orientation in men. A
sample of 456 individuals from 146 families with two or more gay brothers was
genotyped with 403 microsatellite markers at 10-cM intervals. Given that previously
reported evidence of maternal loading of transmission of sexual orientation could indicate
epigenetic factors acting on autosomal genes, maximum likelihood estimation (mlod)
scores were calculated separately for maternal, paternal, and combined transmission. The
highest mlod score was 3.45 at a position near D7S798 in 7q36 with approximately
equivalent maternal and paternal contributions. The second highest mlod score of 1.96
was located near D8S505 in 8p12, again with equal maternal and paternal contributions.
A maternal origin effect was found near marker D10S217 in 10q26, with a mlod score of
1.81 for maternal meioses and no paternal contribution. We did not find linkage to Xq28
in the full sample, but given the previously reported evidence of linkage in this region,
we conducted supplemental analyses to clarify these findings. First, we re-analyzed our
previously reported data and found a mlod of 6.47. We then re-analyzed our current data,
after limiting the sample to those families previously reported, and found a mlod of 1.99.
These Xq28 findings are discussed in detail. The results of this first genome screen for
normal variation in the behavioral trait of sexual orientation in males should encourage
efforts to replicate these findings in new samples with denser linkage maps in the
suggested regions.
Brian S. Mustanski and Michael G. DuPree contributed equally to this work.
Introduction
Although most males report primarily heterosexual attractions, a significant minority
(approximately 2%-6%) of males report predominantly homosexual attractions
(Diamond 1993; Laumann et al. 1994; Wellings et al. 1994). Multiple lines of evidence
suggest that biological factors play a role in explaining individual differences in male
sexual orientation (MIM 306995). For example, the third interstitial nucleus of the human
anterior hypothalamus (INAH3), which is significantly smaller in females, is also
reported to be smaller in homosexual males (LeVay 1991). Byne and colleagues (2001)
followed up on this finding by reporting a trend for INAH3 to occupy a smaller volume
in homosexual men than in heterosexual men, with no significant difference in the
number of neurons within the nucleus. Neuropsychological studies have reported
differences in performance with respect to tasks that show sex differences, such as spatial
processing (e.g., Rahman and Wilson 2003), which may indicate differences in relevant
neural correlates (e.g., parietal cortex). The strong link between adult sexual orientation
and childhood gender-related traits expressed at an early age (Bailey and Zucker 1995)
suggests that such biological influences act early in development, possibly prenatally.
Similarly, the correlation between sexual orientation and a variety of prenatally canalized
anthropometric traits suggests that sexual orientation differentiation probably occurs
before birth (for a review, see Mustanski et al. 2002). Despite this evidence, specific
neurodevelopmental pathways have yet to be elucidated.
Family and twin studies have provided evidence for a genetic component to male sexual
orientation. Family studies, using a variety of ascertainment strategies, document an
elevation in the rate of homosexuality among relatives of homosexual probands (for a
review, see Bailey and Pillard 1995). Several family studies report evidence of increased
maternal transmission of male homosexuality (Hamer et al. 1993; Rice et al. 1999a),
whereas others find no increase relative to paternal transmission (Bailey et al. 1999;
McKnight and Malcolm 2000). Twin studies consistently show that male sexual
orientation is moderately heritable (for a review, see Mustanski et al. 2002). For example,
two recent twin studies in population-based samples both report moderate heritability
estimates, with the remaining variance being explained by nonshared environmental
influences (Kendler et al. 2000; Kirk et al. 2000). The results from family and twin
studies demonstrate that sexual orientation is a complex (i.e., does not show simple
Mendelian inheritance) and multifactorial phenotype.
A more limited number of studies have attempted to map specific genes contributing to
variation in sexual orientation. Given the evidence for increased maternal transmission,
initial efforts focused on the X chromosome. One study produced evidence of significant
linkage, based on Lander and Kruglyak (1995) criteria, to markers on Xq28 (Hamer et al.
1993). Another study, from the same laboratory but with a new sample, reported a
significant replication of these findings (Hu et al. 1995). An independent group produced
inconclusive results regarding linkage to Xq28 (discussed in Sanders and Dawood 2003)
but did not publish the findings in a peer-reviewed journal. All three of these studies
excluded families showing evidence for non-maternal transmission. A fourth study from
another independent group found no support for linkage, even when excluding cases with
suggestive father-to-son transmission (Rice et al. 1999b). An analysis of the results across
all four studies produced a statistically suggestive multiple scan probability (MSP) value
of 0.00003 (Sanders and Dawood 2003). Two candidate gene studies have been
conducted, both producing null results: one for the androgen receptor (AR; Macke et al.
1993) and another for aromatase (CYP19A1; Dupree et al. 2004), on Xq12 and 15q21.2,
respectively.
Given the complexity of sexual orientation, numerous genes are likely to be involved,
many of which are expected to be autosomal rather than sex-linked. Indeed, the modest
levels of linkage that have been reported for the X chromosome can account for, at most,
only a fraction of the overall heritability of male sexual orientation as deduced from twin
studies. Therefore, we have undertaken a genomewide linkage scan to aid in the
identification of genes contributing to variation in sexual orientation. As in previous
studies, we diminished the probability of false negatives (i.e., gay men who identify as
heterosexual) by only studying self-identified gay men. Unlike previous studies that have
focused solely on the X-chromosome and thus excluded families showing evidence of
non-maternal transmission, this study did not use transmission pattern as an exclusion
criterion. To consider the possibility that previously reported evidence of maternal loading
of transmission of sexual orientation was attributable to epigenetic factors acting on
autosomal genes, we calculated maximum likelihood estimation (mlod) scores separately
for maternal and paternal transmission, along with the combined statistic. Based on Lander and
Kruglyak's (1995) criteria, we found one region of near significance and two regions
close to the criteria for suggestive linkage.
Materials and methods
Family ascertainment and assessment
The sample consisted of a total of 456 individuals from 146 unrelated families, of which
137 families had two gay brothers and 9 families had three gay brothers. Thirty of the
families included one parent, and 30 of the families included both parents. Additionally,
46 of the families included at least one heterosexual male or female full sibling (up to 6
additional siblings per family). The sample included 40 families previously reported by
Hamer et al. (1993), 33 families previously reported by Hu et al. (1995), and 73
previously unreported families. The 73 previously described families were selected for
the presence of two gay brothers with no indication of non-maternal transmission by the
criteria described previously (Hamer et al. 1993; Hu et al. 1995). For the 73 new families,
the sole inclusion criterion was the presence of at least two self-acknowledged gay male
siblings.
Subjects were recruited through advertisements in local and national homophile
publications as described elsewhere (Hamer et al. 1993; Hu et al. 1995). The participants
were predominantly white (94.5%), college educated (87.4%), and of middle to upper
socioeconomic status. The mean (SD) age for the gay siblings was 36.98 (8.64). The
protocol was approved by the NCI Institutional Review Board, and each participant
signed an informed consent form prior to interview, questionnaire completion, and the
donation of blood for DNA extraction.
Sexual orientation was assessed through a structured interview or a questionnaire that
included a sexual history and the Kinsey scales of sexual attraction, fantasy, behavior,
and self-identification (Kinsey et al. 1948). Each scale ranges from 0 (exclusively
heterosexual) to 6 (exclusively homosexual). The mean (SD) of these four scales for the
gay males in this study was 5.65 (0.46).
Genotyping
DNA was extracted from peripheral blood by a commercial service (Genetic Design,
Greensboro, N.C., USA). A multiplex polymerase chain reaction (PCR) was conducted as
described (Dupree et al. 2004), with 403 microsatellite markers from the ABI PRISM
Linkage Mapping Set Version 2.5 with an average resolution of 10 cM. Following the
manufacturer's guidelines, products were analyzed on an ABI Prism 310 or 3100 and
sized with the GeneScan version 3.1.2 program (PE Biosystems, Foster City, Calif.,
USA), and genotypes were assigned with the Genotyper version 3.6 program (PE
Biosystems). A PCR product from a DNA reference sample (CEPH 1347-02) was used to
monitor sizing conformity (PE Biosystems). Across the 403 markers, genotypes were
ascertained on average for 95% of the 456 individuals. Mendelian incompatibilities
(<0.05% of genotypes) were removed from the data prior to analyses by using the
sib_clean routine from ASPEX version 2.4 (Hinds and Risch 1996). The computer
program CERVUS 2.0 (Marshall et al. 1998) was employed to test for deviation from the
Hardy-Weinberg equilibrium (HW) and to calculate polymorphism information contents
(PICs) at all loci. We found that the markers had a mean (SD) PIC of 0.76 (0.08), and
1.31% of the markers deviated significantly from HW.
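The polymorphism information content reported above can be illustrated with a minimal sketch. It assumes the standard definition of PIC (one minus the sum of squared allele frequencies, minus twice the sum over allele pairs of the product of squared frequencies); whether CERVUS 2.0 computes it in exactly this way is an assumption here, and the function name `pic` and the example frequencies are hypothetical, not taken from the study data.

```python
from typing import Sequence


def pic(freqs: Sequence[float]) -> float:
    """Polymorphism information content for one marker.

    Assumes the standard definition:
        PIC = 1 - sum_i p_i^2 - sum_{i<j} 2 * p_i^2 * p_j^2
    where p_i are the allele frequencies at the locus.
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity: sum of squared allele frequencies.
    homozygosity = sum(p * p for p in freqs)
    # Correction for uninformative matings between identical heterozygotes.
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - homozygosity - cross


# Illustrative (made-up) frequencies: a biallelic marker is far less
# informative than a marker with several equally frequent alleles.
print(pic([0.5, 0.5]))                # 0.375
print(pic([0.25, 0.25, 0.25, 0.25]))  # 0.703125
```

Under this definition, a mean PIC of 0.76 corresponds to markers with many alleles of roughly comparable frequency, as expected for microsatellites.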
Statistical analyses
Nonparametric exclusion mapping of affected sib-pair data (ASP) was performed by
using ASPEX version 2.4 (Hinds and Risch 1996). ASPEX calculates the percentage of
identical by descent (%IBD) sharing and reports the proportion of shared alleles of
paternal, maternal, and combined origin. The results for alleles of combined origin also
include alleles where the parental origin is unknown. We calculated mlod with a linear
model and assuming a multiplicative model. The ASPEX SIB_PHASE algorithm was
applied; this uses allele frequency information to reconstruct and to phase missing
parental information. Sex-specific recombination maps were used for the calculation of
multipoint mlod scores. Marker order and map positions were determined by using an
integrated map (Nievergelt et al. 2004) based on the deCODE genetic map and updated
physical map information.
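The idea behind a parent-specific allele-sharing mlod can be sketched for the simplest case. The code below is not the ASPEX multipoint algorithm; it is a single-point illustration that assumes fully informative meioses, a binomial sharing model, and a sharing estimate constrained to be at least 0.5. The function name `sharing_mlod` and all counts are hypothetical.

```python
import math


def sharing_mlod(shared: int, unshared: int) -> float:
    """Maximum lod score for excess allele sharing in affected sib pairs.

    Simplifying assumptions (not the ASPEX implementation): every meiosis
    is fully informative, and the sharing proportion is estimated by
    maximum likelihood subject to the constraint p >= 0.5, so that
    deficient sharing gives a score of zero.
    """
    n = shared + unshared
    if n == 0:
        return 0.0
    p_hat = max(0.5, shared / n)
    lod = 0.0
    if shared:
        lod += shared * math.log10(p_hat / 0.5)
    if unshared:
        lod += unshared * math.log10((1.0 - p_hat) / 0.5)
    return lod


# Made-up counts for illustration only: alleles shared / not shared,
# tallied separately by parent of origin.
maternal = (60, 40)
paternal = (50, 50)
combined = (maternal[0] + paternal[0], maternal[1] + paternal[1])

print("maternal mlod:", round(sharing_mlod(*maternal), 3))
print("paternal mlod:", round(sharing_mlod(*paternal), 3))
print("combined mlod:", round(sharing_mlod(*combined), 3))
```

In this toy setting, excess sharing confined to maternal meioses produces a maternal score above the combined score, which is the pattern that separate maternal and paternal calculations are intended to reveal.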
Results
Results from the multipoint analyses on chromosomes 1 through 22 are shown in Fig. 1
for paternal, maternal, and combined meioses. Our complete genome scan for male
sexual orientation yielded three interesting peaks with mlod scores greater than 1.8,
located on chromosomes 7, 8, and 10. Table 1 contains additional information concerning
these peaks, including the nearest marker, the location, mlod score, and allele sharing.
Additionally, Table 1 contains the approximate boundary of the linkage peak, by
reporting the approximate cM position at which the mlod score declines below 1.0. For
chromosomes 7 and 8, the peak is a result of approximately equal contributions from
maternal and paternal transmission, whereas a maternal-origin effect was found for the
peak on chromosome 10.
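As a rough guide to how these peaks relate to the Lander and Kruglyak (1995) criteria, the following sketch compares the reported mlod scores with lod thresholds of 2.2 (suggestive) and 3.6 (significant). These threshold values are an assumption of this sketch, recalled as the sib-pair allele-sharing guidelines from that paper, and their applicability to a maternal-only score is also assumed; the function name `classify` is hypothetical.

```python
# Assumed genomewide thresholds for allele-sharing analysis in sib pairs
# (Lander and Kruglyak 1995); treat these values as an assumption.
SUGGESTIVE_LOD = 2.2
SIGNIFICANT_LOD = 3.6


def classify(mlod: float) -> str:
    """Label an mlod score relative to the assumed thresholds."""
    if mlod >= SIGNIFICANT_LOD:
        return "significant"
    if mlod >= SUGGESTIVE_LOD:
        return "suggestive"
    return "below suggestive"


# Peak scores as reported in the text.
peaks = {
    "7q36 (combined)": 3.45,
    "8p12 (combined)": 1.96,
    "10q26 (maternal)": 1.81,
}

for region, score in peaks.items():
    gap = SIGNIFICANT_LOD - score
    print(f"{region}: mlod {score:.2f}, {classify(score)}, "
          f"{gap:.2f} below the significance threshold")
```

Under these assumed thresholds, the 7q36 peak exceeds the suggestive criterion and falls 0.15 short of significance, whereas the peaks at 8p12 and 10q26 fall somewhat below the suggestive criterion.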
Discussion
This study reports results from the first full genome scan for male sexual orientation.
Using 73 previously reported families and 73 new families with two or more gay male
siblings, we found three new regions of genetic interest. Our strongest finding was on
7q36 with a combined mlod score of 3.45 and equal contribution from maternal and
paternal allele transmission. This score falls just short of Lander and Kruglyak's (1995)
criteria for genomewide significance. Several interesting candidate genes map to this
region of chromosome 7. Vasoactive intestinal peptide (VIP) receptor type 2 (VIPR2;
MIM 601970) is a G protein-coupled receptor that activates adenylate cyclase in response
to VIP (Metwali et al. 1996), which functions as a neurotransmitter and as a
neuroendocrine hormone. VIPR2 is essential for the development of the hypothalamic
suprachiasmatic nucleus in mice (Harmar et al. 2002), which makes it an interesting
candidate gene for sexual orientation in view of earlier reports of an enlarged
suprachiasmatic nucleus in homosexual men (Swaab and Hofman 1990). Sonic hedgehog
(SHH; MIM 600725) plays an essential role in patterning the early embryo, including
hemisphere separation (Roessler et al. 1996) and left to right asymmetry (Tsukui et al.
1999). Homosexual men and women show a significant increase in non-righthandedness,
which is related to brain asymmetry (Lalumiere et al. 2000).
Two additional regions approached the criteria for suggestive linkage. The region near
8p12 contains several interesting candidate genes, given the hypothesized relationship
between prenatal hormones and sexual orientation (Mustanski et al. 2002). Gonadotropin-releasing
hormone 1 (GNRH1; MIM 152760) stimulates both the synthesis and release of
luteinizing hormone and follicle-stimulating hormone, which are important regulators of
steroidogenesis in the gonads, and inhibits the release of prolactin (Adelman et al. 1986).
GnRH is synthesized in the arcuate nucleus and other nuclei of the hypothalamus
(Kawakami et al. 1975). Steroidogenic acute regulatory protein (STAR; MIM 600617)
mediates pregnenolone synthesis and is involved in the hypothalamic-pituitary regulation
of adrenal steroid production (Sugawara et al. 1995), which in turn plays an important
role in sexual development. Neuregulin1 (NRG1; MIM 142445) produces a variety of
isoforms that regulate the growth and differentiation of neuronal and glial cells through
interaction with ERBB receptors (Burden and Yarden 1997; Wen et al. 1994).
The 10q26 region is of special interest because it results from excess sharing of maternal
but not paternal alleles. Previous studies have suggested that there is an excess of
homosexual family members related to the proband through the mother, and we have
proposed previously that this might result in part from genomic imprinting (Bocklandt
and Hamer 2003). In support of a connection between 10q26 and imprinting, a germline
differentially methylated region has been identified at this location by
Strichman-Almashanu et al. (2002), who performed a genomewide screen for normally methylated
CpG islands and found 12 regions to be differentially methylated in uniparental tissues of
germline origin, i.e., hydatidiform moles (paternal origin) and complete ovarian
teratomas (maternal origin). Such CpG islands can regulate the expression of imprinted
genes over distances of several hundred kilobases. The region around the 10q26 CpG
islands includes the brain-expressed gene Shadow of Prion Protein (SPRN), several
transcription regulators (ZNF511, VENTX2; MIM 607158), neurotransmitter interacting
proteins (DRD1IP; MIM 604647), and cell signaling pathway proteins (INPP5A; MIM
600106, GPR123).
Four previous linkage studies have been conducted on the X chromosome and together
produce a statistically suggestive MSP in the Xq28 region (Sanders and Dawood 2003).
Because the focus of this study was a full genome scan with the ABI linkage mapping set
on a partially new set of families, we began by reporting results for these markers on the
full sample. This analysis did not produce evidence of linkage in the Xq28 region;
therefore, we conducted supplemental analyses to clarify this result given previous
findings. Our first supplemental analysis combined results from the two previous reports
from our group (Hamer et al. 1993; Hu et al. 1995) in order to determine the magnitude
of the linkage signal in the 73 previously reported families that comprise half
of the current sample. This produced a mlod of 6.47. To determine whether the lack of
linkage evidence in the full sample was attributable to the new markers or the additional
families (who were not selected based on family transmission patterns), we then
conducted analyses on the previously reported families by using the markers from the
ABI linkage mapping set. This produced an mlod score of 1.99. Table 2, which provides
a summary of the single point and multipoint results for this comparison, suggests that
the difference in mlod score between the restricted sample with the old and new
markers is attributable to the non-optimal position and density of the new markers. The
difference in mlod scores between the full sample and the sample restricted to families
without evidence of paternal transmission (with the goal of enriching the sample for
families showing maternal transmission) suggests the possibility of etiologic heterogeneity
for the proposed Xq28 locus.
Several limitations of the current study should be noted. First, we were unable to
calculate empirically derived significance levels for this project because none of the
simulation programs that currently exist allow for the use of sex-specific maps with ASP
data. Simulation programs that incorporate this
information would remove this limitation. Second, our marker set
had an average resolution of 10 cM, which may have led to underestimated mlod scores.
We discuss in detail above the likely negative effects that this had on our X chromosome
results. Optimally, genome scans are followed up with dense markers placed in promising
regions, but because of financial limitations, we were unable to do this. Future studies
will undoubtedly employ more sophisticated and dense marker sets. Third, we analyzed
only 146 independent families, which is a small sample for a complex trait such as sexual
orientation. Approximately half of these families have previously been included in
reports on the X chromosome (Hamer et al. 1993; Hu et al. 1995). Future research should
be conducted on a new and larger sample of participants. Our linkage results should be
interpreted with consideration of the fact that we only included families with two or more self-identified
gay brothers. Our results may not extrapolate to individuals who do not meet
our inclusion criteria, such as individuals who engage in same-sex behavior but do not
identify as gay or individuals who identify as bisexual. The definition of homosexuality is
complicated, and future genetic research would benefit from additional phenotype
development or the identification of endophenotypes for sexual orientation (Mustanski et
al. 2002). The identification of basic processes that underlie sexual orientation could
increase the power of future genetic studies. A related limitation is that we did not
include females in our study because it is not yet clear if female sexual orientation is
determined by the same factors as male sexual orientation (for a discussion, see
Mustanski et al. 2002). Future research with mixed-sex samples should help to answer
this question. Finally, we did not collect data on the number of older brothers, which
shows a robust association with male sexual orientation (Blanchard 2004). Future studies
should collect these data to allow for explorations of gene by environment interactions; this
could increase the ability to identify genetic loci and also help to elucidate the process
linking number of older brothers to sexual orientation.
In summary, we report the first genome scan for loci involved in the complex phenotype
of male sexual orientation. We have also identified several chromosomal regions and
candidate genes for future exploration. The molecular analysis of genes involved in
sexual orientation could greatly advance our understanding of human variation,
evolution, and brain development. In the absence of obvious animal models, genetic
linkage and association studies provide the best opportunity for discovering these loci.
Acknowledgements
We thank all the individuals who participated in the project for
their time and openness and Lynn Goldin and Danielle Dick for comments on the
manuscript. B.S.M. was supported by an NSF Graduate Research Fellowship and an NIH
Summer Research Fellowship. N.J.S. and C.M.N. were supported in part by the NHLBI
Family Blood Pressure Program (FBPP; HL64777-01).