Automated Audiometry

Number: 0870


Scope of Policy

This Clinical Policy Bulletin addresses automated audiometry.

Experimental and Investigational

Aetna considers automated audiometry that is either self-administered or administered by non-audiologists experimental and investigational because its effectiveness has not been adequately validated to be equivalent to audiometry performed by an audiologist.


CPT Codes / HCPCS Codes / ICD-10 Codes

Code Code Description

Information in the [brackets] below has been added for clarification purposes.   Codes requiring a 7th character are represented by "+":

CPT codes not covered for indications listed in the CPB:

0208T Pure tone audiometry (threshold), automated; air only [without an audiologist]
0209T     air and bone [without an audiologist]


A limited number of studies have compared computer-assisted audiometry that is self-administered or administered by non-audiologists to audiometry administered by an audiologist. 

Mahomed et al (2013) conducted a meta-analysis of studies reporting within-subject comparisons of manual and automated threshold audiometry.  The authors found overall average differences between manual and automated air conduction audiometry to be comparable with test-retest differences for manual and automated audiometry.  The authors found, however, limited data on automated audiometry in children and difficult-to-test populations, automated bone conduction audiometry, and data on the performance of automated audiometry in different types and degrees of hearing loss.

The American Speech-Language-Hearing Association (2013) recommends that hearing screening be conducted under the supervision of an audiologist holding the ASHA Certificate of Clinical Competence (CCC).

In a prospective diagnostic study, Foulad et al (2013) determined the feasibility of an Apple iOS-based automated hearing testing application and compared its accuracy with conventional audiometry.  An iOS-based software application was developed to perform automated pure-tone hearing testing on the iPhone, iPod touch, and iPad.  To assess for device variations and compatibility, preliminary work was performed to compare the standardized sound output (dB) of various Apple device and headset combinations.  A total of 42 subjects underwent automated iOS-based hearing testing in a sound booth, automated iOS-based hearing testing in a quiet room, and conventional manual audiometry.  The maximum difference in sound intensity between various Apple device and headset combinations was 4 dB.  On average, 96 % (95 % confidence interval [CI]: 91 % to 100 %) of the threshold values obtained using the automated test in a sound booth were within 10 dB of the corresponding threshold values obtained using conventional audiometry.  When the automated test was performed in a quiet room, 94 % (95 % CI: 87 % to 100 %) of the threshold values were within 10 dB of the threshold values obtained using conventional audiometry.  Under standardized testing conditions, 90 % of the subjects preferred iOS-based audiometry as opposed to conventional audiometry.  The authors concluded that Apple iOS-based devices provided a platform for automated air conduction audiometry without requiring extra equipment and yielded hearing test results that approach those of conventional audiometry.  This was a feasibility study; its findings need to be validated by well-designed studies.
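
The "within 10 dB" agreement measure reported in studies of this kind can be illustrated with a minimal sketch.  The threshold values below are hypothetical, the function names are illustrative, and the Wald (normal-approximation) interval is an assumption; the study does not state how its confidence intervals were derived.

```python
import math

def share_within(auto_db, manual_db, tol_db=10):
    """Share of paired thresholds whose absolute difference is <= tol_db."""
    if len(auto_db) != len(manual_db) or not auto_db:
        raise ValueError("need equal-length, non-empty threshold lists")
    hits = sum(abs(a - m) <= tol_db for a, m in zip(auto_db, manual_db))
    return hits / len(auto_db)

def wald_ci(p, n, z=1.96):
    """Normal-approximation 95 % interval for a proportion, clipped to [0, 1]."""
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# Hypothetical paired thresholds in dB HL (not data from any cited study).
automated = [10, 15, 20, 35, 40, 25, 30, 55, 60, 20]
manual = [10, 10, 25, 30, 55, 25, 20, 50, 60, 15]

share = share_within(automated, manual)
low, high = wald_ci(share, len(automated))
print(f"{share:.0%} within 10 dB (95 % CI: {low:.0%} to {high:.0%})")
```

With so few pairs the interval is wide and is clipped at 100 %, which is one reason small feasibility studies call for validation in larger samples.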

Khoza-Shangase and Kassner (2013) determined the accuracy of UHear™, an audiometer application downloadable onto an iPod Touch©, when compared with conventional audiometry.  Participants were primary school students.  A total of 86 participants (172 ears) were included.  Of these 86 participants, 44 were females and 42 were males, with ages ranging from 8 years to 10 years (mean age of 9.0 years).  Each participant underwent 2 audiological screening evaluations: one by means of conventional audiometry and the other by means of UHear™.  Otoscopy and tympanometry were performed on each participant to determine the status of the outer and middle ear before the participant underwent pure tone air conduction screening by means of a conventional audiometer and UHear™.  The lowest audible hearing thresholds from each participant were obtained at conventional frequencies.  Using the paired t-test, it was determined that there was a statistically significant difference between hearing screening thresholds obtained from conventional audiometry and UHear™.  The screening thresholds obtained from UHear™ were significantly elevated (worse) in comparison to conventional audiometry.  The difference in thresholds may be attributed to differences in transducers used, ambient noise levels and lack of calibration of UHear™.  The authors concluded that the UHear™ is not as accurate as conventional audiometry in determining hearing thresholds during screening of school-aged children.  Moreover, they stated that caution needs to be exercised when using such measures and research evidence needs to be established before they can be endorsed and used with the general public.

In a Cochrane review, Barker et al (2014) stated that acquired adult-onset hearing loss is a common long-term condition for which the most common intervention is hearing aid fitting. However, up to 40 % of people fitted with a hearing aid either fail to use it or may not gain optimal benefit from it. These investigators evaluated the long-term effectiveness of interventions to promote the use of hearing aids in adults with acquired hearing loss fitted with at least 1 hearing aid. The authors concluded that there is some low to very low quality evidence to support the use of self-management support and complex interventions combining self-management support and delivery system design in adult auditory rehabilitation. However, effect sizes were small and the range of interventions that had been tested was relatively limited.

In a 2-phase correlational study, Convery et al (2015) evaluated the reliability and validity of an automatic audiometry algorithm that is fully implemented in a wearable hearing aid, to determine to what extent reliability and validity are affected when the procedure is self-directed by the user, and to investigate contributors to a successful outcome. A total of 60 adults with mild-to-moderately severe hearing loss participated in both studies: 20 in Study 1 and 40 in Study 2; 27 participants in Study 2 attended with a partner. Participants in both phases were selected for inclusion if their thresholds were within the output limitations of the test device. In both phases, participants performed automatic audiometry through a receiver-in-canal, behind-the-ear hearing aid coupled to an open dome. In Study 1, the experimenter directed the task. In Study 2, participants followed a set of written, illustrated instructions to perform automatic audiometry independently of the experimenter, with optional assistance from a lay partner. Standardized measures of hearing aid self-efficacy, locus of control, cognitive function, health literacy, and manual dexterity were administered. Statistical analysis examined the repeatability of automatic audiometry; the match between automatically and manually measured thresholds; and contributors to successful, independent completion of the automatic audiometry procedure. When the procedure was directed by an audiologist, automatic audiometry yielded reliable and valid thresholds. Reliability and validity were negatively affected when the procedure was self-directed by the user, but the results were still clinically acceptable: test-retest correspondence was 10 dB or lower in 97 % of cases, and 91 % of automatic thresholds were within 10 dB of their manual counterparts. However, only 58 % of participants were able to achieve a complete audiogram in both ears. 
Cognitive function significantly influenced accurate and independent performance of the automatic audiometry procedure; accuracy was further affected by locus of control and level of education. Several characteristics of the automatic audiometry algorithm played an additional role in the outcome. The authors concluded that average transducer- and coupling-specific correction factors are sufficient for a self-directed in-situ audiometry procedure to yield clinically reliable and valid hearing thresholds. Before implementation in a self-fitting hearing aid, however, the algorithm and test instructions should be refined in an effort to increase the proportion of users who are able to achieve complete audiometric results. They stated that further evaluation of the procedure, particularly among populations likely to form the primary audience of a self-fitting hearing aid, should be undertaken.

Levit and colleagues (2015) estimated the rate of hearing loss detected by first-stage oto-acoustic emissions test but missed by second-stage automated auditory brainstem response (ABR) testing.  The data of 17,078 infants who were born at Lis Maternity Hospital between January 2013 and June 2014 were reviewed.  Infants who failed screening with a transient evoked oto-acoustic emissions (TEOAE) test and infants admitted to the NICU for more than 5 days underwent screening with an automated ABR test at 45 decibel hearing level (dB HL).  All infants who failed screening with TEOAE were referred to a follow-up evaluation at the hearing clinic.  A total of 24 % of the infants who failed the TEOAE and passed the automated ABR hearing screening tests were eventually diagnosed with hearing loss by diagnostic ABR testing (22/90).  They comprised 52 % of all of the infants in the birth cohort who were diagnosed with permanent or persistent hearing loss greater than 25 dB HL in 1 or both ears (22/42).  Hearing loss greater than 45 dB HL, which is considered to be in the range of moderate-to-profound severity, was diagnosed in 36 % of the infants in this group (8/22), comprising 42 % of the infants with hearing loss of this degree (8/19).  The authors concluded that the sensitivity of the diverse response detection methods of automated ABR devices needs to be further empirically evaluated.
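
The percentages quoted from Levit and colleagues follow directly from the fractions given in parentheses.  The short check below uses only those fractions; the dictionary labels and the helper name are illustrative.

```python
def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)

# Fractions as quoted in the summary above; labels are paraphrased.
reported = {
    "failed TEOAE, passed automated ABR, later diagnosed": (22, 90),
    "share of all infants diagnosed with hearing loss": (22, 42),
    "of those 22, loss in the moderate-to-profound range": (8, 22),
    "share of all infants with loss of that degree": (8, 19),
}

percentages = {label: pct(n, d) for label, (n, d) in reported.items()}
for label, value in percentages.items():
    print(f"{label}: {value} %")
```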

Brennan-Jones and associates (2016) examined the accuracy of automated audiometry in a clinically heterogeneous population of adults using the KUDUwave automated audiometer.  Manual audiometry was performed in a sound-treated room, whereas automated audiometry was not conducted in a sound-treated environment.  A total of 42 participants were consecutively recruited from a tertiary otolaryngology department in Western Australia.  Absolute mean differences ranged from 5.12 to 9.68 dB (air-conduction) and from 8.26 to 15 dB (bone-conduction).  A total of 86.5 % of manual and automated 4FAs were within 10 dB (i.e., ±5 dB); 94.8 % were within 15 dB.  However, there were significant (p < 0.05) differences between automated and manual audiometry at 250, 500, 1,000, and 2,000 Hz (air-conduction) and 500 and 1,000 Hz (bone-conduction).  The effect of age (greater than or equal to 55 years) on accuracy (p = 0.014) was not significant on linear regression (p > 0.05; R² = 0.11).  The presence of a hearing loss (better ear greater than or equal to 26 dB) did not significantly affect accuracy (p = 0.604 for air-conduction; p = 0.218 for bone-conduction).  The authors concluded that the findings of this study provided clinical validation of automated audiometry using the KUDUwave in a clinically heterogeneous population, without the use of a sound-treated environment.  They stated that while threshold variations were statistically significant, future research is needed to ascertain the clinical significance of such variation.

In a pilot study, Brennan-Jones and colleagues (2017) examined the diagnostic accuracy of automated audiometry in adults with hearing loss in an asynchronous tele-health model using pre-defined diagnostic protocols. These researchers recruited 42 study participants from a public audiology and otolaryngology clinic in Perth, Western Australia.  Manual audiometry was performed by an audiologist either before or after automated audiometry.  Diagnostic protocols were applied asynchronously for normal hearing, disabling hearing loss, conductive hearing loss and unilateral hearing loss.  Sensitivity and specificity analyses were conducted using a 2-by-2 matrix and Cohen's kappa was used to measure agreement.  The overall sensitivity for the diagnostic criteria was 0.88 (range of 0.86 to 1) and overall specificity was 0.93 (range of 0.86 to 0.97).  Overall kappa (k) agreement was "substantial" k = 0.80 (95 % CI: 0.70 to 0.89) and significant at p < 0.001.  The authors concluded that pre-defined diagnostic protocols applied asynchronously to automated audiometry provide accurate identification of disabling, conductive and unilateral hearing loss.  They stated that this method has the potential to improve synchronous and asynchronous tele-audiology service delivery.
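
Cohen's kappa, the agreement statistic used in this study, corrects the observed agreement between 2 classifications for the agreement expected by chance.  The following is a minimal sketch for a 2-by-2 table; the counts are hypothetical and are not taken from the study.

```python
def cohens_kappa_2x2(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for 2 binary classifications of the same cases.

    both_pos / both_neg: cases where methods A and B agree.
    a_only / b_only: cases flagged positive by only one method.
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = both_pos + a_only
    b_pos = both_pos + b_only
    # Chance agreement: product of marginal rates, summed over both classes.
    expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical counts: automated (A) versus manual (B) classification.
kappa = cohens_kappa_2x2(both_pos=22, a_only=2, b_only=3, both_neg=73)
print(f"kappa = {kappa:.2f}")
```

The sketch does not guard against the degenerate case in which chance agreement equals 1 (all cases in a single class), where kappa is undefined.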

In a prospective, cross-over, equivalence study, Whitton and associates (2016) compared hearing measurements made at home using self-administered audiometric software against audiological tests performed on the same subjects in a clinical setting.  In experiment 1, adults with varying degrees of hearing loss (n = 19) performed air-conduction audiometry, frequency discrimination, and speech recognition in noise testing twice at home with an automated tablet application and twice in sound-treated clinical booths with an audiologist.  The accuracy and reliability of computer-guided home hearing tests were compared to audiologist administered tests.  In experiment 2, the reliability and accuracy of pure-tone audiometric results were examined in a separate cohort across a variety of clinical settings (n = 21).  Remote, automated audiograms were statistically equivalent to manual, clinic-based testing from 500 to 8,000 Hz (p ≤ 0.02); however, 250 Hz thresholds were elevated when collected at home.  Remote and sound-treated booth testing of frequency discrimination and speech recognition thresholds were equivalent (p ≤ 5 × 10⁻⁵).  In the second experiment, remote testing was equivalent to manual sound-booth testing from 500 to 8,000 Hz (p ≤ 0.02) for a different cohort who received clinic-based testing in a variety of settings.  The authors concluded that these data provided a proof of concept that several self-administered, automated hearing measurements are statistically equivalent to manual measurements made by an audiologist in the clinic.  The demonstration of statistical equivalency for these basic behavioral hearing tests points toward the eventual feasibility of monitoring progressive or fluctuant hearing disorders outside of the clinic to increase the efficiency of clinical information collection.

Masalski and colleagues (2016) noted that hearing tests performed in the home setting by means of mobile devices require previous calibration of the reference sound level.  Mobile devices with bundled headphones create a possibility of applying the pre-defined level for a particular model as an alternative to calibrating each device separately.  These investigators determined the reference sound level for sets composed of a mobile device and bundled headphones.  Reference sound levels for Android-based mobile devices were determined using an open access mobile phone application by means of biological calibration, i.e., in relation to the normal-hearing threshold.  The examinations were conducted in 2 groups: an uncontrolled group and a controlled group.

In the uncontrolled group, the fully automated self-measurements were performed in home conditions by 18- to 35-year-old subjects, without prior hearing problems, recruited online.  Calibration was conducted as a preliminary step in preparation for further examination.  In the controlled group, audiologist-assisted examinations were performed in a sound booth, on normal-hearing subjects verified through pure-tone audiometry, recruited offline from among the workers and patients of the clinic.  In both groups, the reference sound levels were determined on a subject's mobile device using Bekesy audiometry.  The reference sound levels were compared between the groups.  Intra-model and inter-model analyses were performed as well.  In the uncontrolled group, 8,988 calibrations were conducted on 8,620 different devices representing 2,040 models.  In the controlled group, 158 calibrations (test and re-test) were conducted on 79 devices representing 50 models.  Result analysis was performed for the 10 most frequently used models in both groups.  The difference in reference sound levels between uncontrolled and controlled groups was 1.50 dB (SD 4.42).  The mean SD of the reference sound level determined for devices within the same model was 4.03 dB (95 % CI: 3.93 to 4.11).  Statistically significant differences were found across models.  The authors concluded that reference sound levels determined in the uncontrolled group were comparable to the values obtained in the controlled group.  This validated the use of biological calibration in the uncontrolled group for determining the pre-defined reference sound level for new devices.  Moreover, due to a relatively small deviation of the reference sound level for devices of the same model, it was feasible to conduct hearing screening on devices calibrated with the pre-defined reference sound level.  
In addition, these researchers stated that the method presented in this study could be applied in screening hearing examinations on a large scale with the use of popular mobile devices sold with bundled headphones.  Due to the rapidly growing market of mobile devices, the main advantage of the method is the semi-automated calibration of new models.  The pre-defined reference sound level for a new model may be determined on the basis of a biological calibration conducted by the first users of devices.  They stated that to confirm the estimated accuracy of the method, it is advisable to conduct a direct comparison of pure-tone audiometry and a hearing test on mobile devices calibrated biologically by means of the pre-defined reference sound level.

In a prospective study, Saliba and colleagues (2017) compared the accuracy of 2 previously validated mobile-based hearing tests in determining pure tone thresholds and screening for hearing loss, and determined the accuracy of mobile audiometry in noisy environments through noise reduction strategies.  A total of 33 adults with or without hearing loss were tested (mean age of 49.7 years; women, 42.4 %).  Air conduction thresholds measured as pure tone average and at individual frequencies were assessed by conventional audiogram and by 2 audiometric applications (consumer and professional) on a tablet device.  Mobile audiometry was performed in a quiet sound booth and in a noisy sound booth (50 dB of background noise) through active and passive noise reduction strategies.  On average, 91.1 % (95 % CI: 89.1 % to 93.2 %) and 95.8 % (95 % CI: 93.5 % to 97.1 %) of the threshold values obtained in a quiet sound booth with the consumer and professional applications, respectively, were within 10 dB of the corresponding audiogram thresholds, as compared with 86.5 % (95 % CI: 82.6 % to 88.5 %) and 91.3 % (95 % CI: 88.5 % to 92.8 %) in a noisy sound booth through noise cancellation.  When screening for at least moderate hearing loss (pure tone average greater than 40 dB HL), the consumer application showed a sensitivity and specificity of 87.5 % and 95.9 %, respectively, and the professional application, 100 % and 95.9 %.  Overall, patients preferred mobile audiometry over conventional audiograms.  The authors concluded that mobile audiometry could correctly estimate pure tone thresholds and screen for moderate hearing loss; noise reduction strategies in mobile audiometry provided a portable effective solution for hearing assessments outside clinical settings.  This was a small (n = 33) study; its findings need to be validated by well-designed studies.

Furthermore, UpToDate reviews on "Evaluation of hearing loss in adults" (Weber, 2017) and "Hearing impairment in children: Evaluation" (Smith and Gooi, 2017) do not mention automated audiometry as a management tool.

Brennan-Jones and co-workers (2018) stated that remote interpretation of automated audiometry offers the potential to enable asynchronous tele-audiology assessment and diagnosis in areas where synchronous tele-audiometry may not be possible or practical.  These researchers compared remote interpretation of manual and automated audiometry.  A total of 5 audiologists each interpreted manual and automated audiograms obtained from 42 patients.  The main outcome variable was the audiologist's recommendation for patient management (which included treatment recommendations, referral or discharge) between the manual and automated audiometry test.  Cohen's Kappa and Krippendorff's Alpha were used to calculate and quantify the intra- and inter-observer agreement, respectively, and McNemar's test was used to assess the audiologist-rated accuracy of audiograms.  Audiograms were randomized and audiologists were blinded as to whether they were interpreting a manual or automated audiogram.  Intra-observer agreement was substantial for management outcomes when comparing interpretations for manual and automated audiograms.  Inter-observer agreement was moderate between clinicians for determining management decisions when interpreting both manual and automated audiograms.  Audiologists were 2.8 times more likely to question the accuracy of an automated audiogram compared to a manual audiogram.  The authors concluded that there is a lack of agreement between audiologists when interpreting audiograms, whether recorded with automated or manual audiometry.  The main variability in remote audiogram interpretation was likely to be individual clinician variation, rather than automation.

Govender and colleagues (2018) noted that asynchronous automated telehealth-based hearing screening and diagnostic testing can be used within the rural school context to identify and confirm hearing loss.  These investigators evaluated the efficacy of an asynchronous telehealth-based service delivery model using automated technology for screening and diagnostic testing as well as to describe the prevalence, type and degree of hearing loss.  A comparative within-subject design was used.  Frequency distributions, sensitivity, specificity scores as well as the positive and negative predictive values (PPV and NPV) were calculated.  Testing was conducted in a non-sound-treated classroom within a school environment on 73 participants (146 ears).  The sensitivity and specificity rates were 65.2 % and 100 %, respectively.  Diagnostic accuracy was 91.7 % and the NPV and PPV were 93.8 % and 100 %, respectively.  Results revealed that 23 ears of 20 participants (16 %) presented with hearing loss; 12 % of ears presented with unilateral hearing impairment and 4 % with bilateral hearing loss.  Mild hearing loss was identified as most prevalent (8 % of ears); 8 ears obtained false-negative results and presented with mild low- to mid-frequency hearing loss.  The sensitivity rate for the study was low and was attributed to plausible reasons relating to test accuracy, child-related variables and mild low-frequency sensory-neural hearing loss.  The authors concluded that the findings of this study demonstrated that asynchronous telehealth-based automated hearing testing within the school context could be used to facilitate early identification of hearing loss; however, further research and development into protocol formulation, ongoing device monitoring and facilitator training is needed to improve test sensitivity and ensure accuracy of results.
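
The relationship among the screening metrics reported by Govender and colleagues can be sketched as follows.  The 2-by-2 counts are an assumption reconstructed from the figures quoted above (146 ears, 23 ears with hearing loss, 8 false-negative ears, and no false-positives as implied by 100 % specificity); they are not taken from the study's own tables.

```python
def screening_metrics(tp, fn, fp, tn):
    """Standard screening metrics from the cells of a 2-by-2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # ears with loss that were detected
        "specificity": tn / (tn + fp),  # ears without loss that passed
        "ppv": tp / (tp + fp),          # positive results that were correct
        "npv": tn / (tn + fn),          # negative results that were correct
    }

# Assumed counts: 23 ears with loss (15 detected, 8 missed), 123 without.
metrics = screening_metrics(tp=15, fn=8, fp=0, tn=123)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts the sensitivity, specificity and PPV match the reported values, and the NPV comes out within a tenth of a percentage point of the reported 93.8 %.  The example also shows how a screening test can combine a high NPV with a low sensitivity when the condition is uncommon in the sample.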

Shojaeemend and Ayatollahi (2018) reviewed studies related to automated audiometry by focusing on the implementation of an audiometer, the use of transducers and evaluation methods.  This review study was carried out in 2017.  The papers related to the design and implementation of automated audiometry were searched in the following databases: Science Direct, Web of Science, PubMed, and Scopus.  The time frame for the papers was between January 1, 2010 and August 31, 2017.  A total of 143 papers were found, and after screening, the number of papers was reduced to 16.  The findings showed that the implementation methods were categorized into the use of software (7 papers), hardware (3 papers) and smartphones/tablets (6 papers).  The used transducers were a variety of earphones and bone vibrators.  Different evaluation methods were used to evaluate the accuracy and the reliability of the diagnoses.  However, in most studies, no significant difference was found between automated and traditional audiometry.  The authors concluded that automated audiometry produced clinically acceptable results compared with traditional audiometry.  The 2 main advantages of automated audiometry are saving costs and improving accessibility to hearing care, which can lead to a cost-effective and rapid diagnosis of hearing impairment, especially in poor areas.  The use of automated audiometry may have some challenges, such as measuring the impact of environmental noise on the test results, recording bone-conduction hearing thresholds with the possibility of generating occlusion effects by the earphones, and ensuring the quality of the automated audiometry test results.  These researchers stated that further studies need to be conducted to compare the characteristics of different computerized solutions and related challenges for automated audiometry.  
Because transducers differ in performance, evaluation studies are needed to compare them and identify the best one for automated audiometry.

The authors stated that this study had several drawbacks.  Due to the limitation of smartphones in generating different audio frequencies and intensities, these applications could only be used for general screening programs when traditional audiometry tests are not available.  Another limitation was about sound calibration.  Unlike an audiometer, the output sound of smartphones is not calibrated, and it may not meet the requirements of audiometry.  Moreover, the hardware of smartphones and audiometers is different, and the accuracy of the results should be examined.  These researchers stated that more studies are needed to identify the strengths and limitations of computerized solutions for automated audiometry to be able to design more effective solutions in the future.

Pereira and associates (2018) noted that very few studies have examined if tablet-based automated audiometry could offer a valid alternative to traditional manual audiometry for estimation of hearing thresholds in children.  This study examined the validity and efficiency of automated audiometry in school-aged children.  Hearing thresholds for 0.5, 1, 2, 4, 6, and 8 kHz were collected in 32 children aged 6 to 12 years using standard audiometry and tablet-based automated audiometry in a sound-proof booth.  Test administration time, test preference, and medical history were also collected.  Results showed that the majority (67 %) of threshold differences between automated and standard audiometry were within the clinically acceptable range (10 dB).  The threshold difference between the 2 tests showed that automated audiometry thresholds were higher by 12 dB in 6-year olds, 7 dB in 7- to 9-year olds, and 3 dB in 10- to 12-year olds.  In addition, test administration times were similar: standard audiometry took an average of 12.3 mins and automated audiometry took 11.9 mins.  The authors concluded that these results supported the use of tablet-based automated audiometry in children from ages 7 to 12 years.  However, the results suggested that the clinical use of at least some types of tablet-based automated audiometry may not be feasible in children 6 years of age.

Samelli and colleagues (2020) examined the performance of a tablet-based tele-audiometry method for automated hearing screening of schoolchildren through a comparison of the results of various hearing screening approaches.  A total of 244 children were evaluated; tablet-based screening results were compared with gold-standard pure-tone audiometry.  Acoustic immittance measurements were also conducted.  To pass the tablet-based screening, the children were required to respond to at least 2 out of 3 sounds for all the frequencies in each ear.  Several hearing screening methods were analyzed: exclusively tablet-based (with and without 500-Hz checked) and combined tests (series and parallel).  The sensitivity, specificity, PPV, NPV and accuracy were calculated.  A total of 9.43 % of children presented with mild-to-moderate conductive hearing loss (unilateral or bilateral).  Diagnostic values varied among the different hearing screening approaches that were evaluated: sensitivities ranged from 60 to 95 %, specificities ranged from 44 to 91 %, PPVs ranged from 15 to 44 %, NPVs ranged from 95 to 99 %, accuracy values ranged from 49 to 88 %, and area under curve (AUC) values ranged from 0.690 to 0.883.  Regarding diagnostic values, the highest results were found for the tablet-based screening method and for the series approach.  Compared with the results obtained by conventional audiometry and considering the diagnostic values of the different hearing screening approaches, the highest diagnostic values were generally obtained using the automated hearing screening method (including 500-Hz).  The authors concluded that this application, which was developed for the tablet computer, was shown to be a valuable hearing screening tool for use with schoolchildren.  These researchers suggested that this hearing screening protocol has the potential to improve asynchronous tele-audiology service delivery.

Colsman and colleagues (2020) noted that quantifying hearing thresholds via mobile self-assessment audiometric applications has been demonstrated repeatedly with heterogenous results regarding the accuracy.  One important limitation of several of these applications has been the lack of appropriate calibration of their core technical components (sound generator and headphones).  These researchers examined the accuracy and reliability of a calibrated application (app) for pure-tone screening audiometry by self-assessment on a tablet computer: Audimatch app installed on Apple iPad 4 in combination with Sennheiser HDA-280 headphones.  In a repeated-measures design audiometric thresholds collected by the app were compared to those obtained by standardized automated audiometry and additionally test-retest reliability was evaluated.  A total of 68 subjects aged 19 to 65 years with normal hearing were tested in a sound-attenuating booth.  An equivalence test revealed highly similar hearing thresholds for the app compared with standardized automated audiometry.  A test-retest reliability analysis within each method showed a high correlation coefficient for the app (Spearman rank correlation: rho = 0.829) and for the automated audiometer (rho = 0.792).  The results implied that the self-assessment of audiometric thresholds via a calibrated mobile device represented a valid and reliable alternative for stationary assessment of hearing loss thresholds, supporting the potential usability within the area of occupational health care.

The authors stated that this study had several drawbacks.  Test sessions were conducted in a sound-insulated booth; thus, it was not evident whether results could be compared to the measurement of hearing thresholds in a noisy surrounding, such as a standard office.  Therefore, field studies with environmental noise could provide more insight on the accuracy and validity of the audiometric thresholds gathered by the app (e.g., in a waiting room of an otolaryngologist).  More importantly, a validation with audiologically impaired patients would be necessary for the estimation of sensitivity and specificity of the app for clinically relevant hearing loss.  Furthermore, the audiometric application was designed for self-assessment.  However, even though the whole audiometric procedure can be performed by the subject, the system is not intended for use in private homes, outside the range of a trained person, as special headphones are needed and regular calibration of the iPad/headphone combination is a prerequisite.  Apart from the calibration of the system, which has to be performed by a specialized company, this audiometric screening test could be operated by the user.  Some supervision by trained personnel was helpful when starting the app, but the test did not require the guidance of a health care professional.  This also stood in contrast to the operation of the automated audiometer (which was used for comparison), for which the placement of the headphones and the instruction of the subjects had to be carried out by a trained person.  A further drawback of the study concerned the sampling method.  Gender and age range are well-known factors that influence hearing thresholds.  To avoid biases due to an over-proportionate representation of these specific attributes, these investigators used a sampling method that allowed collection of data from a more representative sample than simple random sampling.  
This method accepted the consequence that the recruiting was not completely random.  The proportion of men and women in the sample was balanced, and age was uniformly sampled across the age range of the study.  In addition, 2 authors of the current article were involved in the development of the app-based mobile hearing test that was evaluated in this study.  This fact was disclosed before the start of the study so that a potential influence on the design of the study, data collection or the rationale of data analysis could be contained beforehand.

Charih and associates (2020) stated that recent mobile and automated audiometry technologies have allowed for the democratization of hearing healthcare and enabled non-experts to deliver hearing tests.  The problem remains that a large number of such users are not trained to interpret audiograms.  These investigators outlined the development of a data-driven audiogram classification system designed specifically for the purpose of concisely describing audiograms.  More specifically, they presented how a training data-set was assembled and how the classification system was developed using supervised learning techniques.  These researchers showed that 3 practicing audiologists had high intra- and inter-rater agreement over audiogram classification tasks pertaining to audiogram configuration, symmetry and severity.  The system proposed here achieved a performance comparable to the state of the art, but is significantly more flexible.  The authors concluded that this work laid a solid foundation for future work aiming to apply machine learning techniques to audiology for audiogram interpretation.

The authors stated that this study had several drawbacks.  Due to the logistical complexity and cost of acquiring audiogram annotations, these investigators were only able to assemble a data-set of 270 distinct audiograms annotated by 3 separate audiologists.  While these researchers did ensure that their audiologists were trained in different schools of audiology and practiced audiology with different subpopulations, it was likely that their estimate of inter-rater reliability could be made more accurate by adding additional raters.  In fact, hiring more audiologists and collecting more audiograms would likely further increase confidence that these findings could be generalized.  Specifically, adding more raters is likely to increase inter-rater reliability (but not intra-rater reliability, which is a reflection of the inherent difficulty of the task).  Unfortunately, augmenting their data-set is extremely costly, as the professional services of multiple audiologists are needed.  If large public data-sets, such as the NHANES, were to include diagnostic outcome, then this would enable larger-scale studies in the future.  A second main drawback worth mentioning was that the classification system presented here could not classify audiograms by site of lesion, while AMCLASS can.  Obtaining labels for this descriptor of hearing loss was impossible because the unlabeled NHANES data used in this study did not contain masked or unmasked bone conduction thresholds.  Finally, while a step in the right direction, the NHANES data-set used in this study did not comprise the data needed to extend the algorithm such that it could identify a potential diagnosis or the appropriate professional to whom the patient should be referred.  These researchers stated that future work will aim to collect more data and to examine the integration of additional sources of data such as medical history, patient age, bone conduction thresholds, questionnaire data, otoscopic images, and tympanogram data.  The ultimate objective is to extend the scope of this system, such that it not only describes the audiogram, but also provides a proposed differential diagnosis.  Additionally, the system could eventually provide recommendations with respect to referral and therapeutic options.  Another avenue involves assessing the generalizability of this system, although this will entail labeling additional audiograms against which to validate the Data-Driven Annotation Engine (DDAE).  Finally, when undertaking this project, these investigators sought to examine if machine learning can accomplish the same audiogram classification tasks normally completed by a professional audiologist.  Future studies should examine additional novel applications of machine learning in the field of audiology, beyond automating the state of the art.  However, adoption of such innovations may require a change in the practice of audiology itself and is beyond the scope of the present study.

Bean and colleagues (2022) noted that up to 80 % of audiograms could be automated, which would allow more time for provision of specialty services.  Ideally, automated audiometers would provide accurate results for listeners with impaired hearing as well as normal hearing.  Furthermore, accurate results should be provided not only in controlled environments such as a sound-attenuating room but also in test environments that may support greater application when sound-attenuating rooms are unavailable.  OtoKiosk is an iPhone operating system (iOS)-based system that has been available for clinical use; however, there are not yet any published validation studies using this product.  These researchers completed a validation study on the OtoKiosk automated audiometry system in quiet and in low-level noise, for listeners with normal hearing and for listeners with impaired hearing.  Pure tone air conduction thresholds were obtained for each subject for 3 randomized conditions: standard audiometry, automated testing in quiet, and automated testing in noise.  Noise, when present, was 35 dBA overall and was designed to emulate an empty medical examination room.  Subjects consisted of 11 adults with hearing loss and 15 adults with normal hearing recruited from the local area.  Thresholds were measured at 500, 1,000, 2,000, and 4,000 Hz using the OtoKiosk system that incorporates a modified Hughson-Westlake method.  Results were analyzed using descriptive statistics and also by a linear mixed-effects model to compare thresholds obtained in each condition.  Across conditions and subject groups, 73.6 % of thresholds measured with OtoKiosk were within ± 5 dB of the conventionally measured thresholds; 92.8 % were within ± 10 dB.  On average, differences between tests were small.  Pair-wise comparisons revealed thresholds were approximately 3.5 to 4.0 dB better with conventional audiometry than with the mobile application in quiet and in noise.  Noise did not affect thresholds measured with OtoKiosk.  The authors concluded that the OtoKiosk automated hearing test measured pure tone air conduction thresholds from 500 to 4,000 Hz at slightly higher thresholds than conventional audiometry, although the difference was less than the smallest typical clinical step-size of 5 dB.  These researchers stated that these findings suggested that the OtoKiosk automated audiometry system is a reasonable solution for sound booths and examination rooms with low-level background noise.  These researchers stated that further investigations of additional clinical applications are needed.
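
The agreement figures reported above (the share of automated thresholds within ± 5 dB or ± 10 dB of conventional thresholds, and the average difference between methods) can be illustrated with a minimal sketch.  The threshold values and function names below are hypothetical and are not taken from the study; the sketch only shows how such summary statistics may be computed from paired measurements, and it does not reproduce the linear mixed-effects analysis used by the authors.

```python
from statistics import mean


def percent_within(automated, manual, tolerance_db):
    """Percentage of paired thresholds whose absolute difference is <= tolerance_db."""
    if len(automated) != len(manual) or not automated:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for a, m in zip(automated, manual) if abs(a - m) <= tolerance_db)
    return 100.0 * hits / len(automated)


def mean_difference(automated, manual):
    """Mean signed difference (automated minus manual) in dB.

    A positive value means the automated thresholds were higher (poorer).
    """
    return mean(a - m for a, m in zip(automated, manual))


# Hypothetical paired thresholds in dB HL (illustrative only, not study data).
manual_db = [10, 15, 20, 25, 40, 55, 30, 5]
automated_db = [15, 15, 30, 40, 45, 60, 30, 10]

print(percent_within(automated_db, manual_db, 5))    # share within +/- 5 dB
print(percent_within(automated_db, manual_db, 10))   # share within +/- 10 dB
print(mean_difference(automated_db, manual_db))      # average offset in dB
```

Reporting both the tolerance-based percentages and the signed mean difference is useful because a method can agree closely on average while still showing large discrepancies for individual thresholds.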

The authors stated that despite the evidence from this study that OtoKiosk could be a potentially viable tool for evaluating air conduction thresholds for listeners with normal and impaired hearing, further investigation is needed to fully establish the benefits and limitations of this system.  For example, in the current study, only 4 frequencies were tested.  Future work is needed to establish the accuracy of the automated system at higher (e.g., 8,000 Hz) and lower (e.g., 250 Hz) test frequencies, where variability would be expected to be greater based on the work of others.  These test frequencies were not included in this study given the demonstrated variability, although they could be clinically useful.  Furthermore, more research is needed to determine the extent to which these findings would generalize to age groups beyond 47 to 76 years.  It is likely that accuracy might decline for pediatric or geriatric patients.  The primary objective of this trial was to compare pure tone thresholds obtained using the automated system to pure tone thresholds obtained using manual techniques.  Future investigation to better understand the approximately 3.5 dB difference may include obtaining thresholds using the different methods with the same transducers.  Finally, all testing in this study was accomplished in a sound-attenuating booth, with limited distractions.  It was possible that thresholds acquired by automated or other methods would be susceptible to distractions that might be present in other environments, such as examination rooms with open doors or busy clinic waiting areas.

Home-Based Audiometry

Hazan et al (2022) examined the test-retest reliability of a smartphone-based hearing test, carried out without supervision of a hearing professional in an uncontrolled environment.  The hearing application is based on an automated hearing test (DuoTone) and relies on verification procedures (ambient noise monitoring algorithm, graphical user interface) to ensure appropriate measurement conditions.  Thresholds obtained with DuoTone were compared to those obtained with standard clinical audiometry for 0.5, 1, 2, and 4 kHz in 13 subjects.  Subsequently, test-retest reliability was analyzed using anonymized cloud-stored data from a large group of app users (1,641 subjects) who carried out multiple hearing tests.  Thresholds at minimum or maximum presentation level of the hearing test (10 dB HL, 85 dB HL) were excluded to avoid floor/ceiling effects.  A subset (500 subjects) was created to exclude potentially unreliable data.  Test-retest thresholds were compared at 12 test frequencies, from 125 Hz to 12 kHz.  Thresholds determined by DuoTone and clinical audiometry did not differ significantly for each test frequency.  Regarding test-retest analysis, the percentage of test-retest results within 5 dB ranged from 60 % to 77 % per test frequency.  Results from the subset were not substantially different.  Test-retest reliability for app users was comparable to results published in the literature regarding test-retest reliability of audiometry performed in the clinic.  The authors concluded that the JHC (Jacoti Hearing Center) app, based on the DuoTone procedure with ambient noise monitoring, provided reliable hearing thresholds between 15 dB HL and 80 dB HL with a test-retest variability comparable to audiometry conducted in a clinical setting.  As such, the JHC app might enable remote audiometry as a 1st step in a diagnostic process (e.g., to plan an appropriate diagnostic test protocol) or as a follow-up measure, as well as in hearing loss compensation with hearing devices (e.g., to program over-the-counter hearing aids and smartphone-based hearing aid apps).  These researchers stated that, provided careful data selection is employed to mitigate effects of confounding factors resulting from the uncontrolled nature of the data collection process, cloud-based data analysis offers unique opportunities to analyze large real-world data volumes.

The authors stated that this study had several drawbacks.  Initial comparison of hearing thresholds obtained with either JHC or clinical audiometry (Experiment 1) had 2 limitations: first, it was based on an initial dataset of 13 subjects (22 ears); and second, it was restricted to 4 frequencies in the speech region.  Extensions of this work by including more subjects and more tested frequencies (i.e., from 125 to 12,000 Hz) would allow for a more robust and precise comparison.  Concerning the test-retest analysis, the relatively novel method of cloud-based data collection offers a unique opportunity for analysis of large volumes of real-world data.  However, there was no information regarding other “use conditions” typically managed in a controlled test design, such as the level of user motivation, distraction, understanding of the test procedure, as well as compliance with the instructions.  Furthermore, although the JHC app requires a confirmation that each test was carried out by the same user, there was no definitive confirmation that this was the case.  Similarly, users of JHC were reminded that the application was only supported when used in combination with wired Apple EarPods (for which the application was calibrated).  However, verifying which transducer was used was not technically possible.  Such limitations could be alleviated in future studies by implementing automated user and transducer identification technologies in the JHC app.  Moreover, the uncontrolled nature of the data collection may have introduced confounding effects; these were mitigated in Experiment 2 by careful data selection.  Data selection itself may also have introduced bias, although it was solely performed to select suitable, analyzable data (e.g., data at the same frequency from 2 consecutive tests) and to avoid confounding factors (e.g., floor and ceiling effects).  While exclusion of potentially unreliable data did not have a substantial effect on the results, future studies may deploy more sophisticated data selection and/or re-balancing methods.
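
The data-selection and test-retest steps described for this study can be sketched as follows.  This is a minimal illustration, assuming paired test and retest thresholds measured at the same frequency; the 10 dB HL and 85 dB HL presentation limits come from the study summary above, whereas the function names, the sample values, and the rule of discarding a pair when either threshold sits at a limit are assumptions made for illustration and do not describe the authors' actual analysis pipeline.

```python
MIN_LEVEL_DB_HL = 10  # minimum presentation level stated in the study summary
MAX_LEVEL_DB_HL = 85  # maximum presentation level stated in the study summary


def drop_floor_ceiling(pairs):
    """Keep (test, retest) pairs in which neither threshold sits at a presentation limit.

    Assumption: a pair is discarded if either value equals the minimum or maximum level.
    """
    return [
        (test, retest)
        for test, retest in pairs
        if MIN_LEVEL_DB_HL < test < MAX_LEVEL_DB_HL
        and MIN_LEVEL_DB_HL < retest < MAX_LEVEL_DB_HL
    ]


def percent_within_5_db(pairs):
    """Percentage of (test, retest) pairs that differ by no more than 5 dB."""
    if not pairs:
        raise ValueError("no pairs left to analyze")
    hits = sum(1 for test, retest in pairs if abs(test - retest) <= 5)
    return 100.0 * hits / len(pairs)


# Hypothetical test-retest thresholds in dB HL at one frequency (not study data).
raw_pairs = [(10, 15), (20, 25), (35, 30), (50, 60), (85, 80), (40, 40), (65, 55)]

usable_pairs = drop_floor_ceiling(raw_pairs)
print(len(usable_pairs))                  # pairs remaining after exclusion
print(percent_within_5_db(usable_pairs))  # test-retest agreement within 5 dB
```

Excluding thresholds at the presentation limits matters because a value recorded at the floor or ceiling reflects the limit of the test rather than the listener's true threshold, which would otherwise distort the apparent test-retest agreement.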

Liu et al (2022) stated that automated pure-tone audiometry has been shown to provide similar hearing threshold estimates to conventional audiometry; however, lower correlations with manual test results have been reported at high and low frequencies (HF and LF), while correlations were better in the middle frequencies.  These investigators employed the same equipment and different test procedures for automated testing, and compared the results with manual test results.  A total of 100 subjects aged 18 to 36 years were randomly divided into 2 groups to perform air-conduction pure-tone audiometry (0.25, 0.5, 1, 2, 4, 8 kHz) using the ascending and shortened ascending protocols built into the automated audiometer, respectively.  Recorded testing time, the total number of responses, and subjects' preferences were compared with those of manual tests.  A significant difference was found at 250 Hz regarding the distribution of the absolute difference between the 2 automated and the manual thresholds.  The testing time spent in the ascending method (9.8 ± 1.4 mins, mean ± SD) was significantly longer than in the shortened ascending method (5.8 ± 0.9 mins).  The total numbers of responses of the ascending method (90.5 ± 10.8 times) and shortened ascending method (62.0 ± 11.4 times) were significantly different.  Finally, no significant difference was observed in preferences between automated and manual procedures.  The authors concluded that in normal hearing subjects, there is a high correlation between automated and manual audiometry thresholds; however, the variation was higher at 8,000 Hz.  The test time was shorter using the shortened ascending method than the ascending method, but the accuracy of the 2 automated procedures differed statistically at 250 Hz.  A more delicate threshold-seeking method, the ascending procedure, may address this problem when testing low frequencies.

The authors stated that this study had several drawbacks.  First, the testing sequence of the manual and automated methods was not counter-balanced; the manual testing was carried out first, which could cause an order effect.  Second, all subjects in this study had normal hearing, and the correlation between the results of the shortened/ascending and manual methods was good; however, further research is needed to determine whether the correlation remains acceptable when the automated hearing test is performed in individuals with various degrees of hearing loss.  Third, all the pure tone audiometry tests in this study were carried out in the sound booth; further investigations need to be conducted to compare the hearing thresholds of subjects with different degrees of hearing loss in a non-isolated environment.

In a population-based study, Hoff et al (2023) examined the accuracy of automated audiometry in the elderly, and assessed the influence of test frequency, age, sex, hearing and cognitive status.  This study included 2 age-homogeneous samples of 70-year-old (n = 238) and 85-year-old (n = 114) individuals who were tested with automated audiometry in an office using circum-aural headphones and, around 4 weeks later, with manual audiometry conducted to clinical standards.  The differences were analyzed for individual frequencies (range of 0.25 kHz to 8 kHz) and pure-tone averages.  The mean difference (MD) varied across test frequencies and age groups, the overall figure being -0.7 dB (SD = 8.8, p < 0.001), and 68 % to 94 % of automated thresholds corresponded within ±10 dB of manual thresholds.  The poorest accuracy was found at 8 kHz.  Age, sex, hearing and cognitive status were not associated with the accuracy (ordinal regression analysis).  The authors concluded that automated audiometry appeared to produce accurate assessments of hearing sensitivity in the majority of the elderly, but with larger error margins than in younger populations, and was not affected by relevant patient factors associated with old age.  Moreover, these researchers stated that future studies should examine the test-retest repeatability of automated audiometry in the elderly; and aim at identifying factors that may improve the accuracy of test results.

The authors stated that one drawback of this study was that manual pure-tone audiometry was treated as the gold standard, and that any deviations from it were interpreted as inaccuracies in the automated test method.  In reality, it has not been shown that manual pure-tone audiometry is more accurate, and there is in fact no easy way to know which method is the most accurate.  In addition, the results of this trial may be limited to the specific audiometer and test protocol employed.  Furthermore, these researchers only evaluated air conduction pure-tone thresholds (PTTs).  Bone conduction testing is a vital element of audiological diagnostics, and automated versions have been validated.  However, bone conduction testing is more complex to conduct in a reliable way, due to difficulties in placing the vibrator on the bone and higher susceptibility to errors caused by background noise.  These issues may perhaps be even more difficult when testing the elderly.

Blankenship et al (2023) noted that reliable wireless automated audiometry that includes extended high frequencies (EHF) outside a sound booth would increase access to monitoring programs for individuals at risk for hearing loss, especially those at risk for ototoxicity.  In a cross-sectional, repeated measures study, these researchers compared (i) thresholds obtained with standard manual audiometry to automated thresholds measured with the Wireless Automated Hearing Test System (WAHTS) inside a sound booth; and (ii) automated audiometry in the sound booth to automated audiometry outside the sound booth in an office environment.  A total of 28 typically developing children and adolescents (mean age of 14.6 years; range of 10 to 18) participated.  Audiometric thresholds were measured from 0.25 to 16 kHz with manual audiometry in the sound booth, automated audiometry in the sound booth, and automated audiometry in a typical office environment in counter-balanced order.  Ambient noise levels measured inside the sound booth and the office environment were compared to thresholds at each test frequency.  Automated thresholds were overall about 5 dB better compared to manual thresholds, with greater differences in the EHF range (10 to 16 kHz).  The majority of automated thresholds measured in a quiet office were within ± 10 dB of automated thresholds measured in a sound booth (84 %), while only 56 % of automated thresholds in the sound booth were within ± 10 dB of manual thresholds.  No relationship was found between automated thresholds measured in the office environment and the average or maximum ambient noise level.  The authors concluded that these findings indicated that self-administered, automated audiometry resulted in slightly better thresholds overall than manually administered audiometry in children, consistent with previous studies in adults.  Ambient noise levels in a typical office environment did not have an adverse effect on audiometric thresholds measured using noise attenuation headphones.  Thresholds measured using an automated tablet with noise attenuating headphones could improve access to hearing assessment for children with a variety of risk factors.  Moreover, these researchers stated that additional studies of extended high frequency automated audiometry in a wider age range are needed to establish normative thresholds.


The above policy is based on the following references:

  1. American Speech-Language-Hearing Association (ASHA). Hearing screening and testing. Information for the Public. Rockville, MD: ASHA; 2013.
  2. Barker F, Mackenzie E, Elliott L, et al. Interventions to improve hearing aid use in adult auditory rehabilitation. Cochrane Database Syst Rev. 2014;7:CD010342.
  3. Bean BN, Roberts RA, Picou EM, et al. Automated audiometry in quiet and simulated exam room noise for listeners with normal hearing and impaired hearing. J Am Acad Audiol. 2022;33(1):6-13. 
  4. Blankenship CM, Hickson LM, Quigley T, et al. Extended high-frequency audiometry using the wireless automated hearing test system compared to manual audiometry in children and adolescents. medRxiv. 2023 May 23;2023.05.22.23290339. [Preprint]
  5. Brennan-Jones CG, Eikelboom RH, Bennett RJ, et al. Asynchronous interpretation of manual and automated audiometry: Agreement and reliability. J Telemed Telecare. 2018;24(1):37-43.
  6. Brennan-Jones CG, Eikelboom RH, Swanepoel de W, et al. Clinical validation of automated audiometry with continuous noise-monitoring in a clinically heterogeneous population outside a sound-treated environment. Int J Audiol. 2016;55(9):507-513.
  7. Brennan-Jones CG, Eikelboom RH, Swanepoel W. Diagnosis of hearing loss using automated audiometry in an asynchronous telehealth model: A pilot accuracy study. J Telemed Telecare. 2017;23(2):256-262.
  8. Charih F, Bromwich M, Mark AE, et al. Data-driven audiogram classification for mobile audiometry. Sci Rep. 2020;10(1):3962.
  9. Colsman A, Supp GG, Neumann J, Schneider TR. Evaluation of accuracy and reliability of a mobile screening audiometer in normal hearing adults. Front Psychol. 2020;11:744.
  10. Convery E, Keidser G, Seeto M, et al. Factors affecting reliability and validity of self-directed automatic in situ audiometry: Implications for self-fitting hearing aids. J Am Acad Audiol. 2015;26(1):5-18.
  11. Foulad A, Bui P, Djalilian H. Automated audiometry using apple iOS-based application technology. Otolaryngol Head Neck Surg. 2013;149(5):700-706.
  12. Govender SM, Mars M. Assessing the efficacy of asynchronous telehealth-based hearing screening and diagnostic services using automated audiometry in a rural South African school. S Afr J Commun Disord. 2018;65(1):e1-e9.
  13. Hazan A, Luberadzka J, Rivilla J, et al. Home-based audiometry with a smartphone app: Reliable results? Am J Audiol. 2022;31(3S):914-922.
  14. Ho AT, Hildreth AJ, Lindsey L. Computer-assisted audiometry versus manual audiometry. Otol Neurotol. 2009;30(7):876-883.
  15. Hoff M, Gothberg H, Tengstrand T, et al. Accuracy of automated pure-tone audiometry in population-based samples of older adults. Int J Audiol. 2023 Jun 19 [Online ahead of print].
  16. Khoza-Shangase K, Kassner L. Automated screening audiometry in the digital age: Exploring Uhear and its use in a resource-stricken developing country. Int J Technol Assess Health Care. 2013;29(1):42-47.
  17. Levit Y, Himmelfarb M, Dollberg S. Sensitivity of the automated auditory brainstem response in neonatal hearing screening. Pediatrics. 2015;136(3):e641-e647.
  18. Liu H, Du B, Liu B, et al. Clinical comparison of two automated audiometry procedures. Front Neurosci. 2022;16:1011016.
  19. Mahomed F, Swanepoel DW, Eikelboom RH, Soer M. Validity of automated threshold audiometry: A systematic review and meta-analysis. Ear Hear. 2013;34(6):745-752.
  20. Margolis RH, Glasberg BR, Creeke S, Moore BC. AMTAS: Automated method for testing auditory sensitivity: Validation studies. Int J Audiol. 2010;49(3):185-194.
  21. Masalski M, Kipinski L, Grysinski T, Krecicki T. Hearing tests on mobile devices: Evaluation of the reference sound level by means of biological calibration. J Med Internet Res. 2016;18(5):e130.
  22. Pereira O, Pasko LE, Supinski J, et al. Is there a clinical application for tablet-based automated audiometry in children? Int J Pediatr Otorhinolaryngol. 2018;110:87-92.
  23. Saliba J, Al-Reefi M, Carriere JS, et al. Accuracy of mobile-based audiometry in the evaluation of hearing loss in quiet and noisy environments. Otolaryngol Head Neck Surg. 2017;156(4):706-711.
  24. Samelli AG, Rabelo CM, Sanches SGG, et al. Tablet-based tele-audiometry: Automated hearing screening for schoolchildren. J Telemed Telecare. 2020;26(3):140-149. 
  25. Shojaeemend H, Ayatollahi H. Automated audiometry: A review of the implementation and evaluation methods. Healthc Inform Res. 2018;24(4):263-275.
  26. Smith RJH, Gooi A. Hearing impairment in children: Evaluation. UpToDate [online serial]. Waltham, MA: UpToDate; reviewed July 2017.
  27. Swanepoel de W, Mngemane S, Molemong S, et al. Hearing assessment-reliability, accuracy, and efficiency of automated audiometry. Telemed J E Health. 2010;16(5):557-563.
  28. Weber PC. Evaluation of hearing loss in adults. UpToDate [online serial]. Waltham, MA: UpToDate; reviewed July 2017.
  29. Whitton JP, Hancock KE, Shannon JM, Polley DB. Validation of a self-administered audiometry application: An equivalence study. Laryngoscope. 2016;126(10):2382-2388.
  30. Yu J, Ostevik A, Hodgetts B, Ho A. Automated hearing tests: Applying the otogram to patients who are difficult to test. J Otolaryngol Head Neck Surg. 2011;40(5):376-383.