Sleep Trackers: What the Largest Validation Studies Reveal About Accuracy, Orthosomnia Risk, and When They Actually Help

Consumer sleep trackers are used by nearly 40% of adults, but how accurate are they, and can they harm your sleep? This evidence-based guide examines the largest multicenter PSG validation study, the documented psychological risk of orthosomnia, data privacy concerns, and practical guidelines for using trackers for pattern awareness without falling into perfectionism.

Level: intermediate

Sources: Lee et al. 2023, Baron et al. 2017, Frontiers in Psychology 2026, Resmed 2026 Global Sleep Survey, AASM

Author: Editorial Team

Updated: Jun 18, 2026

Figure: The same technology that disrupts sleep can also support it, but the line between help and harm depends on accuracy, expectations, and data literacy.

The Sleep Tracker Boom: 39% of Adults Now Monitor Their Sleep

Sleep tracking has moved from a niche hobby for early adopters to a mainstream health behavior in just a few years. The Resmed 2026 Global Sleep Survey, which polled 30,000 respondents across 13 markets, found that 39% of adults now check their sleep via a wearable device at least once per week. That figure represents a dramatic jump from just 16% in 2025. The most common devices are watch-style trackers (58%), followed by fitness bands (36%) and ring trackers (22%).

The financial scale of this shift is equally striking. The global sleep tech devices market was valued at approximately $29.3 billion in 2025, and analysts at Precedence Research project it could reach $153.69 billion by 2030. That kind of growth signals deep consumer appetite for quantified rest — but it also raises a question that the marketing budgets of these companies rarely answer directly: how accurate are these devices, and what happens when people trust them too much?

The survey data hints at the stakes. Among respondents, 62% said they would seek medical advice if their device flagged a potential sleep apnea risk. That suggests many users treat tracker output as clinically meaningful information. Yet the regulatory reality is different: the American Academy of Sleep Medicine (AASM) notes that consumer sleep trackers are sold as entertainment or wellness devices, not medical instruments, and most lack FDA clearance.

What the Largest Multicenter PSG Validation Study Found

To understand how well consumer trackers actually measure sleep, the most useful reference point is a 2023 multicenter study led by Lee and colleagues, available through PubMed Central (PMC). This study remains the largest independent head-to-head comparison of consumer sleep trackers against polysomnography (PSG), the gold-standard clinical method that measures brain waves, eye movements, and muscle activity to determine sleep stages.

The researchers tested 11 devices across 75 participants, generating 349,114 epochs (30-second segments) of directly comparable data. They organized the devices into three categories based on form factor and measurement approach:

Wearables: Devices worn on the body, such as the Apple Watch 8 and Oura Ring 3, which use accelerometry and photoplethysmography (PPG) to estimate sleep stages.
Nearables: Devices placed near the body, such as the Withings Sleep Tracking Mat and Google Nest Hub 2, which use ballistocardiography or radar-based motion sensing.
Airables: Smartphone applications that rely entirely on the phone's built-in microphone and accelerometer, such as SleepRoutine and Pillow.

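The primary performance metric was the macro F1 score, which averages the F1 score (the harmonic mean of precision and recall) computed separately for each sleep stage: wake, light sleep, deep sleep, and REM. Every stage therefore counts equally, however rare it is. A perfect score is 1.0; uniform random guessing would score about 0.25 on a four-class problem with equally common stages, and lower when the stages are unevenly distributed, as they are in a typical night. The results revealed a wide performance gap: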

Macro F1 scores and key biases for representative devices from the 2023 Lee et al. multicenter validation study (75 participants, 349,114 epochs vs. PSG).
Device Category	Device	Macro F1 Score	Key Bias
Airable	SleepRoutine	0.69	Best overall; slight overestimation of deep sleep
Nearable	Amazon Halo Rise	0.62	Overestimates sleep latency by ~29 minutes
Wearable	Apple Watch 8	0.57	Misclassifies wake as sleep; overestimates sleep efficiency
Wearable	Oura Ring 3	0.54	Similar wake-as-sleep bias; moderate deep sleep accuracy
Nearable	Withings Sleep Tracking Mat	0.51	Overestimates total sleep time; poor wake detection
Nearable	Google Nest Hub 2	0.48	Overestimates sleep latency; inconsistent REM detection
Airable	Pillow	0.26	Extreme deep sleep bias (predicted 59% deep sleep vs. 10.8% from PSG)

The range is striking. The best-performing device (SleepRoutine, macro F1 = 0.69) approached reasonable agreement with PSG, while the worst (Pillow, macro F1 = 0.26) was barely better than chance. Pillow's extreme deep sleep bias — predicting 59% of all epochs as deep sleep compared to the PSG's 10.8% — illustrates how algorithmic assumptions can produce wildly misleading data.
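To make the metric concrete, here is a minimal Python sketch of how a macro F1 score can be computed from epoch-by-epoch stage labels. It is not the study's code: the function name `macro_f1`, the stage label strings, and the eight-epoch toy sequences are illustrative assumptions, not data from Lee et al.

```python
from typing import Sequence

# Hypothetical label strings for the four stages named in the article.
STAGES = ("wake", "light", "deep", "rem")


def macro_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str] = STAGES,
) -> float:
    """Unweighted mean of per-stage F1 scores over paired epoch labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Assumed convention: a stage that is never present and never
        # predicted contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy sequences, invented purely for illustration (one label per 30-second epoch).
psg = ["wake", "light", "light", "deep", "rem", "light", "wake", "rem"]
tracker = ["light", "light", "light", "deep", "rem", "deep", "wake", "light"]

print(round(macro_f1(psg, tracker), 3))  # 0.643
```

Because every stage carries equal weight in the average, a tracker that labels every epoch as deep sleep scores poorly even though it never misses a deep-sleep epoch. On the toy sequences above, an all-deep prediction scores about 0.06.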

Figure: Wearables, nearables, and airables each have distinct strengths and blind spots; no single form factor outperforms across all sleep stages.

Category-Specific Limitations: Wearables, Nearables, and Airables

The overall F1 scores tell only part of the story. Each category of device has systematic biases that matter differently depending on what a user wants to measure.

Wearables: The Wake-as-Sleep Problem

Both the Apple Watch 8 and Oura Ring 3 showed substantial proportional bias in sleep efficiency — meaning they systematically misclassify periods of quiet wakefulness as sleep. For someone with insomnia who spends significant time lying still while awake, this bias can produce a falsely reassuring sleep score. The device reports "you slept 7.5 hours" when the person actually slept 5.5 and lay awake for 2. This is not a minor calibration issue; it is a fundamental limitation of using movement and heart rate as proxies for consciousness.
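The scenario above can be expressed as a sleep efficiency calculation, where sleep efficiency is total sleep time divided by time in bed. The sketch below assumes the person spent exactly 7.5 hours in bed (5.5 asleep plus 2 awake); the function name and that time-in-bed figure are illustrative assumptions, not values reported by the study.

```python
def sleep_efficiency(total_sleep_h: float, time_in_bed_h: float) -> float:
    """Return sleep efficiency as a percentage of time in bed."""
    if time_in_bed_h <= 0:
        raise ValueError("time_in_bed_h must be positive")
    if not 0 <= total_sleep_h <= time_in_bed_h:
        raise ValueError("total_sleep_h must be between 0 and time_in_bed_h")
    return 100.0 * total_sleep_h / time_in_bed_h


# Assumption: time in bed equals actual sleep plus the 2 hours spent lying awake.
TIME_IN_BED_H = 5.5 + 2.0

actual = sleep_efficiency(5.5, TIME_IN_BED_H)    # what PSG would show
reported = sleep_efficiency(7.5, TIME_IN_BED_H)  # quiet wake counted as sleep

print(round(actual, 1))    # 73.3
print(round(reported, 1))  # 100.0
```

Under that assumption, the wake-as-sleep error turns roughly 73% efficiency into a reported 100%, which is the kind of falsely reassuring figure described above.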

For readers who want a deeper dive into specific wearable accuracy data, the evidence-based comparison of fitness trackers using PSG validation data provides detailed per-device breakdowns, and the latest smart watch accuracy review (2024–2026) covers more recent studies.

Nearables: The Latency Overestimation Problem

Nearable devices — the Withings Sleep Tracking Mat, Google Nest Hub 2, and Amazon Halo Rise — performed reasonably well at detecting sleep versus wake once sleep was established, but they struggled with sleep latency (the time it takes to fall asleep). The mean bias across nearables was an overestimation of 29.02 minutes. In practical terms, if you lie in bed for 15 minutes before falling asleep, a nearable might report that you took 44 minutes. For someone tracking sleep latency as a metric of concern, this error could create unnecessary worry or lead to incorrect conclusions about the effectiveness of sleep hygiene changes.
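A mean bias of this kind is conventionally the average of device-minus-reference differences across paired nights. The following is a minimal sketch of that calculation; the function name and the three paired values are made up for illustration and are not measurements from the study.

```python
from statistics import mean
from typing import Sequence


def mean_bias(device_min: Sequence[float], psg_min: Sequence[float]) -> float:
    """Average of (device - PSG) differences; positive means overestimation."""
    if len(device_min) != len(psg_min) or not device_min:
        raise ValueError("inputs must be non-empty and the same length")
    return mean(d - p for d, p in zip(device_min, psg_min))


# Hypothetical sleep latencies in minutes for three nights (illustrative only).
psg_latency = [10, 20, 30]
device_latency = [30, 50, 70]

print(mean_bias(device_latency, psg_latency))  # 30
```

A group-level mean bias is an average over many people and nights, so subtracting it from your own reading is not a reliable personal correction. Individual errors can be larger or smaller than the mean.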

Airables: Extreme Performance Variation

Smartphone-based airables showed the widest performance spread. SleepRoutine achieved the highest macro F1 score in the entire study (0.69), demonstrating that a well-designed algorithm using a phone's microphone can sometimes outperform dedicated wearable hardware. At the other extreme, Pillow's macro F1 of 0.26 and its massive deep sleep overestimation (59% vs. 10.8%) serve as a cautionary tale: not all apps are created equal, and users have no easy way to distinguish a well-validated algorithm from one that simply produces pleasing graphs.

The study also found that performance varied by participant characteristics — specifically BMI, sleep efficiency, and apnea-hypopnea index (AHI) — but not by sex. This means a device that works reasonably well for one person may be systematically less accurate for another, adding another layer of uncertainty for individual users.

Orthosomnia: When the Quest for Perfect Sleep Data Backfires

In 2017, clinical researchers Baron and colleagues published a case series in the Journal of Clinical Sleep Medicine that introduced a new term to the sleep medicine lexicon: orthosomnia. The word combines the Greek "ortho" (straight or correct) with the Latin "somnia" (sleep) to describe a perfectionistic quest for ideal sleep data that paradoxically worsens sleep quality. The patients in the case series had become so anxious about their tracker scores that they could not sleep, an ironic outcome for devices meant to improve sleep.

A 2026 study in Frontiers in Psychology provides the largest population-level evidence for this phenomenon. The researchers surveyed 1,002 Norwegian adults and found that 46% had used sleep apps at some point. When they compared the composite scores for negative effects between groups, a clear pattern emerged: participants who met the Bergen Insomnia Scale criteria for insomnia reported significantly higher negative effect scores (11.63) than those without insomnia (10.20), with a p-value below 0.001.

The most commonly reported negative effect was "More worried about my sleep," cited by 17.8% of all app users. The most commonly reported positive effect was "Learned about my own sleep" (48.1%), which captures the fundamental tension: the same data that helps some users gain insight can fuel anxiety in others.

Warning signs that tracking may be sliding toward orthosomnia:
Checking your sleep score immediately upon waking, before getting out of bed
Feeling anxious or disappointed when the score is lower than expected
Comparing your nightly scores to benchmarks from social media or device averages
Making significant behavioral changes (e.g., skipping social events) to chase a higher score
Feeling that your subjective experience of sleep quality is less valid than the device's data

The study also found that older age was associated with a reduced likelihood of reporting negative effects, suggesting that younger users, who are also the heaviest adopters of sleep tracking, may be more vulnerable to orthosomnia. Interestingly, insomnia status was not a significant predictor in the fully adjusted logistic regression model, even though the insomnia group had higher composite scores in the direct group comparison. The relationship between sleep tracking and anxiety is therefore complex and not simply a matter of pre-existing sleep problems.

Figure: When the pursuit of a perfect sleep score becomes the very thing keeping you awake, the tracker has shifted from tool to obstacle.

Your Sleep Data Is Not Protected Medical Information

There is a widespread assumption that data collected by health-related devices receives the same legal protections as medical records. This assumption is incorrect for most consumer sleep trackers. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) applies only to data held by "covered entities" — healthcare providers, insurers, and their business associates. A sleep tracking app or wearable company is generally not a covered entity.

As cybersecurity expert Aimee Simpson of Huntress explained to Forbes, if a sleep tech company "wanted to sell your sleep data, they'd be well within their rights to do so in most cases." The data has, in the words of another expert quoted in the same article, "the same protections as marketing data." This is not a hypothetical concern. Sleep data is, as cybersecurity expert Trevor Horwitz described it, "some of the most intimate behavioral data a person can generate — when you are unconscious, vulnerable, routine-driven and physiologically exposed."

When Trackers Help and When They Harm: Evidence-Based Guidelines

The evidence reviewed here does not suggest that everyone should throw away their sleep tracker. Rather, it points to a more nuanced conclusion: the value of a sleep tracker depends entirely on how it is used and what expectations the user brings to it.

When trackers help

Pattern awareness over weeks: Looking at trends in sleep timing, duration, and consistency over a period of weeks can reveal patterns that are hard to notice day-to-day. A tracker that shows you consistently sleep less before workdays, for example, provides actionable information.
Behavioral change motivation: Seeing that a later bedtime correlates with lower sleep scores can reinforce the decision to start winding down earlier. The tracker serves as a feedback loop for habits you are actively trying to change.
Identifying trends to discuss with a doctor: If your tracker consistently shows very low deep sleep percentages or highly fragmented sleep over several weeks, that pattern may be worth raising with a healthcare provider — not as a diagnosis, but as a data point that warrants further investigation.

When trackers harm

Obsessing over nightly scores: The day-to-day variability in tracker data is often noise, not signal. Treating each night's score as a meaningful metric invites the anxiety that fuels orthosomnia.
Comparing to unrealistic benchmarks: Device companies often present "optimal" sleep scores that may not be achievable for everyone. Chasing a 95/100 sleep score when your biology naturally trends toward 80 is a recipe for frustration.
Using data to self-diagnose: A tracker that flags possible sleep apnea or low oxygen saturation is a reason to see a doctor, not a diagnosis. The devices are not validated for clinical screening, and false positives can cause unnecessary anxiety while false negatives can delay real treatment.
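One way to act on the "trends, not nights" guidance is to smooth nightly values into a trailing weekly average before looking at them. The sketch below is one possible approach, not a method prescribed by any of the sources; the function name, the 7-night window, and the sample durations are illustrative assumptions.

```python
from typing import List, Sequence


def rolling_mean(values: Sequence[float], window: int = 7) -> List[float]:
    """Trailing mean over `window` consecutive nights.

    Returns one value per night from the point a full window is available,
    so fewer than `window` nights yields an empty list.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        out.append(sum(chunk) / window)
    return out


# Hypothetical tracker-reported sleep durations in hours (illustrative only).
nightly_hours = [6.5, 7.2, 5.9, 7.8, 6.1, 8.0, 7.0, 6.4, 7.5, 6.0]

for week_avg in rolling_mean(nightly_hours):
    print(round(week_avg, 2))
```

Individual nights in the sample swing by up to about two hours, while the weekly averages move far less. Comparing one week's average with the next is closer to the pattern-level use described above than reacting to any single score.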

For readers who want to understand which specific metrics actually correlate with health outcomes — and which are mostly noise — the metric-evidence tier framework provides a structured approach to separating clinically meaningful data from algorithmic guesswork.

The AASM's recommendation remains the most sensible framework: use sleep technology to supplement healthy sleep behaviors, not as a quick fix. A tracker can tell you that you slept poorly. It cannot tell you why, and it cannot fix the underlying cause. That work, whether it involves improving sleep hygiene, addressing an undiagnosed sleep disorder, or managing the anxiety that orthosomnia creates, still belongs to the person in the bed.
