The smart watch tracks while the brain sleeps — but the data it produces is an approximation, not a clinical record.

How Smart Watches Measure Sleep: Actigraphy and PPG vs. Polysomnography

Every smart watch that claims to track sleep relies on the same two sensor technologies: an accelerometer and a photoplethysmography (PPG) sensor. The accelerometer detects movement, or the absence of it, using a method called actigraphy. When your body is still for extended periods, the watch infers that you are asleep. When you move, it registers wakefulness. This approach alone is reasonably good at recognizing sleep. It struggles, however, to spot wakefulness when you lie still, and it cannot tell you anything about what kind of sleep you are in.
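The stillness rule can be sketched as a threshold on per-epoch activity counts, averaged over neighboring epochs so that a single twitch does not flip the label. This is a minimal illustration in the spirit of classic actigraphy scoring, not any manufacturer's algorithm. The window size and threshold are hypothetical values chosen for readability.

```python
from typing import List

# Hypothetical parameters: a centered 5-epoch window (two epochs on each side)
# and an arbitrary activity threshold. Real devices tune such values on PSG data.
WINDOW = 2
THRESHOLD = 20.0


def score_sleep_wake(activity_counts: List[float]) -> List[str]:
    """Label each 30-second epoch 'sleep' or 'wake' from accelerometer activity.

    Each epoch's count is averaged with its neighbors so brief movements are
    smoothed out; low smoothed activity (stillness) is scored as sleep.
    """
    labels = []
    n = len(activity_counts)
    for i in range(n):
        lo = max(0, i - WINDOW)
        hi = min(n, i + WINDOW + 1)
        window = activity_counts[lo:hi]
        smoothed = sum(window) / len(window)
        labels.append("sleep" if smoothed < THRESHOLD else "wake")
    return labels


# The scorer sees only movement, so an awake person lying still would be labeled sleep.
counts = [80, 60, 5, 0, 0, 3, 0, 0, 90, 120, 100, 2, 0, 0]
print(score_sleep_wake(counts))
```

Note what the function cannot see: an awake person lying perfectly still produces the same low counts as a sleeping one.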

The PPG sensor adds a second layer. It shines green or red LEDs through your skin and measures changes in blood volume with each heartbeat. From that signal, the watch calculates heart rate and heart rate variability (HRV). Since heart rate and HRV follow predictable patterns across sleep stages — higher and more variable during REM, lower and more stable during deep sleep — the device uses these patterns to estimate which stage you are in.
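Once the PPG waveform has been reduced to inter-beat intervals (the time between successive pulse peaks), the arithmetic behind both numbers is short. The sketch below assumes peak detection has already been done. It uses RMSSD (root mean square of successive differences) as the HRV statistic. RMSSD is widely used, but which statistic a given watch reports is an assumption here. The interval values are invented.

```python
import math
from typing import List


def heart_rate_bpm(ibi_ms: List[float]) -> float:
    """Mean heart rate from inter-beat intervals given in milliseconds."""
    mean_ibi = sum(ibi_ms) / len(ibi_ms)
    return 60_000.0 / mean_ibi


def rmssd_ms(ibi_ms: List[float]) -> float:
    """Root mean square of successive differences between inter-beat intervals.

    Larger values mean more beat-to-beat variability (higher HRV).
    """
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Invented intervals: a steady, slow rhythm versus a faster, irregular one.
steady = [1000, 1010, 995, 1005, 1000]
irregular = [850, 790, 880, 760, 870]

print(round(heart_rate_bpm(steady), 1), round(rmssd_ms(steady), 1))        # 59.9 10.6
print(round(heart_rate_bpm(irregular), 1), round(rmssd_ms(irregular), 1))  # 72.3 97.7
```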

Polysomnography (PSG), the gold standard for clinical sleep measurement, works entirely differently. A PSG study records brain wave activity via electroencephalography (EEG), eye movements via electrooculography (EOG), and muscle tone via electromyography (EMG). These three signals are the actual physiological correlates of sleep stages. When a sleep technician scores a PSG recording, they are looking at your brain's electrical activity — not inferring sleep stage from heart rate and movement.

This distinction matters because it explains the fundamental accuracy ceiling of consumer wearables. A smart watch does not measure sleep. It measures proxies for sleep — stillness, heart rate patterns, and blood flow changes — and then applies an algorithm to guess which stage you are in. The guess can be quite good for total sleep time, but it is still a guess.
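To make "applies an algorithm to guess" concrete, here is a deliberately crude rule-based stager. It sees only what a watch sees: movement, heart rate relative to a nightly baseline, and heart rate variability. Every threshold is hypothetical, and commercial devices use proprietary models rather than hand-written rules like these. The structural point survives the simplification: no input comes from the brain.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
MOVEMENT_THRESHOLD = 20.0  # activity count treated as "moving"
LOW_HR_RATIO = 0.95        # below 95% of baseline heart rate
HIGH_HR_RATIO = 1.05       # above 105% of baseline heart rate
STEADY_VAR = 3.0           # variability below this counts as "stable"
VARIABLE_VAR = 6.0         # variability above this counts as "variable"


@dataclass
class Epoch:
    """Proxy features a watch might derive for one 30-second epoch."""
    activity: float        # accelerometer activity count
    hr_bpm: float          # mean heart rate during the epoch
    hr_variability: float  # beat-to-beat variability, arbitrary units


def guess_stage(e: Epoch, baseline_hr: float) -> str:
    """Guess a sleep stage from movement and heart-rate proxies alone."""
    if e.activity >= MOVEMENT_THRESHOLD:
        return "wake"
    # Lower, more stable heart rate: guess deep sleep.
    if e.hr_bpm < baseline_hr * LOW_HR_RATIO and e.hr_variability < STEADY_VAR:
        return "deep"
    # Higher, more variable heart rate: guess REM.
    if e.hr_bpm > baseline_hr * HIGH_HR_RATIO and e.hr_variability > VARIABLE_VAR:
        return "rem"
    # Anything ambiguous, including a still but awake wearer, becomes light sleep.
    return "light"


night = [Epoch(50, 70, 8), Epoch(0, 54, 2), Epoch(0, 66, 7), Epoch(0, 60, 4)]
print([guess_stage(e, baseline_hr=60) for e in night])  # ['wake', 'deep', 'rem', 'light']
```

The fallback branch matters: any epoch that fits no rule, including a still but wakeful one, is reported as light sleep.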

The difference between clinical polysomnography (left) and consumer smart watch tracking (right) is not just convenience — it is the difference between measuring brain activity and inferring sleep from proxies.

What Accuracy Actually Means: Sensitivity, Specificity, and Kappa Scores

When researchers compare a smart watch against PSG, they report three key metrics that most consumers never see. Understanding these terms is essential for interpreting the study data in the next section.

Sensitivity (also called recall) measures how well the device correctly identifies a specific state. For sleep detection, high sensitivity means the watch rarely misses actual sleep — it correctly flags sleep epochs as sleep. Most modern smart watches achieve sleep sensitivity above 90%, meaning they are excellent at recognizing when you are asleep.

Specificity measures how well the device correctly identifies the opposite state. Validation studies treat sleep as the target state, so specificity is effectively wake detection: the proportion of actual wake epochs that the watch labels as wake. High specificity means the watch rarely flags wake epochs as sleep. This is where consumer wearables struggle. A 2026 review by sleep scientist Dean J. Miller reports that most trackers correctly identify only between 26% and 73% of wake epochs. When you are lying still in bed but awake (reading, worrying, or simply resting), the watch is likely to classify that time as light sleep.
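Both metrics come from an epoch-by-epoch comparison of the watch's label against the PSG label. The sketch below treats sleep as the positive class, the usual convention in wearable validation studies. Sensitivity is therefore the share of PSG sleep epochs the watch called sleep, and specificity is the share of PSG wake epochs it called wake. The labels are invented to mimic a watch that never misses sleep but misreads quiet wakefulness.

```python
from typing import List, Tuple


def sensitivity_specificity(psg: List[str], watch: List[str]) -> Tuple[float, float]:
    """Epoch-by-epoch sensitivity (sleep detection) and specificity (wake detection).

    Both lists hold 'sleep' or 'wake' per 30-second epoch; PSG is the reference.
    """
    tp = sum(p == "sleep" and w == "sleep" for p, w in zip(psg, watch))
    fn = sum(p == "sleep" and w == "wake" for p, w in zip(psg, watch))
    tn = sum(p == "wake" and w == "wake" for p, w in zip(psg, watch))
    fp = sum(p == "wake" and w == "sleep" for p, w in zip(psg, watch))
    return tp / (tp + fn), tn / (tn + fp)


# Invented example: 18 sleep epochs and 6 wake epochs according to PSG.
# The watch catches every sleep epoch but labels 4 still-but-awake epochs as sleep.
psg = ["sleep"] * 18 + ["wake"] * 6
watch = ["sleep"] * 18 + ["sleep"] * 4 + ["wake"] * 2
sens, spec = sensitivity_specificity(psg, watch)
print(f"sensitivity={sens:.0%} specificity={spec:.0%}")  # sensitivity=100% specificity=33%
```

A watch can post perfect sensitivity and still misread most of your time awake, which is why a headline sleep-detection figure says little about wake detection.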

Cohen's kappa (κ) is a statistical measure that accounts for chance agreement. A kappa of 0 means the device agrees with PSG no better than random guessing. A kappa of 1 means perfect agreement. In sleep research, κ values above 0.60 are considered substantial, 0.40 to 0.60 moderate, and below 0.40 fair to poor. For context, even two trained sleep technicians scoring the same PSG recording typically achieve a kappa of around 0.75 — meaning there is inherent variability in sleep staging even at the clinical level.
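Kappa is computed as κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed share of epochs on which watch and PSG agree, and p_e is the agreement expected by chance given how often each one uses each stage label. A minimal sketch, with invented labels and the interpretation bands from the paragraph above:

```python
from collections import Counter
from typing import List


def cohens_kappa(psg: List[str], watch: List[str]) -> float:
    """Cohen's kappa between two equal-length epoch-by-epoch stage sequences."""
    n = len(psg)
    # Observed agreement: share of epochs with identical labels.
    p_o = sum(p == w for p, w in zip(psg, watch)) / n
    # Chance agreement: for each stage, (PSG share) x (watch share), summed.
    psg_counts, watch_counts = Counter(psg), Counter(watch)
    p_e = sum(psg_counts[s] * watch_counts[s] for s in psg_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def interpret_kappa(k: float) -> str:
    """Bands used in this article: above 0.60 substantial, 0.40-0.60 moderate, below 0.40 fair to poor."""
    if k > 0.60:
        return "substantial"
    if k >= 0.40:
        return "moderate"
    return "fair to poor"


# Invented 10-epoch example; the watch drifts toward "light" on wake, deep, and REM epochs.
psg = ["wake", "wake", "light", "light", "light", "light", "deep", "deep", "rem", "rem"]
watch = ["light", "wake", "light", "light", "light", "light", "light", "deep", "rem", "light"]
k = cohens_kappa(psg, watch)
print(round(k, 2), interpret_kappa(k))  # 0.55 moderate
```

Here the two sequences match on 70% of epochs, but once chance agreement (34%) is removed, kappa falls to about 0.55.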

Head-to-Head Study Data: Apple, Samsung, Google, Garmin, and Fitbit

Several peer-reviewed studies published between 2023 and 2025 provide the most reliable accuracy data currently available for smart watch sleep tracking. The table below summarizes the key findings for the five major smart watch brands, drawn primarily from the 2023 Korean multicenter study (75 participants, 349,114 epochs analyzed), the Schyvens et al. 2025 independent study (62 adults, VLAIO-funded), and the Robbins et al. 2024 study (36 participants, Oura-funded).

Head-to-head accuracy data for major smart watch brands from peer-reviewed validation studies (2023–2025). Kappa values and sensitivity percentages are drawn from independent studies unless otherwise noted.
| Device | Overall Sleep Staging (κ) | Deep Sleep Sensitivity | Wake Detection Sensitivity | Total Sleep Time Bias | Study Source |
| --- | --- | --- | --- | --- | --- |
| Apple Watch 8 | 0.53 (moderate) | 50.5–50.7% | 52.2% | Underestimates deep sleep by ~43 min | Schyvens et al. 2025; Robbins et al. 2024 |
| Samsung Galaxy Watch 5 | 0.40–0.60 (moderate) | Not separately reported | 26–73% range | Overestimates total sleep time | Korean multicenter 2023 |
| Google Pixel Watch | 0.40–0.60 (moderate) | Macro F1 0.59 (best among wearables in Korean study) | 26–73% range | Overestimates total sleep time | Korean multicenter 2023 |
| Fitbit Sense 2 | 0.42–0.55 (moderate) | 61.7% | 26–73% range | +6.3 min (lowest bias in Schyvens study) | Schyvens et al. 2025; Korean multicenter 2023 |
| Garmin Vivosmart 4 | 0.21 (fair) | Not separately reported | Not separately reported | Significant overestimation | Schyvens et al. 2025 |

Several important patterns emerge from this data. First, the Apple Watch 8 achieved the highest overall sleep staging kappa (0.53) in the independent Schyvens study, making it the current leader among smart watches for general sleep staging accuracy. However, its deep sleep sensitivity of approximately 50% means it misses roughly half of actual deep sleep epochs — a significant limitation for anyone specifically interested in tracking slow-wave sleep.

Second, the Google Pixel Watch and Fitbit Sense 2 showed superior deep sleep detection in the Korean multicenter study, with macro F1 scores of 0.59 and 0.56 respectively. This suggests that different devices optimize for different aspects of sleep staging — no single smart watch excels across all metrics.

Third, the Garmin Vivosmart 4 scored substantially lower than all other devices (κ=0.21). It is important to note that this device is two or more generations old. Current Garmin models such as the Venu and Fenix 7/8 series may perform differently, but consistent cross-study data for the latest Garmin generations is not yet available.

For readers specifically interested in Apple Watch accuracy, our dedicated guide covers Apple Watch sleep tracking accuracy in greater depth, including additional studies and firmware-specific performance data.

Why Sleep Staging Is the Weakest Metric

Across all studies and all devices, one finding is remarkably consistent: consumer wearables correctly identify only between 53% and 60% of sleep stage epochs. This means that for every ten 30-second epochs of sleep, the watch will misclassify four or five of them into the wrong stage.
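Scaled to a full night, the same figures look like this. The sketch assumes 8 hours of sleep scored in standard 30-second epochs (960 epochs) and applies the 53% and 60% endpoints of the accuracy range above.

```python
EPOCH_MIN = 0.5  # standard scoring epoch length: 30 seconds


def misclassified(sleep_hours: float, accuracy: float) -> tuple[int, float]:
    """Return (epochs, minutes) expected in the wrong stage at a given epoch accuracy."""
    epochs = round(sleep_hours * 60 / EPOCH_MIN)
    wrong = round(epochs * (1 - accuracy))
    return wrong, wrong * EPOCH_MIN


# Assumed 8 hours of sleep; accuracy endpoints taken from the 53-60% range.
for acc in (0.53, 0.60):
    wrong, minutes = misclassified(8, acc)
    print(f"{acc:.0%} accuracy: {wrong} epochs (about {minutes / 60:.1f} h) in the wrong stage")
```

That works out to roughly 384 to 451 misclassified epochs, or about 3.2 to 3.8 hours of the night assigned to the wrong stage.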

The failure mode is systematic rather than random. All devices share a common algorithmic tendency: they misclassify wake, deep sleep, and REM sleep as light sleep. This is a conservative design choice. By defaulting to light sleep — the most common stage across the night — the algorithm minimizes dramatic errors while producing a smoother, more plausible-looking hypnogram.
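The light-sleep default also shows why researchers report kappa alongside raw accuracy. A stager that answered "light" for every epoch would score an accuracy equal to the share of light sleep in the night. Its kappa would still be zero, because it adds nothing beyond chance. The stage proportions below describe a hypothetical night, not study data.

```python
from collections import Counter

# Hypothetical night of 960 thirty-second epochs (8 hours), labeled by PSG.
psg = ["wake"] * 60 + ["light"] * 500 + ["deep"] * 180 + ["rem"] * 220
# A stager that always answers "light", the most common stage.
always_light = ["light"] * len(psg)

n = len(psg)
accuracy = sum(p == w for p, w in zip(psg, always_light)) / n

# Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
psg_counts, watch_counts = Counter(psg), Counter(always_light)
p_e = sum(psg_counts[s] * watch_counts[s] for s in psg_counts) / (n * n)
kappa = (accuracy - p_e) / (1 - p_e)

print(f"accuracy={accuracy:.0%} kappa={kappa:.2f}")  # accuracy=52% kappa=0.00
```

In this hypothetical night, light sleep makes up 52% of epochs, so the do-nothing stager reaches 52% accuracy while agreeing with PSG no better than chance.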

The practical consequence is that your smart watch will consistently overestimate total sleep time and underestimate wake after sleep onset (WASO). The Schyvens et al. study found that all tested devices underestimated WASO by 12 to 48 minutes per night. If you spend 45 minutes lying awake in the middle of the night, your watch might report only 10 to 15 minutes of wakefulness — and classify the rest as light sleep.
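WASO is simple to compute from a hypnogram. Locate sleep onset (the first non-wake epoch), then count the wake epochs after it. The sketch below counts through the end of the recording. It uses a hypothetical night with a 45-minute awakening, which the simulated watch catches only partly. Labels are simplified to wake and light sleep.

```python
from typing import List, Tuple

EPOCH_MIN = 0.5  # 30-second epochs


def tst_and_waso(hypnogram: List[str]) -> Tuple[float, float]:
    """Total sleep time and wake after sleep onset, both in minutes.

    Sleep onset is the first non-wake epoch; WASO counts every wake epoch
    after it through the end of the recording.
    """
    onset = next(i for i, stage in enumerate(hypnogram) if stage != "wake")
    tst = sum(stage != "wake" for stage in hypnogram) * EPOCH_MIN
    waso = sum(stage == "wake" for stage in hypnogram[onset:]) * EPOCH_MIN
    return tst, waso


# Hypothetical PSG night: 20 min to fall asleep, 3 h asleep, 45 min awake, 4 h asleep.
psg = ["wake"] * 40 + ["light"] * 360 + ["wake"] * 90 + ["light"] * 480

# Simulated watch: only 24 of the 90 wake epochs (12 min) are detected;
# the remaining 66 epochs are scored as light sleep.
watch = ["wake"] * 40 + ["light"] * 360 + ["wake"] * 24 + ["light"] * 66 + ["light"] * 480

print(tst_and_waso(psg))    # (420.0, 45.0)
print(tst_and_waso(watch))  # (453.0, 12.0)
```

Every wake epoch relabeled as light sleep moves time from one column to the other: WASO drops by 33 minutes and total sleep time rises by the same 33 minutes.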

This systematic bias has real implications. If you rely on your watch to tell you whether you slept well, and the watch consistently overestimates your sleep, you may feel confused or frustrated when your subjective experience — lying awake for an hour — does not match the data. Conversely, if the watch tells you that you had excellent sleep efficiency, you might dismiss real sleep problems that warrant attention.

What Smart Watches Still Cannot Measure

Despite rapid advances in sensor technology and machine learning algorithms, consumer smart watches remain fundamentally unable to measure several clinically important sleep signals. Understanding these limitations is essential for calibrated expectations.

  • Brain wave activity (EEG): Smart watches cannot detect the electrical activity of your brain. They cannot distinguish between the slow delta waves of N3 deep sleep and the faster theta activity of N1 light sleep because they do not measure brain waves at all. This is the single most important limitation.
  • Sleep apnea diagnosis: No consumer smart watch can diagnose sleep apnea. While Apple Watch (Series 9 and later, Ultra 2) and Samsung Galaxy Watch (Watch 7, 8, Ultra) have received FDA authorization for sleep apnea screening notifications, these features are explicitly screening alerts — not diagnostic tools. They can tell you that your breathing patterns suggest you should see a sleep specialist, but they cannot confirm or rule out obstructive sleep apnea.
  • Microarousals: These are brief awakenings lasting 3 to 15 seconds that are invisible to actigraphy-based devices. A person with sleep apnea or periodic limb movement disorder may experience dozens of microarousals per hour without any conscious awareness — and their smart watch will show uninterrupted sleep.
  • Precise sleep architecture: The detailed structure of your night — the number of sleep cycles, the duration of each cycle, the ratio of NREM to REM within each cycle — cannot be accurately captured by PPG and accelerometry alone. The watch produces a plausible-looking hypnogram, but it is an algorithmic reconstruction, not a measurement.
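One reason arousals in the 3 to 15 second range disappear from any hypnogram is the 30-second epoch itself. Staging assigns each epoch the stage that occupies the greatest portion of it. A brief awakening inside otherwise continuous sleep is therefore outvoted even in a clinical PSG score, and sleep technicians score arousals from the EEG as separate events. The sketch below applies that majority rule to hypothetical second-by-second labels.

```python
from collections import Counter
from typing import List

EPOCH_SEC = 30


def epoch_labels(per_second: List[str]) -> List[str]:
    """Collapse second-by-second labels into 30-second epochs.

    Each epoch takes the label covering the greatest portion of its 30 seconds,
    so a brief event embedded in otherwise uniform sleep is outvoted.
    """
    epochs = []
    for start in range(0, len(per_second), EPOCH_SEC):
        chunk = per_second[start:start + EPOCH_SEC]
        epochs.append(Counter(chunk).most_common(1)[0][0])
    return epochs


# Hypothetical two minutes of N2 sleep with a 10-second arousal in every epoch.
seconds = (["N2"] * 10 + ["arousal"] * 10 + ["N2"] * 10) * 4
print(epoch_labels(seconds))  # ['N2', 'N2', 'N2', 'N2']
```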

For a detailed explanation of how the FDA-authorized sleep apnea screening features work and what their limitations are, see our guide on Apple Watch sleep apnea detection and the Breathing Disturbances feature.

Practical Guidelines for Interpreting Your Smart Watch Sleep Data

Given the accuracy data presented above, how should you actually use your smart watch sleep data? The answer depends on what you are trying to learn. The following guidelines are grounded in the evidence and designed to help you extract useful information without overtrusting the numbers.

  • Trust total sleep time trends, not individual nights. Your watch is reasonably good at detecting whether you slept more or less than your baseline. A seven-night average showing that you slept 30 minutes less than usual is a useful signal. A single night showing 7 hours and 42 minutes is not a precise measurement.
  • Ignore the stage breakdowns for clinical decisions. If your watch says you got only 45 minutes of deep sleep, do not panic. The 53–60% accuracy ceiling for sleep staging means that number could easily be off by 20 minutes or more in either direction. Use stage data as a rough directional guide (more deep sleep than usual, less REM than usual), not as a precise measurement.
  • Pay attention to consistency, not absolute values. The most reliable signal from a smart watch is change over time. If your sleep onset latency has been increasing over two weeks, that is worth investigating. If your heart rate during sleep has been trending upward, that may warrant a conversation with your doctor. The trend is more trustworthy than any single night's data.
  • Do not let the data drive anxiety. A known risk of sleep tracking is orthosomnia — a proposed condition where preoccupation with sleep data actually worsens sleep quality. If you find yourself checking your sleep score first thing in the morning and feeling distressed when the number is low, consider taking a break from tracking or switching to a simpler device that reports only total sleep time.
  • Know when to seek professional evaluation. If you consistently feel tired despite your watch showing adequate sleep, or if your watch flags breathing disturbances repeatedly, the appropriate next step is a clinical evaluation — not buying a different smart watch. No consumer wearable can replace a sleep study.

To understand how your watch calculates its overall sleep score — and why that number is a rough estimate rather than a precise measurement — read our explainer on how sleep scores are calculated and what they actually mean.

Accuracy varies significantly across brands and metrics. No single smart watch leads in all categories — choose based on which metric matters most for your use case.