Inter-Rater Agreement in Evaluation of Disability: Systematic Review of Reproducibility Studies
The original research design proposed a reliability study of functional assessment and work capacity (WC) evaluation (RELY 1), followed by a randomised comparison with current practice. Because changes at the state level stalled RELY 1 for more than a year, the observed reproducibility reflects the effect of short-term training in functional assessment without standardization. Reproducibility among experts without training may be similarly low or worse.

In terms of agreement between experts, the proportion of psychiatrist-to-psychiatrist comparisons that stayed within the threshold was higher in RELY 2 for all thresholds (Figure 2). For a threshold of 25 WC points, for example, the share of comparisons within the "maximum acceptable difference" was 73.6% in RELY 2, compared to 61.6% in RELY 1 (p = 0.008). Comparison of the standard error of measurement (SEM) for alternative work showed a significant change of −5.2 percentage points (95% CI −9.7 to −0.6, Tables 3 and 4) in RELY 2.

Overall, the mean inter-rater reliability was 0.45 across all conditions and settings, and ranged from an ICC of 0.86 (musculoskeletal disorders; reduction of working time [22]) to a κ of −0.10 (narcolepsy; disability benefit [11]) (Table 8.4). Six studies reported excellent or good inter-rater reliability for an overall assessment of incapacity for work, with ICCs of 0.64 [46] and 0.65 [44], percentage agreement of 82.4% (return-to-work recommendations [37]), or ICCs of 0.80 [23] and 0.86 [22] for the reduction of working time. One study presented mixed judgments within a single study, which we rated overall as "good agreement", weighing the results for functional work capacity (91.2% agreement on remaining work capacity) and for work recommendations (86% agreement on limitations in work performance) more heavily than the outcome of willingness and ability to return to work (56% agreement on the reduction of working time). [22, 23, 44, 46] Two studies were rated as "generalisable" [22, 23], two as "likely to be generalisable" [37, 39], and two as "probably non-generalisable" [44, 46].

In order to test whether psychiatrists systematically differ in their assessments, we formulated two mixed-effects models.
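The two agreement measures quoted in this section, the share of rater-pair comparisons within a maximum acceptable difference and Cohen's κ, can be illustrated with a minimal Python sketch. It is not the studies' analysis code. The function names, the example ratings, and the choice to count a difference exactly equal to the threshold as acceptable are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def share_within_threshold(ratings_by_claimant, threshold):
    """Share of rater-pair comparisons whose absolute difference is at
    most `threshold` (inclusive bound is an assumption of this sketch)."""
    within = 0
    total = 0
    for ratings in ratings_by_claimant:
        # Compare every pair of raters who assessed the same claimant.
        for a, b in combinations(ratings, 2):
            total += 1
            if abs(a - b) <= threshold:
                within += 1
    if total == 0:
        raise ValueError("need at least one pair of ratings")
    return within / total


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical judgments."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set")
    # Observed agreement: fraction of cases with identical judgments.
    p_observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical work capacity ratings (0-100), one inner list per claimant.
example_ratings = [
    [40, 60, 80],   # pairwise differences: 20, 40, 20
    [50, 55, 100],  # pairwise differences: 5, 50, 45
]
share = share_within_threshold(example_ratings, threshold=25)

# Hypothetical yes/no judgments on eligibility by two raters.
kappa = cohen_kappa(["y", "y", "n", "n"], ["y", "n", "y", "n"])
```

With these made-up inputs, three of the six pairwise comparisons fall within 25 points, and the two raters agree exactly as often as chance would predict, so κ is zero. A negative κ, such as the −0.10 reported above, means agreement below the chance level.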