Strategy for Generating Blinded Evidence for Single-Arm Trials with External Controls Using Expert Review of Home Video

Pilot Study Results

Five P/CGs consented to participate and were enrolled in the pilot study. All P/CG participants were mothers between 25 and 37 years of age, and the majority were college-educated. There were two healthy control participants, aged 3 and 15 months, and three case participants with the degenerative disease, aged 10, 17, and 43 months. A total of 55 videos were submitted by P/CGs in the pilot study; some P/CGs submitted more than one video for a single activity. Only the 30 videos submitted by P/CGs of the three cases were graded by the expert panel.

All videos were captured and submitted during a single week for each child, and all submitted activities met at least eight of the 10 quality criteria (Table 1). Scores for each quality criterion summarized across all activities for the five P/CGs are illustrated in Fig. 2. One P/CG did not capture the full extent of two activities; she did not put the toy in the camera frame to allow proper evaluation of eye tracking and she also did not offer the child the opportunity to pack/unpack blocks. One P/CG did not use the app to record or upload videos and instead uploaded videos directly from her phone’s video library. Two P/CGs filmed from less than an arm’s length away. Not all activities were filmed using a tripod. Finally, four P/CGs did not have the child appropriately positioned in the frame as instructed in at least one activity (e.g., showing side view instead of frontal view). Despite this, all videos were of sufficient quality to be graded, and the issues that occurred provided the opportunity to improve instructions for the clinical trial training materials.

Figure 2.

Quality of submitted videos across activities in the pilot study.

Scores for each quality criterion were added across all videos submitted by a single P/CG for a single activity. For instance, if a P/CG submitted two videos for ‘sitting’, then this activity was marked ‘adequate’ for the ‘proper duration’ criterion if at least one of the videos was no longer than three minutes.

During interviews, P/CGs reported one week as adequate to complete recordings and said they had no difficulty using their smartphones to record. On average, P/CGs reported spending an hour and a half reviewing training materials and videos, which they found clear and easy to follow. In particular, P/CGs found the laminated reference guide extremely useful, referring to it many times before and during filming. P/CGs indicated that all of the selected activities were important to show their child’s developmental milestones and did not recommend any deletions or additions. All of them were pleased to have the opportunity to submit recordings of additional skills. A few P/CGs provided videos of additional skills that were important to them, such as self-feeding. Activities found easiest to record were those where the children were sitting in a chair (i.e., buckled in or not moving around) and/or daily activities (i.e., interacting, smiling, sitting, standing). Activities requiring their child to be mobile (i.e., crawling, walking) were more difficult and often required another person to assist.

P/CGs unanimously preferred remotely showing their child’s recorded milestone videos to the doctor over detailed assessments at a clinic visit, as their child would often become fatigued during clinic visits, act differently, and display fewer skills at the doctor’s office than at home. Excerpts from P/CG interviews supporting their appreciation of remote assessment included:

“It was kind of frustrating for me [at the clinic] … because he was not displaying all the skills that he does at home when he was in that clinical setting.”

“You can see them in their comfortable environment and the way they act a lot more.”

“I think I would prefer having the doctor look at the videos because the child is going to feel safer and more comfortable at his or her own house.”

Grading Pretest Results

In the pretest of grading procedures, the three experts took five to eight hours to review 30 videos, with an estimated five to 10 viewings per video. They reported the two-minute video length to be appropriate and the quality sufficient for scoring purposes. In general, raters found the videos relatively easy to grade and felt that the developmental milestones were clinically meaningful, observable and objective, and appropriate to assess change during a clinical trial. They also felt the milestones could be reasonably captured through home-based videos and were similar to what they would expect in a clinic setting. Raters reported the following video characteristics as barriers to accurate grading: recordings captured from a side profile when full frontal was required, videos not showing whether the child was looking at the P/CG vs. another object, videos cut too short thereby not allowing sufficient time for the child to fully demonstrate a skill (e.g., sitting without support for 30 s), and audio capture that was not quite clear enough to discriminate subtle vocalizations when grading communication-related milestones. Experts also identified the need for clearer definitions of what constituted “tolerates attention,” “nasal” vs. “throaty” sounds, and “persistent reach,” suggesting that definitions based on BSID-3 be added.

In the context of this small pilot study, masking to time point and treatment was not feasible. Nevertheless, experts felt that in a larger study they would be effectively masked to treatment and likely to time, but not to patient identity, because they would have graded that patient before. Certain characteristics that could facilitate recall were identified (e.g., the same P/CG appearing in videos, memorable eyeglasses, or memorable background decor). Overall, raters agreed that masking to treatment status and randomizing hundreds of videos from dozens of children across various time points, with videos of the same child appearing at least 10 videos apart, would be feasible and effective at minimizing rater recall. Experts agreed that the objective rubric helped eliminate bias when grading the videos.

Revisions After the Pilot Study and Grading Pretest

Using the pilot and pretest results, we improved the clarity of the activities and milestones, the grading rubric, and the training materials. Overall, “head control” was separated into its own activity, standing and walking were combined into one video activity (but would be graded separately), crawling and rolling were combined into one video, and the list of milestones in each activity and rubric checklist was reorganized by the order in which they appear in BSID-3, corresponding to order of difficulty. Rater feedback also resulted in the addition, deletion, and combination of some milestones. For example, “elevates trunk while prone—shifts weight” was deleted, while “rolls from back to sides” was added. “Undifferentiated throaty sounds” and “undifferentiated nasal sounds” were combined into one milestone. The updated activities and milestones are shown in Supplementary Table 1. Instructions for P/CGs were modified to better enable milestone completion and assessment, e.g., “hold out item for child to reach for at least 5 s” (italics indicate text added after the pilot and pretest). To further clarify the correct steps, professionally recorded sample videos were commissioned for the main trial app for P/CGs to review as ideal examples before attempting their own recordings.

The grading platform was designed to automatically attribute points for directly related milestones when a more advanced milestone was achieved (Table 2). For example, if a rater marked that the child controls his or her head while upright for 15 s, then the child automatically received credit for “controls head while upright, 3 s.” Given the potential for recall based on certain participant characteristics, P/CG instructions were updated to recommend removing or covering any identifiers (e.g., child names on furniture or walls) before recording. Anything missed will be blurred by quality control reviewers before a video is released for grading.

Table 2 Automatically scored developmental milestones
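As a rough illustration, the automatic-credit rule amounts to walking a prerequisite map transitively whenever a rater marks a milestone. The sketch below is a minimal model only; the milestone names and the prerequisite map are hypothetical stand-ins, not the trial's actual rubric in Table 2.

```python
# Hypothetical prerequisite map: each milestone lists the directly related
# easier milestones it implies. Entries here are illustrative examples only.
PREREQUISITES = {
    "controls head while upright, 15 s": ["controls head while upright, 3 s"],
    "sits without support, 5 s": ["controls head while upright, 15 s"],
    "sits without support, 30 s": ["sits without support, 5 s"],
}

def expand_credits(marked):
    """Return all credited milestones: the ones a rater marked plus every
    easier milestone implied, transitively, by the prerequisite map."""
    credited = set(marked)
    stack = list(marked)
    while stack:
        milestone = stack.pop()
        for easier in PREREQUISITES.get(milestone, []):
            if easier not in credited:
                credited.add(easier)
                stack.append(easier)
    return credited
```

Marking only the hardest milestone in a chain then credits the whole chain, which matches the behavior described above for head control.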

Lastly, we made changes to the video scripts and the laminated reference guides. For example, the laminated guides were updated to state explicitly that it is acceptable to exceed two minutes of recording, if needed. P/CG instructions were updated to encourage them to submit videos demonstrating a child’s best abilities, even if there were minor interruptions or background noises from other children.

Final Planned Strategy for the Clinical Trial with Parallel NHS

Identical, standardized at-home video recording instructions are planned for both the clinical trial and the parallel NHS. Videos will be submitted to a secure storage platform via the smartphone app, which will automatically notify P/CGs of their allotted time period (one week or more in duration) for video capture of required activities and send reminders. If a P/CG chooses not to submit a video for an activity, the app will prompt for the reason (e.g., “My child is ill”), and study staff will have the option to extend or reschedule the window for video capture if appropriate. Once quality is reviewed and found acceptable, videos will be anonymized and made available in the expert reviewer portal. Site staff can request re-recording of specific activities if needed.

To allow tracking, randomization of videos, and masking to study (trial vs. NHS, or treatment) and time (e.g., baseline vs. 12 months), each video in the trial and NHS will be assigned a unique sequence number. Expert panel grading sessions will be delayed until study accrual enables separating videos of the same child by a minimum of 10 videos to minimize recall. Expert raters will have the flexibility to view each child’s video recordings as many times as they prefer during their review. In addition to checking off milestones observed, experts will also view side-by-side video recordings at baseline and a follow-up time (e.g., 24 months) to assess GIC using a nine-point scale ranging from extremely deteriorated to extremely improved.
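The minimum-separation requirement is a constrained shuffle. One simple way to implement it, sketched below under the assumption that each video is tagged with a child identifier, is a greedy pass that at each position places a video from an eligible child with the most videos remaining (random tie-break), retrying from scratch if a pass gets stuck. Function and variable names are hypothetical.

```python
import random
from collections import defaultdict

def _greedy_order(videos, min_gap, rng):
    """One greedy pass: among children currently eligible (their previous
    video sits at least min_gap positions back), place the one with the
    most videos remaining, breaking ties at random. None on deadlock."""
    remaining = defaultdict(list)
    for child_id, video in videos:
        remaining[child_id].append(video)
    order, last_placed = [], {}
    for idx in range(len(videos)):
        eligible = [c for c, vids in remaining.items()
                    if vids and (c not in last_placed
                                 or idx - last_placed[c] >= min_gap)]
        if not eligible:
            return None  # painted into a corner; caller retries
        most = max(len(remaining[c]) for c in eligible)
        child = rng.choice([c for c in eligible if len(remaining[c]) == most])
        order.append((child, remaining[child].pop()))
        last_placed[child] = idx
    return order

def randomize_with_separation(videos, min_gap=10, max_tries=100, seed=None):
    """Produce a random grading order in which two videos of the same
    child are always at least min_gap positions apart."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        order = _greedy_order(videos, min_gap, rng)
        if order is not None:
            return order
    raise RuntimeError("could not satisfy the separation constraint")
```

The "most videos remaining first" heuristic is the standard cooldown-scheduling greedy; an ordering exists whenever enough children contribute videos to each grading batch.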

For quality control, the plan is to randomly intersperse 25% to 30% of videos from an interim time point a second time, allowing assessment of intra-rater reliability and monitoring of raters for drift in grading over time (Fig. 3). Instead of grading baseline videos twice, 25% of the videos from early assessments (e.g., three or six months post-baseline) will be used to approximate baseline. Grading will not begin until 50% of the participants have submitted their 12-month videos. A second round of grading will commence upon receipt of the 12-month recordings from the remaining participants. This process is to be repeated for the 24-month time point, with 18-month videos used for quality control. These quality control and masking procedures are planned for both the clinical trial and the NHS to allow blinding to study (i.e., treatment).
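Selecting the duplicated quality-control videos is a straightforward random sample, and rater drift can be screened with a simple agreement statistic between the two gradings of the same video. A minimal sketch, assuming videos are identified by sequence number and a grading is a dict of milestone checkboxes (helper names and the fraction default are illustrative):

```python
import random

def select_qc_duplicates(interim_videos, frac=0.25, seed=None):
    """Randomly pick a fraction of interim-time-point videos to be graded
    a second time (under new sequence numbers) for intra-rater checks."""
    rng = random.Random(seed)
    k = round(len(interim_videos) * frac)
    return rng.sample(interim_videos, k)

def percent_agreement(first_pass, second_pass):
    """Crude intra-rater reliability: fraction of milestone checkboxes on
    which the two gradings of the same video agree."""
    keys = first_pass.keys() & second_pass.keys()
    if not keys:
        return 0.0
    agree = sum(first_pass[k] == second_pass[k] for k in keys)
    return agree / len(keys)
```

A falling agreement fraction over successive interim batches would flag a rater for drift review; a formal analysis would use a chance-corrected statistic such as kappa.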

Figure 3.

Video grading and quality control plan.

The planned enrollment size was 14 for the clinical trial and 10 for the NHS. Video collection time points were 0, 3, 6, 9, 12, 18, and 24 months. GIC, global impression of change.
