BioLumin: An Immersive Mixed Reality Experience for Interactive Microscopic Visualization and Biomedical Research Annotation

AVIV ELOR, STEVE WHITTAKER, and SRI KURNIAWAN, University of California, Santa Cruz
SAM MICHAEL, The National Institutes of Health, National Center for Advancing Translational Sciences

Many recent breakthroughs in medical diagnostics and drug discovery arise from deploying machine learning algorithms to large-scale data sets. However, a significant obstacle to such approaches is that they depend on high-quality annotations generated by domain experts. This study develops and evaluates BioLumin, a novel immersive mixed reality environment that enables users to virtually shrink down to the microscopic level for navigation and annotation of 3D reconstructed images. We discuss how domain experts were consulted in the specification of a pipeline to enable automatic reconstruction of biological models for mixed reality environments, driving the design of a 3DUI system to explore whether such a system allows accurate annotation of complex medical data by non-experts. To examine the usability and feasibility of BioLumin, we evaluated our prototype through a multi-stage mixed-method approach. First, three domain experts offered expert reviews, and subsequently, nineteen non-expert users performed representative annotation tasks in a controlled setting. The results indicated that the mixed reality system was learnable and that non-experts could generate high-quality 3D annotations after a short training session. Lastly, we discuss design considerations for future tools like BioLumin in medical and more general scientific contexts.

CCS Concepts: • Hardware → Analysis and design of emerging devices and systems; • Human-centered computing → User studies; Visualization design and evaluation methods;

Additional Key Words and Phrases: Immersive technologies, mixed reality, spatial computing, magic leap, interactive visualization, biomedical visualization, human-computer interaction

ACM Reference format:
Aviv Elor, Steve Whittaker, Sri Kurniawan, and Sam Michael. 2022. BioLumin: An Immersive Mixed Reality Experience for Interactive Microscopic Visualization and Biomedical Research Annotation. ACM Trans. Comput. Healthcare 3, 4, Article 44 (October 2022), 28 pages. https://doi.org/10.1145/3548777

This work was supported by the 2020 Seed Fund Award #2020-0000000146 from CITRIS and the Banatao Institute at the University of California.
Authors' addresses: A. Elor, S. Whittaker, and S. Kurniawan, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, 95060, USA; emails: {aelor, swhittak, skurnia}@ucsc.edu; S. Michael, The National Institutes of Health, National Center for Advancing Translational Sciences, 6701 Democracy Boulevard, Bethesda, MD 20892, USA; email: michaelsg@mail.nih.gov.
This work is licensed under a Creative Commons Attribution International 4.0 License. © 2022 Copyright held by the owner/author(s). 2637-8051/2022/10-ART44. https://doi.org/10.1145/3548777

1 INTRODUCTION

The discovery of new drug interventions and disease models often arises from visual analysis of complex molecular components and tissues [2, 6]. Visualization plays a crucial innovative role in key medical research and diagnosis tasks such as analyzing proteins [18, 52, 65], lesion growth [26, 30, 35], epidemiology [9], and drug repurposing [2, 6]. Consequently, visualization tools often require the development of diverse technical methods
when involving geospatial, molecular, and social data analysis [9]. It is also becoming increasingly common for medical research to exploit automatic machine learning techniques that depend on large amounts of manually annotated data [32, 54, 66].

Despite their promise, these visualization tools have experienced limited adoption, with challenges arising from both data quality and system usability [9, 64, 66]. For example, researchers currently use desktop platforms to manually annotate thousands of image segments to build machine learning models to understand biomedical responses and expedite diagnostics. However, such image annotation is time-consuming, requiring expert human intervention to generate the accurate ground truth annotations needed to build these models [53]. Another related bottleneck is the cost of annotation: experts are scarce, their time is valuable, and having them create gold-standard annotations is an expensive undertaking. Furthermore, current tools may limit the efficiency and accuracy of annotation; state-of-the-art visual annotation tools often implement 2D data representations that are viewed in 2D virtual mediums [1, 51]. Such platforms make it hard to annotate visually complex data when there is a requirement to detect connections, perceive spatial arrangements, and identify overlaps between different visualization elements. These challenges suggest potential benefits for effective 3D environments that allow non-expert users to navigate, explore, and annotate complex visual data. The goal of this paper is to design and evaluate such tools.

Recent developments in high-throughput imaging and nuclear segmentation techniques for 3D cell cultures allow for accelerated biological data collection and modeling, making new methods possible [6]. However, we need to extend our analytic approaches, as Eroom's law indicates that the rate of discovery of new clinical therapeutics is declining [28, 56]. At the same time, the power and affordability of computational hardware have never been better [57]. These underlying trends led us to explore the viability of an extended reality environment to accelerate this visually challenging data analysis process.

Medical data is often complex and multidimensional, so 2D viewing methods may be ineffective in supporting the data exploration needed for generating efficient, high-quality annotations. This led us to consider mediums that enable 3D rendering of such data for annotation while also supporting 3D stereo-omnidirectional viewing, or extended reality (XR). XR enables more flexible interaction modalities than keyboard-based input, along with 360-degree viewing of data. The approach is broadly applicable to data types covering anything from a virus' genetic evolution to the geographical spread of infection or sociodemographic factors that predict mortality. Other research indicates the success of XR and immersive environments in showing demonstrable improvements in perception, insight, and retention of complex information relating to big data [17]. Such immersive input modalities combined with commercially available components also improve user perception and comprehension [27], especially for collaborative problem solving, informative learning, and cognitive engagement [20]. Moreover, engagement in a virtual environment with an XR head-mounted display (HMD) helps detach users from the physical world and focus on the virtual tasks at hand [5, 16, 21].
Researchers are beginning to explore the capabilities of extended reality for biological visualization. Zhang et al. discuss the development of "BioVR," a Virtual Reality (VR) Unity-based experience that visualizes DNA, RNA, and protein sequences, which led to a successful tool allowing researchers to review and simulate temporal biological datasets [68]. Johnston et al. discuss user testing of another VR Unity environment, which rendered 3D cell data for the exploration of cellular data [31]. Participants in their study showed improved performance in the analysis of cell biology when using VR compared to a non-VR medium [31]. With the growing availability of extended reality devices, many other researchers are beginning to adopt immersive methods for visualizing biological data in extended reality [17]. Overall, then, this prior work suggests that extended reality systems represent a promising approach to improving perception and interaction with complex data, addressing issues with current approaches, and transforming analysis and discovery processes [49].

We have already identified the costs and challenges of recruiting medical data annotation experts. In addition to offering more flexible and efficient annotation for experts, these novel immersive methods may also facilitate non-expert annotation, overcoming bottlenecks for machine learning approaches. A flexible virtual environment for crowdsourcing and analyzing such data that non-domain experts can use could help address these annotation challenges [47]. Other work suggests the viability of these approaches. For example, the FoldIt Project by the University of Washington successfully employed crowdsolving with non-experts to assist in re-configuring protein structures. Fifty-seven thousand players produced valuable data that matched or outperformed expert-sourced algorithmic solutions for identifying potential proteins for drug discovery [14]. In another crowdsourced medical data game, Luengo-Oroz et al. demonstrated that web-based games could serve as an influential online medium for crowdsourcing the counting of malaria parasites in blood smear images, with up to 99% accuracy achieved by non-expert participants [43]. In similar instances of crowdsourcing, many researchers have found success utilizing Amazon's Mechanical Turk (MTurk [11]) in the medical analysis of cancer survivors [3], generating ground truths in medical imaging [23], and interpreting medical pictographs [67]. In an extensive review of crowdsourcing methods and gamification, Morschheuser et al. determined that crowdsourcing games like Foldit lead to richer datasets, enhanced user participation, and more diverse creative contributions when solving tasks [47].

Modern extended reality systems have come a long way technically in enhancing user immersion through widening the field of view, increasing frame rate, leveraging low-latency motion capture, and providing realistic surround sound [42]. As a result, we see mass adoption of commercial XR HMDs such as the Magic Leap One, Microsoft HoloLens, HTC Vive, Oculus Quest, PlayStation VR, and others, with sales projected to reach 30 million units per year by 2023 [42]. These mediums are becoming more mobile and intrinsic to the average consumer's entertainment experience, enabling a channel for remote participation in 3D annotation tasks [10].
Additionally, game engines such as Unity3D are now deployed for visualization and interaction in industries such as construction, entertainment, government, and healthcare, operating across multiple operating and software systems [61]. Research scientists have reported that the use of a flexible and interactive game engine such as Unity is powerful in tackling biomolecular visualization challenges [44]. As a result, there may be potential for leveraging the immersive capabilities of extended reality to expedite the data visualization and analysis process for complex biological data.

Since our goal is to design an immersive system that is usable by non-experts, we apply presence and usability theory to understand whether mixed reality can accelerate biomedical research analysis. Researchers have suggested that human perception is heavily influenced by the sense of presence in a given task when utilizing XR systems [5, 16]. Presence within the virtual environment, often defined as a "sense of being there," has been linked to the engagement and flow of user experience [59]. Past case studies comparing XR headsets to desktop and room-scale systems have found that HMD-style systems are advantageous in improving user performance for a variety of tasks [13, 22, 33].

Our design process was driven by experts, and our first evaluation involved expert audits of our working HMD experience prototype. Following this, we examined whether non-experts could accurately and efficiently annotate complex medical data while carrying out representative analytic tasks. We collected multiple quantitative task performance measures while also utilizing two standard surveys: Brooke's System Usability Scale (SUS) [7] and the Slater-Usoh-Steed Presence Survey [62]. Brooke's SUS is a widely deployed, well-validated measure that provides a robust view of a user's subjective rating of system usability [4, 7, 55]. The Slater-Usoh-Steed presence survey measures the sense of being in an environment, which is often linked to the engagement and flow of the user [59, 60, 62]. Since our approach centered around an immersive design, this second measure allowed us to assess the role of immersive engagement in contributing to the application's successful operation.

1.1 Study Goals and Contribution

Given the limitations of current 2D tools, enabling exploration and annotation of complex data in the 3D virtual world may facilitate smarter and faster machine learning models that expedite diagnosis and discovery processes. While prior work has begun to explore extended reality for biomedical visualization, there has been less focus on designs that (1) enable user annotation and active data manipulation beyond visual transformation, (2) leverage the immersive capabilities of spatial computing and augmented reality, and (3) support cross-platform capabilities between different extended reality mediums. Given the success of other crowdsourcing approaches, these immersive experiences could potentially support non-expert users in exploring and accurately annotating complex medical data, both through the unique perspective they can provide and through the input modalities inherent to extended reality devices.
Augmented reality, or the extension of virtual reality into the physical world through spatial tracking and occlusive virtual placement, also has the potential to integrate into the lab workflow. Enabling cross-platform communication between extended reality and more generally deployed tools such as WebGL and MATLAB increases the potential for collaborative research analysis and data review [46, 50]. This paper evaluates a mixed reality experience with an expert-informed design to address the following goals:

• To build a novel 3D system that allows flexible 3D exploration of medical data, letting users scale and rotate models as well as place themselves inside the model, so that they can create accurate annotations efficiently.
• To examine the usability of such a system by users who are not medical experts.

Therefore, we leverage spatial computing and mixed reality for interactive biological visualization, explicitly using the Magic Leap One (MLO) and the Unity Game Engine. With emerging devices such as MLO, new input modalities may facilitate more intuitive interaction and higher perceptive viewing of data. We aim to disseminate our exploration of this system to inform the design and evaluation of future mixed reality visualization tools for biomedical analysis.

2 SYSTEM DESIGN

Current annotation of microscopic imaging often involves using 2D displays to manually annotate thousands of image segments in order to build machine learning models for edge detection, classification of regions of interest, and automated diagnostics. In approaching our system design, we sought to explore how immersive 3D interaction could expedite this process through 3D observation and placement, supporting the annotation of thousands of images at a time.

We developed this system at the United States National Institutes of Health (NIH) - National Center for Advancing Translational Sciences (NCATS). Our design involved an interdisciplinary collaboration of health researchers with expertise in immersive media, human-computer interaction, bioengineering, bioinformatics, medical imaging, and industrial engineering. The system was prototyped over three months on-site, with weekly feedback meetings between the NCATS data science, automation, and bio-engineering groups, to ensure that the system supported well-defined medical analytic tasks. Our user-centered design process consisted of task analysis, expert guideline-based evaluation, user-centered evaluation, and comparative evaluation to inform our study goals [25]. We began with expert interviews to drive initial system designs and user interface requirements, followed by weekly prototype design and feedback sessions with NCATS specialists. Expert interviews were also conducted to develop representative usage scenarios for evaluation, as well as to offer independent system evaluations of our working prototype [25]. We now describe our design choices.

Our system was named "BioLumin," with a two-part meaning: (1) the MLO interaction is centered around Magic Leap Inc.'s Lumin Operating System (derived from luminesce) [34], and (2) the production and emission of light by the biological organism reconstructed in the virtual world (derived from bio-luminescence). The MLO system, a primary driver of mixed reality interaction with BioLumin, is a "spatial computing" headset that overlays augmented reality while performing simultaneous localization and mapping of the physical world [34].
MLO's augmented environment was chosen for two reasons: first, it allowed practical HCI testing of our proposed application; second, when working in a biomedical lab environment, seeing the physical world around the user is critical for safety, precluding the use of fully immersive approaches. At the time of the study, the untethered headset differed from other commercially available XR HMDs by projecting light directly into the user's eyes while also enabling richer input modalities through hand tracking, eye tracking, dynamic sound fields, and 6-degree-of-freedom (DoF) controllers with haptic feedback [34].

To enable visualization of and interaction with the virtual world, the Unity Game Engine was chosen as the primary driver to run BioLumin. Unity is a flexible real-time 3D development platform that enables the creation, operation, and rapid prototyping of interactive virtual content [61]. Unity was chosen for its flexible capabilities, which enable the same experience to be built for multiple platforms such as WebGL, Magic Leap, HTC Vive, Oculus Rift, Windows, Mac, and more [61]. Consequently, BioLumin was developed in Unity 2019.1.5f1 on two separate build instances: Lumin (MLO SDK 0.21) and WebGL (OpenGL 4.5).

2.1 Pipeline

We adopt our imaging protocol from Boutin et al.'s work on high-throughput imaging and nuclear segmentation for 3D models [6]. Our work expands upon this protocol through a custom pipeline to transfer raw imaging data from MATLAB into the Unity environment, described in the following steps:

(1) Imaging: capturing a series of z-stack microscopic imaging data.
(2) Analysis: the segmented images are filtered, removing small debris and determining edges between points of interest.
    (a) Edge detection is implemented with computer vision and machine learning.
    (b) Small debris is removed based on users manually annotating images.
(3) Reconstruction: the filtered data is voxelized, converted to binary image matrices, and volumetrically reconstructed in 3D.
(4) Exportation: the volumetric reconstruction is converted to geometric faces and vertices with variable decimation, which is then written to the Standard Tessellation Language (.stl) file format.
    (a) Solids are split up by edge detection from the previous analysis.
    (b) The .stl file is exported from MATLAB to Unity through a custom C# importer.
(5) Exploration and Annotation: the microscopic surface model is represented as a Unity game object.
    (a) The object can be manipulated and visualized through MLO.
    (b) Annotations can be placed on the object, where the user can manipulate the position, size, 3D scaling, and rotation. After placement confirmation, annotation data is saved in JavaScript Object Notation (.json) format through a universal coordinate system relative to the unit scale, together with the model's transform metadata. Annotations are visualized by color placement, and tag naming and metadata can be expanded upon post hoc in MATLAB or WebGL.
    (c) For the purpose of our user study, we constrain BioLumin's annotation to 3D spherical volume markers that are manually placed with the MLO controller.
(6) Closing the loop: the annotation data is sent back to multiple platforms and tools.
    (a) Annotations can be sent back into MATLAB and converted to raw image files to highlight regions of interest.
Such data can help speed up the small debris removal phase by enabling 3D interaction with annotations through MLO, and can provide accurate ground truth for machine learning models once the debris has been filtered.
    (b) Annotations are sent into a custom WebGL instance of BioLumin over the web, where researchers can collaboratively view surface models as well as edit annotations.

This pipeline was designed with flexibility in mind and with application to other forms of biological imaging. More specifically, we expand upon Boutin et al.'s work by building the exportation and exploration tools for the Unity Game Engine experience (protocol steps 4-6). Figure 1 represents the BioLumin pipeline used with cell nuclei confocal imaging data from Boutin et al. [6].

Fig. 1. BioLumin pipeline for mixed reality biological visualization and manipulation.

2.2 Interaction

To enable interaction with the virtual world, the MLO controller and gesture APIs are leveraged, with design considerations drawn from Magic Leap's recommended guidelines [34]. The user can place, translate, scale, and rotate surface models with and without the MLO controller. These operations were identified through iterative design testing with multiple NIH/NCATS experts over three months. Experts met with us weekly to provide feedback on the BioLumin prototypes and tested every input interaction to assure that these gestures felt comfortable and intuitive. Specifically, the control functions that allow the user to manipulate microscopic data in Magic Leap are as follows:

• Place: By holding down the trigger, the surface model is mapped to the position and rotation of the controller. Releasing the trigger places the model in 3D space.
• Translate: By pressing up or down on the touchpad, the user can pull or push the model in the forward direction of the controller.
• Scale: By pressing left or right on the touchpad, the user can scale the model's size up or down.
• Rotate: By performing a radial scroll on the touchpad, the user can rotate the model's yaw in the direction of scrolling.
• Annotate: The user reveals and places a 3D annotation sphere by holding down the bumper on the touchpad. Annotations appear on the tip of the controller and can be confirmed by pressing the center of the touchpad. Conversely, annotations can be deleted by pointing the controller through an existing annotation and holding the center of the touchpad.

These input methods, along with the alternative controls for WebGL and hand gestures, can be seen in Figure 2. It should be noted that the hand gestures were prototyped for hands-free usage within the lab environment. Still, experts emphasized the desire to use the controller because they preferred the haptic feedback and tactile input provided by the controller. As a result, every input method had unique haptic feedback patterns mapped to the controller, and annotations had both haptic and audio feedback for confirmation, deletion, and highlighting. These features are shown in Figure 2.

Fig. 2. BioLumin Interaction and Control Methods.

The WebGL instance enabled viewing annotations without MLO using a web browser, and MLO enabled 6-DoF annotation placement and stereo viewing of the surface model.
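To make pipeline step 5(b) and the Annotate control concrete, the following Unity C# sketch illustrates one way annotation spheres placed at the controller tip could be stored in model-relative coordinates and written out as .json. The class and field names (AnnotationRecord, controllerTip, and so on) are illustrative assumptions rather than the actual BioLumin source code.

```csharp
using System.Collections.Generic;
using System.IO;
using UnityEngine;

// Illustrative sketch: stores annotation spheres relative to the surface
// model's transform so they remain valid after later placement, scaling,
// and rotation of the model.
[System.Serializable]
public class AnnotationRecord
{
    public Vector3 localPosition;  // position in the model's local space
    public float radius;           // radius relative to the model's unit scale
    public string tag;             // optional label, expandable post hoc
}

[System.Serializable]
public class AnnotationSet
{
    public Vector3 modelPosition;
    public Quaternion modelRotation;
    public Vector3 modelScale;
    public List<AnnotationRecord> annotations = new List<AnnotationRecord>();
}

public class AnnotationExporter : MonoBehaviour
{
    public Transform surfaceModel;   // the reconstructed .stl game object
    public Transform controllerTip;  // hypothetical: tip of the MLO controller

    private readonly AnnotationSet set = new AnnotationSet();

    // Called when the user confirms an annotation placement.
    public void PlaceAnnotation(float worldRadius, string tag = "")
    {
        set.annotations.Add(new AnnotationRecord
        {
            // Convert the world-space tip position into model-local coordinates.
            localPosition = surfaceModel.InverseTransformPoint(controllerTip.position),
            // Normalize the radius by the model's current scale so the marker
            // keeps its meaning when the model is rescaled.
            radius = worldRadius / surfaceModel.lossyScale.x,
            tag = tag
        });
    }

    // Writes the annotation set, with the model's transform metadata, to .json.
    public void Save(string path)
    {
        set.modelPosition = surfaceModel.position;
        set.modelRotation = surfaceModel.rotation;
        set.modelScale = surfaceModel.lossyScale;
        File.WriteAllText(path, JsonUtility.ToJson(set, true));
    }
}
```

Because the positions are stored relative to the model's transform, such a .json file can be reloaded in the WebGL viewer or converted back to image coordinates in MATLAB, which is consistent with the cross-platform loop described in step 6.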
2.3 Model Prototypes

We explored two test cases with BioLumin: (1) 3D cell nuclei, and (2) vascularized tissue with progressive tumor growth. This data was collected from Boutin et al.'s NCATS study on high-throughput imaging and nuclear segmentation analysis for cleared 3D culture models [6]. The 3D cell nuclei provided a test case for processing and representing hundreds of biological solids, while the vascularized tissue provided a test case for high-resolution image processing. These test cases are often considered fundamental in the process of understanding biological models for drug discovery [24, 36, 37]. The two data sets were run through the BioLumin pipeline and successfully visualized through both the MLO and WebGL Unity instances. These visualizations are shown in Figure 3.

Fig. 3. BioLumin Reconstructed Surface Models from Opera Phenix Microscope for (1) Cell Nuclei and (2) Omentum Vascularized Tissue with Tumor Growth using data from NIH NCATS [12, 45].

With the system prototype defined, we proceeded to obtain general qualitative feedback from our domain experts to inform how to best evaluate the system. These findings informed our user study, in which we gathered information on usability, learnability, and the task performance of non-expert users.

3 METHODS

This study received an Institutional Review Board (IRB) exemption from the University of California Santa Cruz (UCSC) Office of Research Compliance Administration (ORCA) under protocol #HS3573. We additionally note that no compensation was provided to the participants who volunteered to engage with the study.

3.1 Expert Feedback

After our initial design and primary developer assessments of BioLumin, we were left with many questions. With commercial adoption becoming highly prevalent for immersive XR devices, could people use BioLumin in its current state? Secondly, as described in the related work section, many published prototypes lack user evaluation or do not use augmented reality systems such as MLO. We aimed to evaluate BioLumin and understand its usability and impact with these considerations in mind.

We employed a mixed-method approach to evaluate BioLumin, beginning with expert interviews. These were followed by non-expert assessment, involving task-based evaluation, system log-file analysis, and post-experiment questionnaires. We aimed to determine whether BioLumin can help untrained users replicate the accuracy of NIH experts in data annotation, whether Magic Leap induces the levels of presence necessary for task-based performance, and whether BioLumin has any critical design flaws.

3.2 Expert Interviews

We began by interviewing three experts working with microscopic 3D surface models at the NIH NCATS. These experts had a combined 70 years of experience across diverse roles, including the director of bio-engineering, a bio-engineering research scientist, and an advanced imaging specialist. None had been involved in prior discussions about the system. A 45-minute semi-structured interview was performed that queried experts on previous experience with tools for cell imaging and annotation, and their perspectives on current surface model analysis limitations. Finally, they were given a demo of the BioLumin experience, followed by a short questionnaire to evaluate the BioLumin features. After this, they were asked to suggest representative tasks for assessing the system.
Experts made the following observations:

(1) Microscopic 3D biomedical visualization is a relatively new approach lacking baselines. 3D biomedical analysis is a critical method for drug and disease evaluation/discovery, but new methods are needed. There are no current standards for software analysis, and the most frequently used tools require proprietary microscopes. This requirement for proprietary equipment restricts end-user access and reduces the possibility of using crowdsourcing methods to annotate such data at scale.
(2) Manual feature extraction and processing of these images are extremely time-consuming. For the Omentum model, this process can take months. While computer vision and machine learning algorithms are being developed to process this data, creating the ground truth data and human-in-the-loop processing are time-consuming and critical.
(3) A consensus among these experts in evaluating BioLumin was that feature annotation may be faster, and that viewing and manipulating such biomedical data with BioLumin is far more intuitive, easier, and accurate than established techniques.

Other expert observations suggested potential benefits for non-expert user detection of complex spatial relations in biological data, overcoming some of the problems with current 2D approaches:

(1) "This [BioLumin] would be an amazing way of knowing what talks to what in a tissue. And the spatial arrangements and for some tissues, that is very important."
(2) "Right away, you see what the big picture is in a tissue [with magic leap]. And I think that was pretty powerful to help in thinking about what I should be looking at. You could tell that some [tumors] were on the vasculature, and some were not. This can take a lot of time in 2D, but in 3D, this comes right away... in a flat-screen, this is oftentimes difficult to appreciate..."
(3) "The hope is to use Biolumin as well to follow the 3D Neuride. This is difficult to do on a flat 2D screen like rotating and identifying regions of interest."

These experts concurred that BioLumin could be a powerful tool in augmenting their current processes for annotating and viewing these microscopic images in their biomedical models. Aside from commenting that the 3D visualization provides enhanced perception and viewing capabilities compared to their desktop monitors, experts indicated that annotation has the potential to massively speed up their research, as it enables the annotation of thousands of images at a time through 3D observation and placement.

Finally, we asked them to recommend representative tasks for user evaluation trials. Experts recommended identifying regions of interest in the Omentum model to establish where blood vessels and tumor growth overlap, a process that often takes NIH NCATS researchers months when manually processing images. This process is easily quantifiable in a 3D environment, allowing straightforward success metrics, as annotations can be analyzed based on the mesh geometry between the two models generated by experts.

While these expert reviews were positive, we have already noted the costs associated with expert annotation. We therefore went on to evaluate the potential of BioLumin for non-expert crowdsourced annotations to assess whether non-experts could use the system effectively.
We asked the following questions: Could BioLumin be used for effective annotation by non-experts who have never seen medical data or worked at NIH? Would these non-experts be able to use the control functions of BioLumin to annotate images accurately and efficiently?

3.3 Non-Expert User Testing

To understand how non-experts would perform with BioLumin, we designed a multi-stage experimental protocol. Engineering students from the University of California Santa Cruz (UCSC) were recruited as our non-expert participants through virtual flyers inviting them to evaluate a Unity experience involving Magic Leap.

3.3.1 Experimental Protocol. We wanted to assess whether such non-expert users would be able to quickly master BioLumin's functionality to create high-accuracy ground truth annotations of the overlap between the Omentum model's blood vessels and tumor growth. We therefore instrumented the system with a Unity3D log-file component capturing runtime data at 90 Hz. This allowed us to collect rich task and usage data, including overall task completion time, individual annotation completion/count/errors, motion capture, and eye-tracking position. The Microsoft .NET I/O framework was utilized to record CSV data from Magic Leap. To better understand the underlying operation of BioLumin, we also assessed usability and immersion. We tested the usability of the system using validated survey methods: (1) the Slater-Usoh-Steed Presence Survey [62] and (2) the John Brooke System Usability Scale [7]. Finally, we administered presence surveys because related work argues that high levels of presence induce greater engagement with the system and task success [16, 17, 31, 49].

Fig. 4. BioLumin User Testing Protocol and Data Handling.

The experimental protocol consisted of four tasks, followed by a final set of surveys. The order can be seen in Figure 4 and is described in detail below:

(1) Preparation: A research evaluator prepared the Magic Leap One headset with sanitation wipes and linked the lightbox to a laptop running the MLO Lumin Package Manager. The Unity Editor instance of BioLumin was loaded, and each evaluator ensured that no communication or visualization errors occurred in linking the headset to the Unity instance. The research evaluator then set up a webcam recording through OBS Studio and began the experimental protocol.
(2) Training Tasks 1-3: The research evaluator introduced users to the BioLumin system through three introductory tasks. Users were told that the purpose of the experiment was to assess usability and gauge their evaluations of the experience. It was also explained that they would be working with actual medical data from NIH to help annotate regions of interest for experts. Users were then given the tutorial tasks and encouraged to ask questions of the research evaluator. Users had to successfully complete these tasks before moving on to the evaluation task. These training tasks were as follows:
    • Task 1: Introductory tutorial with a DNA surface model (about 2 minutes). The research evaluator introduced BioLumin, demonstrating the place, translate, scale, rotate, and annotation functions on a 3D DNA model.
The evaluator then handed the user the controller, instructed them to perform every control function at least twice, and ensured that all users could use it correctly through the next two tasks. Users' competence with the functions was assessed by having them place three annotations at the center and ends of the model, which they then deleted.
    • Task 2: Annotation task with a 3D cube surface model (about 3 minutes). The research evaluator instructed the user to annotate all eight corners of a 3D virtual cube in Magic Leap. After this task, users were instructed to keep precision and error in mind, as they would be working with medical data. Small transparent guidance spheres were depicted on the cube to visually show the user regions of interest for annotation. Every user was able to annotate all corners successfully with zero errors.
    • Task 3: Introductory tutorial with a cropped Omentum tissue surface model (about 5 minutes). The research evaluator described the Omentum tissue blood vessels and tumor growth, with the user's goal being to annotate the overlapping regions with high precision (ensuring the annotation sphere is not placed on the blood vessels or tumor growth exclusively) so as to avoid false positives. Transparent indicator spheres were placed on the overlap to visually show the user's regions of interest. These also included corner cases where the blood vessels and tumor growth were barely touching and/or slightly cropped in the virtual model. Users were reminded to keep depth in mind when annotating the cropped model. All users were able to complete this task successfully.
(3) Evaluation Task 4: This involved annotating Omentum tissue (about 15 minutes). Here users were instructed that this was the primary evaluation and that research evaluators would not answer system questions during this task. Users were required to annotate an entire region of Omentum tissue and were not given any instruction until the 10-minute mark. At 10 minutes, users were informed of their annotation completion percentage and told they could quit at any time. This option was offered to assess participants' engagement with the system.
(4) Surveys: After the experiment, users were given survey questionnaires about system usability and presence, along with open-ended questions about preference and BioLumin use. We employed the SUS [7] and presence [62] surveys, which are widely deployed and well validated.

Fig. 5. BioLumin Experimental Tasks. Task 1: Research evaluator visually instructs user on all BioLumin control functions and confirms user interacts with each. Task 2: Annotate the eight corners of a virtual cube. Task 3: Annotate a pre-labeled subsection of the Omentum Blood Vessel and Tumor Tissue Overlap. Task 4: Independent Annotation of a large region of Omentum Tissue with no instruction or guidance from the research evaluator.

Trials lasted approximately 30-45 minutes. Usage, video, and survey data recordings were stored as shown in Figure 4. Individual evaluation tasks can be seen in Figure 5. As described above, tasks were designed in consultation with collaborating scientists, and their representativeness was confirmed with experts. These tasks required a diverse set of simple and complex controls to learn, and allowed usability to be evaluated throughout the trials.
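The 90 Hz runtime logging described in Section 3.3.1 (poses and eye fixation written to CSV through .NET I/O) can be approximated with a small Unity component such as the one below. This is a minimal sketch under the assumption that the tracked transforms are exposed to the script; it is not the study's actual logging code.

```csharp
using System.IO;
using UnityEngine;

// Illustrative 90 Hz logger: appends one CSV row per sample with headset,
// controller, model, and eye-fixation data plus elapsed task time.
public class RuntimeLogger : MonoBehaviour
{
    public Transform head;         // HMD pose (stand-in for the MLO camera rig)
    public Transform controller;   // 6-DoF controller pose
    public Transform model;        // Omentum surface model
    public Transform eyeFixation;  // hypothetical eye-fixation point provider

    private StreamWriter writer;
    private readonly float sampleInterval = 1f / 90f;
    private float nextSample;

    void Start()
    {
        writer = new StreamWriter(Path.Combine(Application.persistentDataPath, "session_log.csv"));
        writer.WriteLine("t,head_x,head_y,head_z,ctrl_x,ctrl_y,ctrl_z,model_scale,eye_x,eye_y,eye_z");
    }

    void Update()
    {
        // Throttle samples to at most one per 1/90 s, approximating 90 Hz logging.
        if (Time.time < nextSample) return;
        nextSample = Time.time + sampleInterval;

        Vector3 h = head.position, c = controller.position, e = eyeFixation.position;
        writer.WriteLine($"{Time.time:F4},{h.x:F4},{h.y:F4},{h.z:F4}," +
                         $"{c.x:F4},{c.y:F4},{c.z:F4}," +
                         $"{model.lossyScale.x:F4},{e.x:F4},{e.y:F4},{e.z:F4}");
    }

    void OnDestroy()
    {
        writer?.Flush();
        writer?.Close();
    }
}
```

Rows logged in this way can later be differenced to obtain the total head, hand, and eye-movement distances that serve as covariates in the analyses reported below.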
3.4 Measures

Our experimental protocol enabled us to collect a variety of measures. The first three tasks were tutorials to familiarize users with BioLumin over the course of 10 minutes. In contrast, the fourth task allowed us to collect test data on user behaviors and success in annotating the Omentum tissue. We identified five dependent variables that serve to evaluate quantitative and qualitative aspects of our prototype's performance: user engagement, annotation completion, annotation efficacy, annotation redundancies, and usability score. We aim to predict these dependent variables in subsequent sections by instrumenting the system to collect different user behaviors with BioLumin.

3.4.1 DV1 - User Engagement. User engagement is a critical metric for mixed-reality systems, and we collected objective data about this by assessing users' willingness to use the system voluntarily. We evaluated whether users would continue with the system when it was discretionary to do so. Recall that after ten minutes of the final annotation task, users were told that they could end the task at their discretion. User engagement was assessed by logging the time each user continued with annotation beyond the ten-minute period.

3.4.2 DV2 & DV3 - Annotation Completion and Redundancies. We also assessed accuracy by measuring the completion and redundancy of annotations. Users were required to place 3DUI annotations on all overlapping blood vessel and tumor growth points in the Omentum model. Spherical colliders were placed in the Unity scene on regions of the Omentum tissue where the blood vessel mesh overlapped with the tumor growth mesh to calculate the percentage of task completion. These colliders were used as indices to track each annotation sphere the user placed, to assess both total successful annotation completion (e.g., how many annotations overlapped with the spherical colliders) and annotation redundancies (e.g., how many annotations did not overlap with the tumor-blood vessel meshes or were placed on a vessel that was already annotated).

3.4.3 DV4 - Annotation Efficacy. Given that users had varying engagement times (DV1), to measure efficacy we normalized the rate of annotation completion to account for the total time spent using the application. This measure indicated how strategically users could spend their time annotating the regions of interest during the Omentum annotation task.

3.4.4 DV5 - Usability. At the end of task 4, users completed the System Usability Scale (SUS) [7]. The SUS is a ten-item Likert scale questionnaire that measures a user's perception of system complexity, ease of use, and confidence in using a system. The SUS has been determined to be a highly robust, reliable, and versatile survey for measuring system usability [4].

3.4.5 Independent Variables to Predict Success and Usability Metrics. We wanted to examine what usage factors predicted our five dependent variables, so we also collected data about various user behaviors using BioLumin's system logfile collection. This quantitative data includes the controller pose (position and rotation), headset pose, Omentum model pose, total eye tracking movement, and BioLumin control function usage (total usage times of model placement, translation, rotation, and scale). This logfile data was supplemented with qualitative data collected through video recordings and post-experimental surveys.
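For reference, the SUS used for DV5 is conventionally scored by mapping the ten five-point items (odd items positively worded, even items negatively worded) onto a 0-100 scale; the formula below reflects Brooke's standard scoring [7] rather than any study-specific variant:

SUS = 2.5 \left[ \sum_{i \in \{1,3,5,7,9\}} (s_i - 1) + \sum_{i \in \{2,4,6,8,10\}} (5 - s_i) \right],

where s_i is the 1-5 response to item i. This is the scoring behind the 0-100 SUS values reported in the Results.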
Given the importance of immersion in mixed-reality designs, we measured the user's perceived presence when using the environment with the Slater-Usoh-Steed Presence Questionnaire, a validated survey previously used to assess the user's sense of 'being there' in different immersive virtual environments [62]. In addition, we aimed to better understand usage strategies by systematically analyzing different user system behaviors.

3.4.6 Coding of Behavioral Data. We also examined and systematically coded video data of user behaviors collected during task 4. Two researchers examined different behavioral strategies for how users approached the annotation task. We identified different strategies for model viewing and controller holding, and determined potential causes of error by reviewing the video data. To ensure high inter-rater reliability, we developed codes for each behavioral strategy through two pilot rounds of analysis of user behaviors on the initial three tutorial tasks. Coders resolved disagreements through verbal discussion and revised the codebook on each round. This initial analysis defined the codebook, so the researchers next created a Python script to expedite coder analysis by displaying five-second clips with three exclusive behavioral codes for both user viewing strategies and controller holding behaviors (as shown in Figure 7). The codes were then used as independent variables to model how viewing and holding strategies and other variables influence user performance and system usability.

Fig. 6. Omentum Annotation Task Performance for Time (left), Annotation Completion (middle), and Error Rate (right).

4 RESULTS

We tested 19 university students (7 female, 11 male, 1 non-binary) using the experimental protocol shown in Figure 4, assessing user performance on a final annotation task following three prior training tasks that lasted a total of around 10 minutes. These users had a mean age of 23 years with a standard deviation of 2.5 years; six were tested in an office, and 13 were tested in a lab environment. A screening survey indicated that 9 of these users had previous experience with extended reality systems, and 10 had little to no experience. Screening questions established that none of these users reported prior experience analyzing medical data or using Magic Leap One for similar tasks.

Overall performance on the final annotation task was good, showing learnability, high task completion, and few errors. Users completed a mean of 98% of subtasks in about 13 minutes, with an error rate of 0.4%, after relatively brief training. The distributions for annotation time, completion, and error on the Omentum model are shown in Figure 6. Performance quality was calculated by placing spherical spatial markers of 0.001 scale of the Omentum model in the overlapping geometry of the blood vessel and tumor growth game objects. Completion and error were computed automatically using colliders with annotation markers to count correct and incorrect annotation placements at the 0.001 scale precision. We first describe the observational data before analyzing which aspects of system usage promoted optimal performance.
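Before turning to the observational data, the automatic completion and error computation described above can be sketched as a simple Unity-side scorer. The component below is a hypothetical illustration (class names and the radius threshold are assumptions, and the real system tracked markers incrementally rather than in a single pass):

```csharp
using System.Collections.Generic;
using UnityEngine;

// Illustrative scoring: reference colliders mark true vessel/tumor overlaps;
// each user annotation either hits a new target (correct), hits an already
// annotated target (redundancy), or hits nothing (error).
public class AnnotationScorer : MonoBehaviour
{
    public List<Collider> overlapTargets;    // spherical colliders on true regions of interest
    public List<Transform> userAnnotations;  // annotation spheres placed by the user
    public float annotationRadius = 0.01f;   // world-space radius of an annotation marker

    public void Score(out float completion, out int redundancies, out int errors)
    {
        var annotated = new HashSet<Collider>();
        redundancies = 0;
        errors = 0;

        foreach (Transform a in userAnnotations)
        {
            bool hitNewTarget = false, hitAnyTarget = false;
            foreach (Collider target in overlapTargets)
            {
                // Compare the annotation sphere against the closest point on the target collider.
                if (Vector3.Distance(target.ClosestPoint(a.position), a.position) <= annotationRadius)
                {
                    hitAnyTarget = true;
                    if (annotated.Add(target)) hitNewTarget = true;
                }
            }
            if (!hitAnyTarget) errors++;
            else if (!hitNewTarget) redundancies++;
        }

        completion = 100f * annotated.Count / overlapTargets.Count;
    }
}
```

Efficacy (DV4) then follows by dividing the resulting completion percentage by the user's total annotation time.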
4.1 Behavioral Strategies for System Usage

As discussed in the previous section, two researchers coded user behaviors concerning observation (how users viewed the model), control (how users held the controller), and causes of errors (e.g., the user clipped the model by moving their head within range of MLO's clip plane). Coding was reliable; recurrent patterns were seen between users, revealing high agreement, as shown in Table 1.

Table 1. Observational Coding by Two Researchers of User Viewing and Control Behaviors in BioLumin, Showing Inter-rater Reliability for 1140 Cases via Percent Agreement [29] and Cohen's Kappa [8]

Coded Behavior from BioLumin | Percent Agreement | Cohen's Kappa | N Agreements | N Disagreements
Glance - Looking Down | 94.82% | 0.801 | 1081 | 59
Glance - Looking Forward | 93.33% | 0.808 | 1064 | 76
Glance - Leaning Around | 97.63% | 0.806 | 1113 | 27
Holding - Unsupported Single Hand | 97.72% | 0.954 | 1114 | 26
Holding - Double Hand | 99.47% | 0.987 | 1134 | 6
Holding - Supported Single Hand | 97.89% | 0.983 | 1116 | 24
Errors - Model Skewed from Clip Plane | 96.40% | 0.811 | 1099 | 41

Cases were determined from 25% of the session data, where behaviors were labeled in five-second windows of video data from task 4 (an example of video data can be seen in Figure 5). All coded behaviors had very strong agreement between raters, with high inter-rater reliability [8].

We now describe these strategies and how they were defined:

• Observation strategies when viewing the Omentum model during annotation:
    - Looking down (determined from behavioral coding): users would place the Omentum model below them on the floor and walk over the data with occasional annotation.
    - Looking forward (determined from behavioral coding): users would place the model directly ahead of them and reach into the data to place annotations or rotate the Omentum tissue.
    - Leaning around (determined from behavioral coding): users would lean their body or rotate their head around the model rather than using the rotate function. Often these users would pause mid-rotation to place or delete an annotation.
• Strategies for holding the controller during annotation of the Omentum model:
    - Unsupported single hand (determined from behavioral coding): users would hold the controller in their dominant hand without support, often maximally stretching their arm or pointing into the model to place annotations.
    - Double hand (determined from behavioral coding): users would hold the controller with both hands, placing annotations at the mid-extent of their reach.
    - Supported single hand (determined from behavioral coding): users would hold the controller with their dominant arm and support their elbow or forearm with their non-dominant arm. Users varied between the mid- and far-extent of their reach during model annotation.
• Causes of annotation errors:
    - Model transparent from clipping plane (determined from behavioral coding): the user's head was within 0.37 m of the digital model, and the model was clipped. This occurred when users poked their head through the model rather than using the scale function to zoom in and out.
    - Depth perception difficulties (determined from survey and Unity developer): some users had trouble determining the depth of annotations. These users did not look around the model or use the annotation sphere to check for occlusion with the tissue. These users also requested enhanced features that would project annotations to the nearest geometry.
    - Lack of precision (determined from survey and Unity developer): these users approached the annotation region using an under-magnified model. As a result, annotation spheres appeared large and often overlapped with tumor-free blood vessels.
• Main causes of missing annotations:
    - Barely overlapping tumor/vessel tissue (determined from survey and Unity developer): users would often miss cases where the Omentum blood vessels barely overlapped with tumor growth. These cases required users to scale the model up considerably to review blood vessel paths from a close view.
    - Obvious overlaps after 95% completion (determined from survey and Unity developer): users would often miss some annotation regions in plain sight even after they had already reviewed that region of the tumor/vessel overlap data. This occurred after 95% annotation completion and was usually corrected with a second-pass review of the blood vessel and tumor growth overlap.
• Outliers wearing (or forgetting to wear) glasses (determined from survey):
    - These users had the lowest annotation completion with the Omentum model (as shown by P18 and P19 in Figure 9).
    - They attempted to compensate by zooming in or walking closer to the model, but then had trouble with the clipping plane.

These observations are illustrated in Figure 7, with behavioral coding reliability shown in Table 1. This behavioral analysis shows very different usage strategies, and the next subsection examines the effects of these strategies on performance, engagement, and usability measures.

Fig. 7. Examples of different user viewing methods (in green), controller handling (in yellow), and error propagation (in red). All observations are mutually exclusive.

4.2 Mixed Model Results

Our statistical models examined how user demographics combined with BioLumin features influence five success metrics: engagement, completion, efficacy, redundancies, and usability. This subsection reports the significant values from linear mixed model analyses with users as a random factor. Estimation of parameters was achieved using the Maximum Likelihood method with the Satterthwaite approximation for degrees of freedom [63]. We conducted five separate mixed models, with each success metric as a dependent variable. The covariates for each mixed model are the following:

• User-reported presence score, age, and testing location, obtained from surveys.
• Total operation time recorded during Omentum annotation for each control function of place, rotate, translate, and scale, obtained from system logs.
• Total user movement recorded during Omentum annotation for the headset, the controller, the Omentum model, and the user's eye fixation, obtained from system logs.
• Mean annotation precision (average model scale, where markers represent 1% of the Omentum tissue's volume as a sphere).

We utilize these mixed model parameters to explore demographic measures that are influential in utilizing head-mounted displays (e.g., age, glasses-wearing), perceptual measures relating to experience (e.g., presence, usability), and functional measures (e.g., movement, control utilization), towards developing an understanding of the effects of BioLumin. Consequently, we found multiple significant predictors of our dependent variables at 95% confidence. The following subsections describe the models for each of these dependent variables.
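The per-metric analyses can be summarized in the usual linear mixed-model form; the notation below is a sketch of the specification implied by the text (covariates as fixed effects, participants as a random intercept), not a transcript of the exact analysis:

y_{ij} = \beta_0 + \sum_{k} \beta_k \, x_{ijk} + u_j + \varepsilon_{ij}, \qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),

where y_{ij} is one of the five success metrics for observation i of participant j, the x_{ijk} are the covariates listed above, u_j is the participant-level random effect, and the fixed-effect coefficients \beta_k are estimated by maximum likelihood with Satterthwaite-approximated degrees of freedom for the reported F and t statistics.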
4.2.1 Engagement. As stated above, we measured user engagement by recording how much additional discretionary time users were willing to spend in maximizing their annotation completion performance. Our mixed model analysis found nine significant parameters contributing to this assessment of user engagement. People who were older (M = 23.11 yrs, SD = 2.514, F(1,19) = 51.272, p < 0.001) and those who wore eyeglasses (3 out of 19 users, F(1,19) = 16.500, p < 0.001) showed significantly less engagement. Those who used the translate function also had significantly less engagement (M = 2.08 s, SD = 3.439, F(1,19) = 37.641, p < 0.001). In contrast, users with prior VR experience showed more engagement (7 out of 19 users, F(1,19) = 10.068, p < 0.01), and greater usage of the rotate function was also a significant predictor of higher engagement (M = 8.791 s, SD = 11.529, F(1,19) = 51.134, p < 0.001). We also found that greater total change in distance of head (M = 30.22 m, SD = 13.261, F(1,19) = 41.442, p < 0.001), hand (M = 52.13 m, SD = 33.148, F(1,19) = 5.799, p < 0.05), and eye fixation movements (M = 428.33 m, SD = 470.921, F(1,19) = 21.788, p < 0.001) were associated with significantly greater user engagement. These movements were recorded by tracking the total aggregated distance of the HMD, controller, and eye fixation positions through the 10+ minute testing session of the final evaluation task. Additionally, users who held the controller with a single hand (in contrast to the double-handed grip or supported single-hand grip) showed significantly more engagement (M = 30.16, SD = 28.568, p < 0.05). These findings are shown in Table 2.

Table 2. Significant Effects of Engagement Time (Mixed Model Results)
Parameter | Estimate | Std. Error | df | t | Sig. | 95% CI Lower | 95% CI Upper
Intercept | 1007.085 | 173.315 | 19 | 5.811 | <0.001 | 644.331 | 1369.839
AGE | −36.471 | 5.093 | 19 | −7.160 | <0.001 | −47.132 | −25.810
HAS VR EXPERIENCE | 15.944 | 5.025 | 19 | 3.173 | 0.005 | 5.426 | 26.461
WEARS GLASSES | 202.852 | 46.414 | 19 | 4.370 | <0.001 | 105.705 | 300.000
ROTATE TIME | 10.705 | 1.497 | 19 | 7.151 | <0.001 | 7.571 | 13.838
TRANSLATE TIME | −35.373 | 5.765 | 19 | −6.135 | <0.001 | −47.441 | −23.306
HEAD MOVEMENT | 11.661 | 1.811 | 19 | 6.436 | <0.001 | 7.869 | 15.454
HAND MOVEMENT | 1.522 | 0.632 | 19 | 2.408 | 0.026 | 0.199 | 2.846
EYE FIXATION | 0.168 | 0.035 | 19 | 4.668 | <0.001 | 0.090 | 0.236
UNSUPPORTED SINGLE HAND HOLD | 4.076 | 1.606 | 19 | 2.537 | 0.020 | 0.713 | 7.439

4.2.2 Completion. As described above, annotation completion was determined by setting spherical colliders, which enabled us to measure how much of the regions of interest (the blood vessel and tumor overlap) were successfully identified by each user, from 0 to 100%. A second mixed model analysis found significant predictors
contributing to annotation completion. Users who wore eyeglasses (3 out of 19 users, F(1,19) = 8.312, p < 0.05), those using the translate function more extensively (M = 2.08 s, SD = 3.439, F(1,19) = 8.820, p < 0.01), and those who clipped the model more (M = 5.89, SD = 6.523, p < 0.001) showed reduced completion. Users who held the controller with a single unsupported hand (M = 30.16, SD = 28.568, p < 0.05) or with two hands (M = 17.381, SD = 26.871, p < 0.05) had significantly higher annotation completion. These findings are shown in Table 3, with completion rates per user in Figure 6. Additionally, we observed a significant difference in annotation completion between the 200-second start and end windows of task 4, as shown in Table 7, with the first 200 seconds resulting in greater mean completion.

Table 3. Significant Effects of Annotation Completion (Mixed Model Results)
Parameter | Estimate | Std. Error | df | t | Sig. | 95% CI Lower | 95% CI Upper
Intercept | 122.925 | 18.027 | 19 | 6.819 | <0.001 | 85.193 | 160.657
WEARS GLASSES | −14.582 | 5.057 | 19 | −2.883 | 0.010 | −25.168 | −3.995
TRANSLATE TIME | −2.023 | 0.645 | 19 | −3.134 | 0.005 | −3.375 | −0.672
SINGLE HANDED HOLD | 0.136 | 0.061 | 19 | 2.251 | 0.036 | 0.009 | 0.264
DOUBLE HANDED HOLD | 0.168 | 0.064 | 19 | 2.601 | 0.018 | 0.032 | 0.303
CLIP PLANE ON MODEL | −2.284 | 0.198 | 19 | −11.526 | <0.001 | −2.698 | −1.869

4.2.3 Efficacy. To examine the parameters contributing to user efficacy, we normalized the annotation completion to allow for the fact that more engaged users had more opportunities to complete their annotations. We created another mixed model to examine how each parameter contributes to annotation completion per unit time. We found eight significant predictors of user annotation efficacy. Users who reported higher levels of presence (M = 4.466 on the Slater-Usoh-Steed scale, SD = 0.968, F(1,19) = 7.482, p < 0.05), wore eyeglasses (3 out of 19 users, F(1,19) = 23.854, p < 0.001), heavily used the rotate function (M = 8.791 s, SD = 11.529, F(1,19) = 17.683, p < 0.001), had higher head movement (M = 30.22 m, SD = 13.261, F(1,19) = 8.735, p < 0.01), had greater eye fixation movement (M = 428.33 m, SD = 470.921, F(1,19) = 16.267, p < 0.001), or clipped the model more (M = 5.89, SD = 6.523, p < 0.01) showed significantly reduced efficacy. Those who were older (M = 23.11 yrs, SD = 2.514, F(1,19) = 6.173, p < 0.05) or used the translate function (M = 2.08 s, SD = 3.439, F(1,19) = 4.608, p < 0.05) had significantly higher efficacy. This can be seen in Table 4.

Table 4. Significant Effects of Annotation Efficacy (Mixed Model Results of Normalized Efficacy)
Parameter | Estimate | Std. Error | df | t | Sig. | 95% CI Lower | 95% CI Upper
Intercept | 0.203 | 0.056 | 19 | 3.631 | 0.002 | 0.086 | 0.321
PRESENCE | −0.015 | 0.005 | 19 | −2.735 | 0.013 | −0.026 | −0.003
AGE | 0.004 | 0.001 | 19 | 2.485 | 0.022 | 0.001 | 0.007
WEARS GLASSES | −0.071 | 0.014 | 19 | −4.884 | <0.001 | −0.102 | −0.041
ROTATE TIME | −0.002 | 0.001 | 19 | −4.205 | <0.001 | −0.003 | −0.001
TRANSLATE TIME | 0.004 | 0.002 | 19 | 2.147 | 0.045 | 0.000 | 0.008
HEAD MOVEMENT | −0.002 | 0.001 | 19 | −2.955 | 0.008 | −0.003 | −0.001
EYE FIXATION | −0.00004 | 0.00001 | 19 | −4.033 | 0.001 | −0.00007 | −0.00002
CLIP PLANE ON MODEL | −0.004 | 0.001 | 19 | −3.657 | 0.002 | −0.00599 | −0.00163
4.2.4 Redundancies. While errors (defined as annotations encapsulating no tumor and blood vessel overlap) showed no differences between users, we nevertheless observed many inefficiencies when users made duplicate annotations, identified as annotations placed on tumor and blood vessel overlap that had already been covered by an adjacent annotation. Our mixed model analysis found five significant predictors of these redundant annotations. Users who reported higher levels of presence (M = 4.466 on the Usoh-Slater-Steed scale, SD = 0.968, F(1,19) = 4.989, p < 0.05) or who used higher levels of annotation precision (M = 2.17x the scale of the 1 m voxel Omentum tissue, SD = 1.14, F(1,19) = 6.594, p < 0.05) placed significantly fewer redundant annotations. In contrast, prior VR experience (7 out of 19 users, F(1,19) = 66.535, p < 0.001), greater use of the scale function (M = 6.91 s, SD = 8.899, F(1,19) = 8.388, p < 0.01), and more time spent clipping the model with the head (M = 5.89, SD = 6.523, p < 0.05) were associated with significantly more redundant annotations, as shown in Table 5. Additionally, we observed no significant difference in redundancies between the 200-second start and ending windows of task 4, as shown in Table 7.
Table 5. Significant Effects of Annotation Redundancies (Mixed Model Results of Redundancies)
Parameter | Estimate | Std. Error | df | t | Sig | 95% CI Lower Bound | 95% CI Upper Bound
Intercept | 52.412 | 59.518 | 19 | 0.881 | 0.390 | −72.161 | 176.986
PRESENCE | −13.661 | 6.116 | 19 | −2.234 | 0.038 | −26.462 | −0.860
HAS VR EXPERIENCE | 140.107 | 17.176 | 19 | 8.157 | <0.001 | 104.156 | 176.059
SCALE TIME | 2.982 | 1.029 | 19 | 2.896 | 0.009 | 0.827 | 5.137
ANNOTATION PRECISION | −22.194 | 8.642 | 19 | −2.568 | 0.019 | −40.284 | −4.104
CLIP PLANE ON MODEL | 2.594 | 1.141 | 19 | 2.272 | 0.035 | 0.204 | 4.983
4.2.5 Usability. Usability was measured using the System Usability Scale (SUS) [7]. Overall, the system was found to be highly usable, with a mean SUS score of 77.6 (SD = 15.75), where zero indicates unusable and 100 indicates highly usable. Examining the individual scale items, we found the highest variance (SDs from 1 to 1.23) for system complexity, desired frequency of use, cumbersomeness, and ease of use, and the lowest variance (SDs from 0.7 to 0.8) for system accessibility/learnability, confidence in using the system, well-integrated controls, and consistency between controls. Our mixed model analysis revealed six significant predictors of usability, as shown in Table 6. Users who used the scale function more (M = 6.91 s, SD = 8.899, F(1,19) = 6.819, p < 0.05) or who had more hand movement (M = 52.128 m, SD = 33.148, F(1,19) = 27.591, p < 0.001) judged the system to have significantly lower usability. Those who self-reported higher levels of presence (M = 4.466 on the Usoh-Slater-Steed scale, SD = 0.968, F(1,19) = 11.530, p < 0.01), were older (M = 23.11 yrs, SD = 2.514, F(1,19) = 4.749, p < 0.05), used the rotate function more (M = 8.791 s, SD = 11.529, F(1,19) = 5.623, p < 0.05), or moved their head more (M = 30.22 m, SD = 13.261, F(1,19) = 5.174, p < 0.05) reported significantly higher usability.
Table 6. Significant Effects of Annotation Usability (Mixed Model Results of Usability)
Parameter | Estimate | Std. Error | df | t | Sig | 95% CI Lower Bound | 95% CI Upper Bound
Intercept | 22.933 | 22.840 | 18 | 1.003 | 0.328 | −24.872 | 70.739
PRESENCE | 10.785 | 3.176 | 19 | 3.396 | 0.003 | 4.413 | 17.433
AGE | 2.078 | 0.953 | 19 | 2.179 | 0.042 | 0.082 | 4.073
ROTATE TIME | 0.665 | 0.281 | 19 | 2.371 | 0.028 | 0.078 | 1.251
SCALE TIME | −1.109 | 0.425 | 19 | −2.611 | 0.017 | −1.999 | −0.220
HEAD MOVEMENT | 0.771 | 0.339 | 19 | 2.275 | 0.035 | 0.061 | 1.481
HAND MOVEMENT | −0.621 | 0.118 | 19 | −5.253 | <0.001 | −0.869 | −0.374
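For reference, the SUS score reported above follows Brooke's standard scoring [7]: each odd-numbered item contributes its response minus one, each even-numbered item contributes five minus its response, and the sum is multiplied by 2.5 to yield a 0-100 score. A minimal sketch follows; the example responses are made up and do not correspond to any participant.

```python
# Standard SUS scoring (Brooke [7]); the ten example responses below are made up.
def sus_score(responses):
    """responses: ten Likert ratings (1-5) in questionnaire order."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```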
Table 7. T-Tests between T2, the Last 200 Seconds of Mandatory Testing, and T1, the First 200 Seconds of Mandatory Testing (Temporal Comparison between T2 (400-600 s) and T1 (0-200 s))
MEASURES | Mean | Std | SEM
Completion T2 | 18.72 | 14.606 | 3.351
Completion T1 | 35.93 | 18.795 | 4.312
Redundancies T2 | 3.42 | 4.004 | 0.919
Redundancies T1 | 5.16 | 7.426 | 1.704
COMPARISON | Sig | t | Mean (SD) T2−T1
Completion T2 vs. T1 | 0.019 | −2.579 | −17.20 (29.084)
Redundancies T2 vs. T1 | 0.331 | −1.000 | −1.73 (7.571)
Note: Analysis of the parametric data suggests significant differences in annotation completion learning rates and insignificant differences in redundancies between the two time spans.
4.3 Differences in User Performance
We analyzed control function usage and found that the place function was by some margin the most heavily used manipulation tool, with a median of 60 seconds of usage, compared with roughly 5 seconds each for the rotate and scale functions and roughly 2 seconds for translate. These distributions of control function usage can be seen in Figure 8. In short, usage of the place function was the highest, rotate and scale saw similar amounts of usage, and translate was the least used control function.
Fig. 8. BioLumin Omentum annotation task performance for control function usage.
We additionally compared individual users by their pre-screening survey responses to see whether previous expertise and testing location induced performance differences. Figure 9 plots each user's time to complete task 4 against annotation completion, showing differences in speed, markers placed (N), errors (E), and average model scale or annotation precision (PR). As this figure demonstrates, the majority of users performed very well, achieving annotation completion beyond 95% within the 15-minute period. Two users (P18 and P19) were below 95% because they wore glasses and the headset felt uncomfortable or could not accommodate their glasses (note that P10 wore glasses as well, but the frames were smaller and fit within the headset). This greatly hindered their depth perception, so more errors were produced and annotation was far slower during this task-based trial.
Fig. 9. BioLumin Omentum annotation task completion for each user, with totals in the top half and rate of change in the bottom half of the figure. Completion percentage indicates the amount of successful model annotation for blood vessel & tumor growth overlap. P indicates individual participants. E indicates errors, where an error is defined as an annotation placed outside of a blood vessel & tumor growth overlap. N indicates the number of annotations placed. PR indicates average model scale during annotation placement, where a higher number corresponds to a higher scale of the model (users were zoomed in to the data more).
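The temporal comparison in Table 7 is consistent with a paired-samples t-test over per-user window scores (the reported t of −2.579 matches −17.20 / (29.084/√19) ≈ −2.58 for the 19 users). As an illustration only, a comparable paired comparison might look like the following sketch; the arrays are placeholders, not the study data.

```python
# Placeholder arrays, not study data: per-user completion (%) in the first (T1)
# and last (T2) 200-second windows of the mandatory task.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
completion_t1 = np.clip(rng.normal(36.0, 19.0, 19), 0, 100)  # 0-200 s window
completion_t2 = np.clip(rng.normal(19.0, 15.0, 19), 0, 100)  # 400-600 s window

t_stat, p_value = stats.ttest_rel(completion_t2, completion_t1)  # paired t-test
diff = completion_t2 - completion_t1
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, "
      f"mean difference (SD) = {diff.mean():.2f} ({diff.std(ddof=1):.3f})")
```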
4.4 User Feedback
After the presence and usability surveys, users were given open-ended questions addressing the most and least popular system attributes, desired design improvements for BioLumin, and their reactions to using the system in its current state.
4.4.1 What Did Users Like About BioLumin? The most commonly mentioned popular aspects of BioLumin were that it offered 3D manipulation and interactivity with the visualization:
• “VR feeling and high ability to rotate, zoom, position the model.”
• “Rotating the data, clipping when approaching the visualization.”
• “Ease of zoom by moving my head, color contrast of models.”
• “I enjoyed the sound effects and vividly colored data.”
• “I liked the interactivity and the visual cues for when I was placing annotations.”
These users commented that using the system broadened their perspective on medical data and its analysis. Many felt immersed, even claiming that BioLumin is “addicting” and “like a game.” For example, some notable comments from users during runtime:
• “I am a Cyber Oncologist.”
• “I am Captain Picard of the Federation Star Ship.”
Additionally, users found the controls easy to learn during the 10-minute tutorial period. They felt that movement was intuitively “more natural,” as they were able to walk around the 6-DoF environment and peek their heads through the data. For many, this kind of 3D object manipulation in XR seemed powerful and purposeful beyond the Omentum task: “It’s new to me to manipulate models in a virtual environment, and it’s great to be able to use annotation for academic or educational purposes.”
4.4.2 What Did Users Not Like About BioLumin? The least popular aspects of BioLumin were the controller touchpad and the feel of the headset. Most users stated that the touchpad control felt unresponsive, with many saying that the controller is not “small hand friendly.” Additionally, they felt that confirming annotations with the touchpad was too intricate, and problems arose with unresponsiveness. For the headset, the system presented challenges for those who wear glasses. Two user outliers in this experiment performed poorly because they found it challenging to view through the headset without their glasses, meaning that they could not visually place tags. Among the non-glasses wearers, many said the field of view of the Magic Leap was too small and made it hard to gauge 3D depth. The Magic Leap has a 0.37 m near clipping plane, so if users placed the model close to their head, the model clipped and partially disappeared; users also said this was problematic for gauging depth. Additionally, one user said their eyes felt strained from using the Magic Leap and that the headset felt uncomfortably warm. Lastly, users were split on the non-relative place control. Some wanted the place function to move relative to the controller’s location rather than snapping to the controller’s position, while others liked that the model snapped to the controller. This may be because the translate push and pull functions (which effectively enable panning of the model) were not sufficiently emphasized, or because the touchpad felt awkward enough that users preferred a relative placement over the current behavior.
4.4.3 Recommended Design Improvements for BioLumin.
Multiple users requested the ability to hold and “paint” annotations rather than repeatedly tapping the touchpad to confirm each annotation. Users also wanted annotation confirmation to be mapped to an analog button rather than the controller touchpad. They additionally wanted more tools for indicating depth and overlap with annotations, such as changing the color of the Omentum model itself to show overlap, hiding annotation markers, or additional controls to adjust color, size, and transparency. Concerning the control functions, users requested a relative versus hand-locked mode for the place function. Lastly, many requested hardware improvements to the Magic Leap headset, such as an increased field of view, a more ergonomic controller design, higher resolution, and a lighter headset.
4.4.4 BioLumin Usage in its Current State. In the final question of the survey, users were asked, “Given BioLumin’s current state, would you use this system to explore microscopic data and help NIH researchers annotate these complex data sets? Why?” Only one user said that they would not use the system; the majority said yes. A breakdown of these responses follows:
• Responded with “Yes” (10/19 users): The majority of these users felt the system was fun, satisfying, and a significantly better method than traditional mediums. They stated the following positive system characteristics: fun, satisfying, “felt like a video game,” “fairly addicting,” “easy to use,” “quick to learn,” “efficient way to annotate.” Another user commented that “while it may be clunky, this system is a great way to visualize microscopy images that is significantly better than looking at 2D images.” Several expressed views indicating their willingness to volunteer as crowdworkers contributing to medical science, such as: “the thought that I may be helping people with cancer made me even more motivated to play - this may become my new hobby.”
• Responded with “Maybe” (8/19 users): The majority of these users requested design improvements in the controller and touchpad interaction. Their views are reflected in the following quotes: “For 3D models, it could be a very useful way to analyze the different intersections of data. However, I would likely use it along with a traditional 3D view.” “I answer ‘maybe’ because I personally feel like I don’t know enough about how to properly explore microscopic data, but I thought it was a cool way to look at it, and I personally liked it and would use it.” “Maybe if I’m bored and feeling virtuous.”
• Responded with “No” (1/19 users): The single user stated that “in this state, I think the program is too difficult to precise control for use in the exploration of microscopic data.” This user was also one of the outliers who wore glasses.
5 DISCUSSION
Immersive virtual environments afford many new opportunities for transforming user experiences: enhancing presence by reducing distractions through the HMD, deepening immersion by providing more virtual space and 360-degree viewing, supporting multi-dimensional visual analysis with stereo 3D, enabling more user bandwidth through 6-DoF input, and providing natural interaction through body-based control gestures for manipulating the virtual world [15].
Moving beyond the traditional 2D displays and interaction methods of biomedical analysis toward mixed reality has a tangible influence on user experience, efficacy, and engagement. Past studies have also demonstrated that 3D immersive virtual environments can benefit biomedical analysis in understandability and efficacy when compared to 2D displays [38, 48, 68]. But what happens when we implement mixed reality with a specific focus on microscopy analysis with experts and non-experts?
5.1 Implications for Microscopy Analysis with Mixed Reality
Our results suggest that mixed reality can be a powerful tool for interacting with and annotating biomedical models. NIH experts expressed great enthusiasm for this system for addressing bottlenecks in their current workflow. In a recent review of deep learning for medical image analysis, Litjens et al. concluded that the most significant barrier to improving these analytics lies in the lack of large, high-quality training data sets for such algorithms [41]. And while new annotation tools are being developed [58], there has been little exploration and evaluation of novel 3D mixed reality tools to support image annotation. Our experts confirmed the system's promise and potential for addressing the challenges involved in generating high-quality machine learning data for biological models by crowdsourcing, using non-experts to generate annotations. With BioLumin, naive users successfully used our novel 6-DoF mixed reality environment. They learned the system in under ten minutes, as indicated by their successful test performance in generating a mean of 98% correct annotations. Most users were also positive about their experience, recommending minimal design improvements and saying they would adopt the system in its current state to help NIH researchers "crowd-annotate" similar data.
5.2 Implications for Designing Mixed Reality Microscopy Analysis Tools
Linear mixed model analyses of user behavior showed that designing for embodied interaction with biomedical analysis can yield many benefits compared with traditional computational tools. Past research into embodied interaction suggests that whole-body interactive simulation induces significant learning gains, engagement, and positive attitudes toward scientific tasks [19, 39, 40]. Bringing medical annotation into immersive virtual environments can capitalize on these opportunities for experts and non-experts alike. Still, system designs must actively encourage user movement, exploration, and curiosity when interacting with models. Our results indicate that mixed reality interaction, which encouraged users to walk around, dive into, and physically move their bodies around the data, was a significant predictor of system usability and user engagement. Thus, as designers, we need to provide functions such as rotation and scaling that encourage more active exploration. In contrast, the translate function, while efficient in traditional annotation tools with 2D displays, reduced the affordances of walking into the data and immersing the user in it. However, we also note trade-offs between exploration and efficacy. Users who spent more time actively exploring models and controls were, unsurprisingly, less efficient in generating their solutions.
Moving forward, researchers can build on these insights to develop new medical annotation tools that incorporate extended reality, designing for exploration, curiosity, and 6-DoF physical interaction.
5.2.1 Designing for User Engagement and Annotation Completion. The mixed model analysis suggests that more exploratory interaction with 3DUI medical annotation is associated with greater user engagement. Greater engagement corresponded to more exploratory user behaviors in the virtual environment, including leaning into, walking around, rotating, visually scanning, and inserting oneself into the model. Greater engagement was also associated with prior VR and 3D experience, possibly because these users were familiar with navigating virtual environments through immersive head-mounted displays. Reduced engagement was associated with reduced physical interaction. This was more common among users who were older and unfamiliar with immersive environments or who were limited by their visual capabilities, as the Magic Leap headset is not ideally form-fitted for eyeglasses. For future systems that aim to engage users in 3DUI medical annotation, encouraging users to be more active, both through movement and through exploratory control functions, may lead to significantly greater interaction and effort in approaching data analysis tasks. In terms of annotation completion, personalization and form-fitting of the mixed reality device are critical. For example, eyeglass users completed fewer annotations. The Magic Leap headset was not ideally form-fitted for eyeglasses, which resulted in reduced engagement and may in turn have contributed to reduced completion. The translate function also reduced completion, perhaps because it reduced the physical movement required in the virtual environment by pulling and pushing the Omentum model away from or toward the user. These results again indicate that, for greater annotation completion, it is better to design toward exploratory functions that encourage the user to actively explore the virtual environment.
5.2.2 Designing for Annotation Efficacy and Accuracy. The mixed model analysis showed that more engaged users tended to demonstrate less overall efficacy. These users had more tracked movement and utilized more exploratory control functions, suggesting they were interacting actively with the Omentum tissue. In contrast, efficacy was greater for users with less engagement: generally, those who were older and those who used model-movement control functions such as translate. These results suggest clear trade-offs between exploration, completion, and efficacy. Control functions that stimulate user movement, such as rotate, lead to higher engagement and solution completion but reduce overall efficacy. Furthermore, people with prior VR or 3D application experience were far more "trigger happy" with their annotations. It appeared that greater engagement through presence was associated with a greater number of redundant annotations. This included utilizing more exploratory functions, such as manipulating scale, and not using a higher annotation precision. Those with fewer redundancies tended to approach the Omentum annotations with more annotation precision, attentively zooming into the model with a larger average model scale over the session. Moreover, we found that presence was negatively correlated with annotation efficacy and accuracy (more redundancies).
As past research has linked presence to increased engagement in user experience [5, 16, 59], this may suggest that presence reduced performance because users were engrossed and spent more time in the application. Thus, for higher efficacy and accuracy, designers can encourage high-scale annotation and translate functions (model movement independent of the user's body). Still, a balance must be struck between efficacy-first functions such as translate and exploration-first functions such as rotate in order to maintain user engagement and annotation completion.
5.2.3 Designing for System Usability. Interestingly, prior VR experience and wearing glasses were not significant predictors of system usability. Using the scale function and producing more hand movement were associated with significantly lower usability ratings, as many users found the Magic Leap controller burdensome. As discussed in the qualitative sections, the controller was often not form-fitting to each user's hand, making placing annotations a tiring experience. Thus, greater usability was found when engaging users with more exploratory functions. Those who experienced higher presence and explored the virtual environment more freely reported higher usability. This suggests that 3DUI XR-based medical annotation may greatly benefit from encouraging users to fully explore their virtual environments by walking around them, leaning in, performing rotations, and manipulating the annotation models during analysis.
5.3 Limitations and Future Work
Our evaluation provided insights into the system's current state and the need for design improvements, but as with any study, there are many limitations to consider. While the non-expert group had a diverse background in terms of prior experience and location of testing, these users were primarily young adults and students. Our non-experts were tested on a specific annotation task, and the evaluation was relatively brief. Moreover, market-leading industry microscopy analysis tools (e.g., Imaris, https://imaris.oxinst.com/) are becoming increasingly capable for non-immersive 3D visualization of biological surface models. Due to resource constraints, we decided to take an exploratory approach to BioLumin's design, as many competing software packages are locked behind hefty paywalls and do not provide immersive interaction. In future work, we intend to explore more extended deployments of this technology with a comparative analysis against industry-leading software to examine similar questions and other types of users. While it is always challenging to recruit domain experts, three NIH experts may not be a large enough user base to fully gauge the impact of BioLumin, and more experts should be recruited to follow the non-expert experimental protocol. Lastly, mixed reality involves meshing the digital world with the real world, and more locations and lab environments should be tested to further understand BioLumin's effects.
6 CONCLUSION
This study designed a proof-of-concept system demonstrating the viability of a novel mixed reality experience for interactive biomedical visualization and annotation that addresses an essential bottleneck for generating high-quality medical data.
An automated pipeline for reconstructing microscopic images into 3D surface models was successfully tested and evaluated with two high-resolution data sets featuring a cluster of cell nuclei and vascularized Omentum tissue. The interactive control functions of this experience and the ability to annotate such data were explored with both experts and non-experts. Experts were optimistic about the potential of this design approach for the field of biomedical research, noting that allowing non-experts to annotate thousands of images could address major bottlenecks for machine learning and computer vision. Non-experts were able to understand and use BioLumin, successfully annotating medical data at the accuracy of expert populations. The majority of non-experts reported interest in using such a system to help NIH researchers. While BioLumin is a definitive proof of concept for the immersive capabilities of a multi-platform mixed reality visualization, there is still much work to be done. BioLumin must be adapted to multiple spatial and temporal big data types extending beyond the surface models explored here. The ability to fully integrate raw data and image sequencing, along with reconstructed models, has immense potential for medical researchers. We aim to make iterative changes to BioLumin based on the data collected throughout this study to see whether usability, presence, and annotation can be improved. We will also develop and deploy more models, recruiting an NIH expert population to compare against non-experts. Furthermore, our BioLumin prototype demonstrates cross-platform capabilities through WebGL and MLO. A server-side instance of BioLumin to enable multi-user runtime interaction could be explored. With server-side capabilities, an instance of WebXR could be implemented to run on other extended reality devices. A direct interface between a server-side instance and the cloud may allow for streamlined pipelines that perform handshakes between the extended reality side and the researcher analysis tool side. We hope that biological research can be advanced by meshing analysis between the virtual and physical worlds so that they co-exist as one.
ACKNOWLEDGMENTS
We thank Nathan Hotaling, Ty Voss, Marc Ferrer, and Molly Boutin of the National Institutes of Health, the National Center for Advancing Translational Sciences, for their support and participation in the design of BioLumin. We also thank Joseph Adamson and Samantha Conde of the University of California, Santa Cruz for their assistance with the non-expert behavioral video coding analysis done in this study. Most importantly, the authors thank the many student participants who helped make this study possible.
REFERENCES
[1] 2021. Microscopy Image Analysis Software - Imaris. (2021). https://imaris.oxinst.com/. [2] C. Andronis, A. Sharma, V. Virvilis, S. Deftereos, and A. Persidis. 2011. Literature mining, ontologies and information visualization for drug repurposing. Briefings in Bioinformatics 12, 4 (July 2011), 357–368. https://doi.org/10.1093/bib/bbr005 [3] Joanna J. Arch and Alaina L. Carr. 2017. Using Mechanical Turk for research on cancer survivors: Cancer survivors on Mechanical Turk. Psycho-Oncology 26, 10 (Oct. 2017), 1593–1603. https://doi.org/10.1002/pon.4173 [4] Aaron Bangor, Philip T. Kortum, and James T. Miller. 2008.
An empirical evaluation of the system usability scale. International Journal of Human-Computer Interaction 24, 6 (July 2008), 574–594. https://doi.org/10.1080/10447310802205776 [5] R. M. Baños, C. Botella, M. Alcañiz, V. Liaño, B. Guerrero, and B. Rey. 2004. Immersion and emotion: Their impact on the sense of presence. CyberPsychology & Behavior 7, 6 (Dec. 2004), 734–741. https://doi.org/10.1089/cpb.2004.7.734 [6] Molly E. Boutin, Ty C. Voss, Steven A. Titus, Kennie Cruz-Gutierrez, Sam Michael, and Marc Ferrer. 2018. A high-throughput imaging and nuclear segmentation analysis protocol for cleared 3D culture models. Scientific Reports 8, 1 (Dec. 2018), 11135. https://doi.org/10. 1038/s41598-018-29169-0 [7] John Brooke et al. 1996. SUS-a quick and dirty usability scale. Usability Evaluation in Industry 189, 194 (1996), 4–7. [8] Alan B. Cantor. 1996. Sample-size calculations for Cohen’s kappa. Psychological Methods 1, 2 (1996), 150–153. https://doi.org/10.1037/ 1082-989X.1.2.150 [9] Lauren N. Carroll, Alan P. Au, Landon Todd Detwiler, Tsung-chieh Fu, Ian S. Painter, and Neil F. Abernethy. 2014. Visualization and analytics tools for infectious disease epidemiology: A systematic review. Journal of Biomedical Informatics 51 (Oct. 2014), 287–298. https://doi.org/10.1016/j.jbi.2014.04.006 [10] Shao-Ning Chang and Wei-Lun Chen. 2017. Does visualize industries matter? A technology foresight of global virtual reality and augmented reality industry. In 2017 International Conference on Applied System Innovation (ICASI). IEEE, Sapporo, Japan, 382–385. https://doi.org/10.1109/ICASI.2017.7988432 [11] Jenny J. Chen, Natala J. Menezes, Adam D. Bradley, and T. North. 2011. Opportunities for crowdsourcing research on Amazon Me- chanical Turk. Interfaces 5, 3 (2011), 1. [12] Christine M. Colvis and Christopher P. Austin. 2014. Innovation in therapeutics development at the NCATS. Neuropsychopharmacology 39, 1 (Jan. 2014), 230–232. https://doi.org/10.1038/npp.2013.247 [13] Maxime Cordeil, Tim Dwyer, Karsten Klein, Bireswar Laha, Kim Marriott, and Bruce H. Thomas. 2017. Immersive collaborative analysis of network connectivity: Cave-style or head-mounted display? IEEE Transactions on Visualization and Computer Graphics 23, 1 (2017), 441–450. [14] Vickie Curtis. 2015. Motivation to participate in an online citizen science game: A study of foldit. Science Communication 37, 6 (Dec. 2015), 723–746. https://doi.org/10.1177/1075547015609322 [15] Barney Dalgarno and Mark J. W. Lee. 2010. What are the learning affordances of 3-D virtual environments?: Learning affordances of 3-D virtual environments. British Journal of Educational Technology 41, 1 (Jan. 2010), 10–32. https://doi.org/10.1111/j.1467-8535.2009.01038.x ACM Transactions on Computing for Healthcare, Vol. 3, No. 4, Article 44. Publication date: October 2022. 44:26 • A. Elor et al. [16] Julia Diemer, Georg W. Alpers, Henrik M. Peperkorn, Youssef Shiban, and Andreas Mühlberger. 2015. The impact of perception and presence on emotional reactions: A review of research in virtual reality. Frontiers in Psychology 6 (2015). https://doi.org/10.3389/fpsyg. 2015.00026 [17] Ciro Donalek, S. G. Djorgovski, Alex Cioc, Anwell Wang, Jerry Zhang, Elizabeth Lawler, Stacy Yeh, Ashish Mahabal, Matthew Graham, Andrew Drake, Scott Davidoff, Jeffrey S. Norris, and Giuseppe Longo. 2014. Immersive and collaborative data visualization using virtual reality platforms. In 2014 IEEE International Conference on Big Data (Big Data). IEEE, Washington, DC, USA, 609–614. 
https:// doi.org/10.1109/BigData.2014.7004282 [18] Nadezhda T. Doncheva, Karsten Klein, Francisco S. Domingues, and Mario Albrecht. 2011. Analyzing and visualizing residue networks of protein structures. Trends in Biochemical Sciences 36, 4 (April 2011), 179–182. https://doi.org/10.1016/j.tibs.2011.01.002 [19] Paul Dourish. 2001. Where the Action is: The Foundations of Embodied Interaction. MIT Press, Cambridge, Mass. [20] Matt Dunleavy, Chris Dede, and Rebecca Mitchell. 2009. Affordances and limitations of immersive participatory augmented reality simulations for teaching and learning. Journal of Science Education and Technology 18, 1 (Feb. 2009), 7–22. https://doi.org/10.1007/ s10956-008-9119-1 [21] Aviv Elor and Sri Kurniawan. 2020. The ultimate display for physical rehabilitation: A bridging review on immersive virtual reality. Frontiers in Virtual Reality 1 (Nov. 2020), 585993. https://doi.org/10.3389/frvir.2020.585993 [22] Aviv Elor, Michael Powell, Evanjelin Mahmoodi, Nico Hawthorne, Mircea Teodorescu, and Sri Kurniawan. 2020. On shooting stars: Comparing cave and HMD immersive virtual reality exergaming for adults with mixed ability. ACM Transactions on Computing for Healthcare 1, 4 (2020), 1–22. [23] Antonio Foncubierta Rodríguez and Henning Müller. 2012. Ground truth generation in medical imaging: A crowdsourcing-based iterative approach. In Proceedings of the ACM Multimedia 2012 Workshop on Crowdsourcing for Multimedia - CrowdMM’12. ACM Press, Nara, Japan, 9. https://doi.org/10.1145/2390803.2390808 [24] Ivan Fraietta and Fabio Gasparri. 2016. The development of high-content screening (HCS) technology and its importance to drug discovery. Expert Opinion on Drug Discovery 11, 5 (May 2016), 501–514. https://doi.org/10.1517/17460441.2016.1165203 [25] J. L. Gabbard, D. Hix, and J. E. Swan. 1999. User-centered design and evaluation of virtual environments. IEEE Computer Graphics and Applications 19, 6 (Dec. 1999), 51–59. https://doi.org/10.1109/38.799740 [26] Xing Gong, Stephen J. Glick, Bob Liu, Aruna A. Vedula, and Samta Thacker. 2006. A computer simulation study comparing lesion detec- tion accuracy with digital mammography, breast tomosynthesis, and cone-beam CT breast imaging: Comparison of lesion detectability with 3 breast imaging modalities. Medical Physics 33, 4 (March 2006), 1041–1052. https://doi.org/10.1118/1.2174127 [27] Hayet Hadjar, Abdelkrim Meziane, Rachid Gherbi, Insaf Setitra, and Noureddine Aouaa. 2018. WebVR based interactive visualization of open health data. In Proceedings of the 2nd International Conference on Web Studies. ACM, Paris France, 56–63. https://doi.org/10. 1145/3240431.3240442 [28] Jeremy Hall, Stelvia Matos, Stefan Gold, and Liv S. Severino. 2018. The paradox of sustainable innovation: The ‘Eroom’ effect (Moore’s law backwards). Journal of Cleaner Production 172 (Jan. 2018), 3487–3497. https://doi.org/10.1016/j.jclepro.2017.07.162 [29] R. J. Hunt. 1986. Percent agreement, Pearson’s correlation, and kappa as measures of inter-examiner reliability. Journal of Dental Research 65, 2 (Feb. 1986), 128–130. https://doi.org/10.1177/00220345860650020701 [30] Hans-Jürgen Huppertz, Christina Grimm, Susanne Fauser, Jan Kassubek, Irina Mader, Albrecht Hochmuth, Joachim Spreer, and An- dreas Schulze-Bonhage. 2005. Enhanced visualization of blurred gray-white matter junctions in focal cortical dysplasia by voxel-based 3D MRI analysis. Epilepsy Research 67, 1-2 (Oct. 2005), 35–50. https://doi.org/10.1016/j.eplepsyres.2005.07.009 [31] Angus P. R. 
Johnston, James Rae, Nicholas Ariotti, Benjamin Bailey, Andrew Lilja, Robyn Webb, Charles Ferguson, Sheryl Maher, Thomas P. Davis, Richard I. Webb, John McGhee, and Robert G. Parton. 2018. Journey to the centre of the cell: Virtual reality immersion into scientific data. Tracffi 19, 2 (Feb. 2018), 105–110. https://doi.org/10.1111/tra.12538 [32] Neesha Jothi, Nur’Aini Abdul Rashid, and Wahidah Husain. 2015. Data mining in healthcare - a review. Procedia Computer Science 72 (2015), 306–313. https://doi.org/10.1016/j.procs.2015.12.145 [33] Kwanguk Kim, M. Zachary Rosenthal, David Zielinski, and Rachel Brady. 2012. Comparison of desktop, head mounted display, and six wall fully immersive systems using a stressful task. In 2012 IEEE Virtual Reality Workshops (VRW). IEEE, 143–144. [34] Magic Leap. 2019. Magic Leap One–CREATOR EDITION. Internet: https://www.magicleap.com/magic-leap-one [Jan. 19, 2019] (2019). [35] Howard Lee and Yi-Ping Phoebe Chen. 2015. Image based computer aided diagnosis system for cancer detection. Expert Systems with Applications 42, 12 (July 2015), 5356–5365. https://doi.org/10.1016/j.eswa.2015.02.005 [36] Stefan Letzsch, Karin Boettcher, Jens M. Kelm, and Simon Messner. 2015. Quantifying efflux activity in 3D liver spheroids: New confocal imaging instruments allow screening in complex human liver microtissues. Genetic Engineering & Biotechnology News 35, 7 (April 2015), 14–15. https://doi.org/10.1089/gen.35.07.08 [37] Linfeng Li, Qiong Zhou, Ty C. Voss, Kevin L. Quick, and Daniel V. LaBarbera. 2016. High-throughput imaging: Focusing in on drug discovery in 3D. Methods 96 (March 2016), 97–102. https://doi.org/10.1016/j.ymeth.2015.11.013 [38] Vaja Liluashvili, Selim Kalayci, Eugene Fluder, Manda Wilson, Aaron Gabow, and Zeynep H. Gümüş. 2017. iCAVE: An open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D. GigaScience 6, 8 (Aug. 2017). https://doi.org/10. 1093/gigascience/gix054 ACM Transactions on Computing for Healthcare, Vol. 3, No. 4, Article 44. Publication date: October 2022. BioLumin • 44:27 [39] Robb Lindgren and Mina Johnson-Glenberg. 2013. Emboldened by embodiment: Six precepts for research on embodied learning and mixed reality. Educational Researcher 42, 8 (Nov. 2013), 445–452. https://doi.org/10.3102/0013189X13511661 [40] Robb Lindgren, Michael Tscholl, Shuai Wang, and Emily Johnson. 2016. Enhancing learning and engagement through embodied interac- tion within a mixed reality simulation. Computers & Education 95 (April 2016), 174–187. https://doi.org/10.1016/j.compedu.2016.01.001 [41] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A. W. M. van der Laak, Bram van Ginneken, and Clara I. Sánchez. 2017. A survey on deep learning in medical image analysis. Medical Image Analysis 42 (Dec. 2017), 60–88. https://doi.org/10.1016/j.media.2017.07.005 [42] S. Liu. 2019. AR/VR headset shipments worldwide 2020-2025 | Forecast unit shipments of augmented (AR) and virtual reality (VR) headsets from 2019 to 2023 (in millions) | Statista Research. (2019). https://www.statista.com/statistics/653390/worldwide-virtual-and- augmented-reality-headset-shipments/. [43] Miguel Angel Luengo-Oroz, Asier Arranz, and John Frean. 2012. Crowdsourcing malaria parasite quantification: An online game for analyzing images of infected thick blood smears. Journal of Medical Internet Research 14, 6 (Nov. 2012), e167. 
https://doi.org/10.2196/ jmir.2338 [44] Zhihan Lv, Alex Tek, Franck Da Silva, Charly Empereur-mot, Matthieu Chavent, and Marc Baaden. 2013. Game on, science - how video game technology may help biologists tackle visualization challenges. PLoS ONE 8, 3 (March 2013), e57990. https://doi.org/10. 1371/journal.pone.0057990 [45] Alice McCarthy. 2012. New NIH center to streamline translational science. Chemistry & Biology 19, 2 (Feb. 2012), 165–166. https:// doi.org/10.1016/j.chembiol.2012.02.005 [46] Holly Moore. 2015. MATLAB for Engineers (Fourth edition). Pearson, Boston. [47] Benedikt Morschheuser, Juho Hamari, and Jonna Koivisto. 2016. Gamification in crowdsourcing: A review. In 2016 49th Hawaii Inter- national Conference on System Sciences (HICSS). IEEE, Koloa, HI, USA, 4375–4384. https://doi.org/10.1109/HICSS.2016.543 [48] Laura Nelson, Dianne Cook, and Carolina Cruz-Neira. 1999. XGobi vs the C2: Results of an experiment comparing data visualization in a 3-D immersive virtual reality environment with a 2-D workstation display. Computational Statistics 14, 1 (March 1999), 39–51. https://doi.org/10.1007/PL00022704 [49] Ekaterina Olshannikova, Aleksandr Ometov, Yevgeni Koucheryavy, and Thomas Olsson. 2015. Visualizing big data with augmented and virtual reality: Challenges and research agenda. Journal of Big Data 2, 1 (Dec. 2015), 22. https://doi.org/10.1186/s40537-015-0031-2 [50] Tony Parisi. 2012. WebGL: Up and Running (1st edition). O’Reilly, Sebastopol, Calif. OCLC: ocn809911501. [51] Georgios A. Pavlopoulos, Anna-Lynn Wegener, and Reinhard Schneider. 2008. A survey of visualization tools for biological network analysis. BioData Mining 1, 1 (Dec. 2008), 12. https://doi.org/10.1186/1756-0381-1-12 [52] Aleksey Porollo and Jaroslaw Meller. 2007. Versatile annotation and publication quality visualization of protein complexes using POLYVIEW-3D. BMC Bioinformatics 8, 1 (Dec. 2007), 316. https://doi.org/10.1186/1471-2105-8-316 [53] Muhammad Imran Razzak, Saeeda Naz, and Ahmad Zaib. 2018. Deep learning for medical image processing: Overview, challenges and the future. In Classification in BioApps , Nilanjan Dey, Amira S. Ashour, and Surekha Borra (Eds.). Vol. 26. Springer International Publishing, Cham, 323–350. https://doi.org/10.1007/978-3-319-65981-7_12 [54] Senthilkumar Sa. 2018. Big data in healthcare management: A review of literature. American Journal of Theoretical and Applied Business 4, 2 (2018), 57. https://doi.org/10.11648/j.ajtab.20180402.14 [55] Jeff Sauro and James R. Lewis. 2016. Quantifying the User Experience: Practical Statistics for User Research (2nd edition). Morgan Kauf- mann, Cambridge. OCLC: 957463646. [56] Jack W. Scannell, Alex Blanckley, Helen Boldon, and Brian Warrington. 2012. Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews Drug Discovery 11, 3 (March 2012), 191–200. https://doi.org/10.1038/nrd3681 [57] R. R. Schaller. 1997. Moore’s law: Past, present and future. IEEE Spectrum 34, 6 (June 1997), 52–59. https://doi.org/10.1109/6.591665 [58] Sascha Seifert, Michael Kelm, Manuel Moeller, Saikat Mukherjee, Alexander Cavallaro, Martin Huber, and Dorin Comaniciu. 2010. Se- mantic annotation of medical images. In Medical Imaging 2010: Advanced PACS-based Imaging Informatics and Therapeutic Applications, Vol. 7628. International Society for Optics and Photonics, 762808. [59] Richard Skarbez, Frederick P. Brooks, Jr., and Mary C. Whitton. 2018. A survey of presence and related concepts. Comput. Surveys 50, 6 (Jan. 2018), 1–39. 
https://doi.org/10.1145/3134301 [60] Mel Slater. 1999. Measuring presence: A response to the Witmer and Singer presence questionnaire. Presence: Teleoperators and Virtual Environments 8, 5 (Oct. 1999), 560–565. https://doi.org/10.1162/105474699566477 [61] Unity Technologies. 2019. Unity real-time development platform | 3d, 2D VR & AR. Internet: https://unity.com/ [Jun. 06, 2019] (2019). [62] Martin Usoh, Ernest Catena, Sima Arman, and Mel Slater. 2000. Using presence questionnaires in reality. Presence: Teleoperators and Virtual Environments 9, 5 (Oct. 2000), 497–503. https://doi.org/10.1162/105474600566989 [63] Brady T. West, Kathleen B. Welch, Andrzej T. Gałecki, and Brenda W. Gillespie. 2015. Linear Mixed Models: A Practical Guide Using Statistical Software (Second edition). CRC Press, Taylor & Francis Group, Boca Raton. [64] Nilmini Wickramasinghe, Jatinder N. D. Gupta, and Sushil Sharma (Eds.). 2005. Creating Knowledge-Based Healthcare Organizations:. IGI Global. https://doi.org/10.4018/978-1-59140-459-0 ACM Transactions on Computing for Healthcare, Vol. 3, No. 4, Article 44. Publication date: October 2022. 44:28 • A. Elor et al. [65] Zheng Yang, Keren Lasker, Dina Schneidman-Duhovny, Ben Webb, Conrad C. Huang, Eric F. Pettersen, Thomas D. Goddard, Elaine C. Meng, Andrej Sali, and Thomas E. Ferrin. 2012. UCSF chimera, MODELLER, and IMP: An integrated modeling system. Journal of Structural Biology 179, 3 (Sept. 2012), 269–278. https://doi.org/10.1016/j.jsb.2011.09.006 [66] Illhoi Yoo, Patricia Alafaireet, Miroslav Marinov, Keila Pena-Hernandez, Rajitha Gopidi, Jia-Fu Chang, and Lei Hua. 2012. Data mining in healthcare and biomedicine: A survey of the literature. Journal of Medical Systems 36, 4 (Aug. 2012), 2431–2448. https://doi.org/10. 1007/s10916-011-9710-5 [67] Bei Yu, Matt Willis, Peiyuan Sun, and Jun Wang. 2013. Crowdsourcing participatory evaluation of medical pictograms using Amazon Mechanical Turk. Journal of Medical Internet Research 15, 6 (June 2013), e108. https://doi.org/10.2196/jmir.2513 [68] Jimmy F. Zhang, Alex R. Paciorkowski, Paul A. Craig, and Feng Cui. 2019. BioVR: A platform for virtual reality assisted biological data integration and visualization. BMC Bioinformatics 20, 1 (Dec. 2019), 78. https://doi.org/10.1186/s12859-019-2666-z Received 23 February 2021; revised 7 July 2022; accepted 8 July 2022 ACM Transactions on Computing for Healthcare, Vol. 3, No. 4, Article 44. Publication date: October 2022.