Technologies at CHS
The diversity of research interests at CHS allows Information Technology (IT) staff to work and experiment with a variety of technologies. These include:
- Microsimulation Models: C#
- Internet Technologies: Windows Server 2003, IIS, Adobe Dreamweaver, Microsoft FrontPage, C#, SQL, Visual Studio .NET, ASP.NET, VB.NET Smart Client, .NET XML Web Services, Java, and various open-source toolkits for natural language processing and distributed computing
- Distributed Computing Solutions: Cancer simulation, Linux cluster
- Data Warehousing: SAS, SQL Server, and Sybase databases
- Automated Chart Review Systems and Tracking Databases: Microsoft Access, SQL Server, Visual Basic, C#, and .NET
- Natural Language Processing: Analysis of textual data
Microsimulation models
The Cancer Intervention and Surveillance Modeling Network (CISNET) microsimulation model of the natural history of colorectal cancer is implemented in the C# programming language. The implementation is based on the well-documented state design pattern, which decentralizes computation among separate model processes. Each model process is responsible for handing the simulation off to the appropriate next process. For example, the process that models preclinical invasive cancers is solely responsible for transitioning an individual's data either to a state indicating that the person has died from a cause unrelated to colorectal cancer or to a state indicating that the person's preclinical invasive cancer was subsequently detected clinically. The program executes in two separate modes. An estimation mode outputs the data required to infer the Bayesian posterior distribution of the model's input parameters. A "run" mode persists the simulated natural life histories of individuals, with the model parameters specified from their posterior distribution.
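The state design pattern described above can be illustrated with a minimal sketch. It is written in Python rather than C# for brevity, and it is not the CISNET code: the class names, the `p_other_cause_death` parameter, and the single transition rule are assumptions made purely for illustration.

```python
import random
from abc import ABC, abstractmethod


class Person:
    """Minimal record of one simulated individual (hypothetical fields)."""

    def __init__(self, person_id):
        self.person_id = person_id
        self.history = []  # names of the states visited, in order


class State(ABC):
    """One model process; it alone decides which process runs next."""

    @abstractmethod
    def next_state(self, person, rng):
        """Return the next State, or None if this state is terminal."""


class PreclinicalInvasiveCancer(State):
    def __init__(self, p_other_cause_death):
        # Illustrative parameter, not a value from the real model.
        self.p_other_cause_death = p_other_cause_death

    def next_state(self, person, rng):
        # Either the person dies of an unrelated cause, or the cancer
        # is subsequently detected clinically.
        if rng.random() < self.p_other_cause_death:
            return OtherCauseDeath()
        return ClinicallyDetectedCancer()


class OtherCauseDeath(State):
    def next_state(self, person, rng):
        return None  # terminal state


class ClinicallyDetectedCancer(State):
    def next_state(self, person, rng):
        return None  # terminal in this sketch


def simulate(person, start_state, rng):
    """Follow transitions until a terminal state; return the visited states."""
    state = start_state
    while state is not None:
        person.history.append(type(state).__name__)
        state = state.next_state(person, rng)
    return person.history
```

For example, `simulate(Person(1), PreclinicalInvasiveCancer(0.3), random.Random(0))` returns the list of state names one simulated person passed through. The driver loop never inspects the transition rules; each state object owns them, which is the decentralization the pattern provides.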
Internet technologies and Web programming
CHS has an intranet and an external Web site. Both are supported by Windows Server-based systems using Microsoft Internet Information Services (IIS) 5 and 6. For the intranet, our primary Web content tools are Adobe Dreamweaver and Contribute, with Web applications developed in ASP and ASP.NET. For the external Web site, we are transitioning from FrontPage to Dreamweaver. Our external resources also include listserv mailing lists and project-specific Web sites.
To further our research, CHS makes use of Internet technology and Web programming in projects such as:
- Case management systems
Web applications that facilitate depression-care management are used in multiple studies. Case managers gather information on both the patient's current clinical status and their use of antidepressant medications. Assessment-based reports alert case managers and their supervisors to possible critical patient conditions. A summary report is printed and mailed to the patient's provider. Selected Web applications facilitate recruitment into the case-management program. Each Web application is written in the C# programming language. Object-oriented constructs, such as Web page inheritance and component construction, are used to dynamically generate Web pages. Patient assessments are dynamically generated using Web components and are stored in a SQL Server database.
- Data exploration tools
The CHS intranet hosts a number of data exploration tools designed with faculty and managers in mind, including:
- Data counters
Created with ASP.NET and SQL Server, these tools let staff obtain summary counts of utilization data for writing grants, determine the feasibility of studies, and check which codes are used in practice, all without asking a programmer for help. CHS staff can point and click their way to counts of Group Health diagnosis, pharmacy, procedure, and SEER cancer data.
- Data plotter
The data plotter gives staff easy access to frequency-trend plots of diagnosis or procedure codes. Monthly frequencies are generated in SAS and stored in SQL Server. An ASP.NET front end allows users to search for and select codes and clinics of interest. The resulting plots are displayed in an ActiveX control produced and distributed by SAS.
- Web SAS data
Web SAS Data enables staff to view the contents of a data set, produce one-way frequencies of variables, generate two-way crosstabs, calculate univariate statistics of numeric variables, and join two tables on a common variable to produce inter-table crosstabs. Staff perform these analyses on their project SAS data without having to write a single line of code. SAS runs in the background, and users access the data via an interface created using ASP.NET/VB.NET.
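To make the operations offered by Web SAS Data concrete, the following is a minimal sketch of one-way frequencies, a two-way crosstab, and a join on a common variable. It uses plain Python instead of SAS, and the function names and row layout (a list of dictionaries) are assumptions for illustration only; the actual tool delegates this work to SAS.

```python
from collections import Counter


def one_way_frequency(rows, variable):
    """Count how often each value of `variable` occurs."""
    return Counter(row[variable] for row in rows)


def two_way_crosstab(rows, row_var, col_var):
    """Count each (row_var value, col_var value) combination."""
    return Counter((row[row_var], row[col_var]) for row in rows)


def join_on(left, right, key):
    """Inner-join two lists of dicts on a shared key variable."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]
```

An inter-table crosstab is then just `two_way_crosstab(join_on(table_a, table_b, "id"), "var_a", "var_b")`, where `"id"`, `"var_a"`, and `"var_b"` stand in for whatever variables the two tables actually contain.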
Distributed computing solutions
CHS researchers can take advantage of the power of multiple computers located within the Center.
- CISNET
The Cancer Intervention and Surveillance Modeling Network (CISNET) project has pioneered distributed processing at CHS. In a recent calibration run, the CISNET model needed to run 75,000 iterations of 2 million life histories each, for a total of 150 billion life histories. In simple terms, the CISNET model had to simulate life histories equivalent to about 20 times the current population of Earth. Currently, the distributed CISNET simulation model runs 22 model instances on 14 separate Center computers.
- Linux computing cluster
Center computing capabilities include a small computer cluster. This Linux-based resource provides the R programming/analysis environment to Center biostatisticians and programmers. Some statistical approaches can be subdivided and processed in parallel, using the multiple computing nodes within the cluster.
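The scale of the CISNET calibration run described above can be checked with a few lines of arithmetic. The sketch below also shows one way the 75,000 iterations could be divided among the 22 model instances; an even split is an assumption made for illustration, since the actual scheduling scheme is not described here.

```python
ITERATIONS = 75_000
LIFE_HISTORIES_PER_ITERATION = 2_000_000
MODEL_INSTANCES = 22


def total_life_histories(iterations, per_iteration):
    """Total simulated life histories across all iterations."""
    return iterations * per_iteration


def split_iterations(iterations, instances):
    """Divide iterations as evenly as possible (assumed scheme)."""
    base, extra = divmod(iterations, instances)
    # The first `extra` instances each take one additional iteration.
    return [base + 1 if i < extra else base for i in range(instances)]


total = total_life_histories(ITERATIONS, LIFE_HISTORIES_PER_ITERATION)
shares = split_iterations(ITERATIONS, MODEL_INSTANCES)
print(f"{total:,} life histories; {max(shares):,} iterations at most per instance")
```

Under this even split, each instance would handle 3,409 or 3,410 iterations, or roughly 6.8 billion life histories apiece.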
Data warehousing
We have two data warehouses: the CHS Data Warehouse and the Breast Cancer Surveillance Consortium Statistical Coordinating Center (SCC) Data Warehouse.
The CHS Data Warehouse is a repository of SAS data sets. Development of this warehouse began in 1996, and it was in use by 1997. It comprises subsets of the most commonly used clinical and administrative Group Health data sets. Enrollment information is available from 1980, and clinical data date back to the inception of various Group Health automated systems, many originating in the 1980s. Before data are copied to the CHS Data Warehouse, they are reformatted to standardize variables. The data sets are password-protected, and consumer identifiers are encrypted. At present, the CHS Data Warehouse stores approximately a billion records, taking up over 600 GB of space. This system provides efficient access to data and ensures that historical data are retained.
CHS is also home to the Breast Cancer Surveillance Consortium SCC data warehouse. Using SQL Server, this data warehouse stores breast cancer information from seven sites across the United States. Each site sends eight different text files that cover a whole sequence of events from mammography to cancer, if diagnosed. The text files are run through an "error check" stored procedure written in SQL to check for invalid values. A "longitudinal" SAS data set is created that links information across some of the files. This data set is used in most analytical work. To date, the consortium has collected data for more than 2.1 million women and more than 8 million mammography exams that are associated with more than 91,000 breast cancers, both invasive and DCIS.
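The kind of invalid-value check that the "error check" step performs can be sketched as follows. This is an illustration in Python, not the SQL stored procedure itself, and the pipe-delimited layout, the field names (`age`, `exam_date`, `laterality`), and the validity rules are all hypothetical.

```python
import csv
import io
from datetime import datetime


def _valid_date(value):
    """True if value is a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical validity rules; the real files and rules differ.
RULES = {
    "age": lambda v: v.isdecimal() and 0 <= int(v) <= 120,
    "exam_date": _valid_date,
    "laterality": lambda v: v in {"L", "R", "B"},
}


def find_invalid_values(text):
    """Return (line number, field, value) for every value that fails a rule."""
    errors = []
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for field, is_valid in RULES.items():
            value = (row.get(field) or "").strip()
            if not is_valid(value):
                errors.append((line_no, field, value))
    return errors
```

Keeping the rules in a single table makes it straightforward to report every problem in a submitted file at once, rather than stopping at the first bad value.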
Automated chart review systems and tracking databases
Many of the research projects at CHS require chart reviews and tracking databases to capture patient data. For most studies, we design these databases in Microsoft Access or SQL Server.
- Chest X-ray abstraction system using SQL and .NET
A project undertaken by the Center's immunization and infectious diseases group seeks to test for a change in the incidence of pneumonia coincident with the introduction of a new children's vaccine. The task involves reviewing roughly 65,000 radiology reports for chest X-rays taken over a period of seven years in order to confirm positive cases of pneumonia. A SQL Server database holds the radiology reports and the data collected in the abstraction process. A system of Windows forms, developed with the C# language in the .NET environment, serves as the interface for the abstractors. As many as six abstractors use the system simultaneously. A parallel system allows project staff to review completed abstractions, revise collected data as needed, and complete abstractions of unclear records. Records are presented to the abstractors in a predetermined random order so that episodes (clusters of radiology reports) do not appear in chronological order, yet subsequent reports within a given episode may be skipped once the earliest positive report is found. Tasks such as random selection and assignment of records for re-review, as well as weekly and monthly progress and consistency reports, are automated using stored procedures and scheduled jobs.
- CHCRTracker
CHCRTracker is a multithreaded client-server application for tracking recruitment of Group Health members for a study of smoking cessation. The application is capable of generating recruitment letters and tracking recruits' contact information, staff/recruit contacts, and recruit statuses.
The user interface was built in VB.NET, using the WinForms engine. The data reside on a central SQL Server and are manipulated through stored procedures, table triggers, scripts, and raw SQL. Reports are generated through Microsoft Access.
- Automated chart abstraction using VB and Access
One innovative use of an Access database and Visual Basic programming was designed for a study of the relationship between post-menopausal hormone replacement therapy and heart disease in women with diabetes. The study required more than 3,000 medical chart reviews, each taking more than an hour. An Access database was designed with forms similar to the paper forms used for chart data collection. Entering the chart information directly into the database saved 5–15 minutes per record and eliminated the need for re-pulls due to missing values.
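The record-presentation logic described for the chest X-ray abstraction system (episodes shown in a predetermined random order, with later reports in an episode skipped once the earliest positive report is found) can be sketched as below. This is an illustrative Python sketch rather than the system's C# and stored-procedure code; the function names, the fixed seed, and the `is_positive` callback that stands in for the abstractor's judgment are assumptions.

```python
import random


def presentation_order(episodes, seed):
    """Return episode ids in a reproducible random order.

    `episodes` maps an episode id to its report ids in chronological order.
    A fixed seed makes the order predetermined: the same seed always
    yields the same sequence.
    """
    episode_ids = sorted(episodes)
    random.Random(seed).shuffle(episode_ids)
    return episode_ids


def abstract_reports(episodes, seed, is_positive):
    """Return the report ids an abstractor would actually review."""
    reviewed = []
    for episode_id in presentation_order(episodes, seed):
        for report_id in episodes[episode_id]:
            reviewed.append(report_id)
            if is_positive(report_id):
                break  # earliest positive found; skip the rest of the episode
    return reviewed
```

With `episodes = {"e1": ["r1", "r2", "r3"], "e2": ["r4", "r5"]}` and `r2` as the only positive report, `r3` is never presented, while both reports in `e2` are reviewed.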
Natural language processing
CHS information technology staff have deployed a pilot version of the Cancer Text Information Extraction System, also known by its acronym caTIES (pronounced kah-TIES). The caTIES software identifies medical terms, diagnoses, procedures, and the names of organ systems referred to in free-text pathology reports, radiology reports, or chart notes. In the past, such coding has been done well only by human chart reviewers. In the future, this new technology will be used to conduct data mining tasks faster and at less expense, creating opportunities for research previously considered infeasible.
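As a rough illustration of what identifying terms in free text involves, the sketch below matches a tiny vocabulary against a report. It is a deliberately simple dictionary lookup and is not how caTIES works internally; the vocabulary, the categories, and the `find_terms` function are hypothetical.

```python
import re

# Tiny illustrative vocabulary; a real system relies on a full
# medical terminology rather than a hand-written list.
VOCABULARY = {
    "adenocarcinoma": "diagnosis",
    "ductal carcinoma in situ": "diagnosis",
    "biopsy": "procedure",
    "colon": "organ",
    "breast": "organ",
}


def find_terms(text):
    """Return (position, term, category) for each vocabulary term found."""
    found = []
    lowered = text.lower()
    for term, category in VOCABULARY.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            found.append((match.start(), term, category))
    return sorted(found)
```

Calling `find_terms("Biopsy of the colon shows adenocarcinoma.")` yields the three matched terms in the order they appear. Exact matching of this kind misses negation, abbreviations, and misspellings, which is part of why such coding has historically required human reviewers.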


