Technologies at CHS

The diversity of research interests at CHS allows Information Technology (IT) staff to work and experiment with a variety of technologies. These include:

Microsimulation models

The Cancer Intervention and Surveillance Modeling Network (CISNET) microsimulation model of the natural history of colorectal cancer is implemented in C#. The implementation is based on the well-documented state design pattern, which decentralizes computation among separate model processes. Each process is solely responsible for moving an individual to the appropriate next process. For example, the process that models preclinical invasive cancer alone decides whether an individual's data moves to a state indicating that the person died from a cause unrelated to colorectal cancer or to a state indicating that the preclinical invasive cancer was subsequently detected clinically.

The program runs in two modes. An estimation mode outputs the data required to infer the Bayesian posterior distribution of the model's input parameters. A "run" mode persists the simulated natural life histories of individuals, with the model parameters specified from their posterior distribution.
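
As a rough illustration of the state design pattern described above, the following is a minimal sketch in Python rather than the model's C#. The class names, the `step` method, and the transition probability are hypothetical and chosen only for illustration; they are not taken from the CISNET model.

```python
import random
from abc import ABC, abstractmethod


class State(ABC):
    """One model process; it alone chooses the next process."""

    terminal = False  # terminal states end an individual's history

    @abstractmethod
    def step(self, person: dict, rng: random.Random) -> "State":
        """Return the state the individual moves to next."""


class OtherCauseDeath(State):
    terminal = True

    def step(self, person, rng):
        return self


class ClinicallyDetectedCancer(State):
    terminal = True

    def step(self, person, rng):
        return self


class PreclinicalInvasiveCancer(State):
    # Hypothetical probability that death from another cause comes first.
    P_OTHER_CAUSE_DEATH = 0.3

    def step(self, person, rng):
        if rng.random() < self.P_OTHER_CAUSE_DEATH:
            return OtherCauseDeath()
        return ClinicallyDetectedCancer()


def simulate(person: dict, start: State, seed: int = 0) -> list:
    """Follow one individual from `start` until a terminal state is reached."""
    rng = random.Random(seed)
    state = start
    history = [type(state).__name__]
    while not state.terminal:
        state = state.step(person, rng)
        history.append(type(state).__name__)
    return history


print(simulate({"id": 1}, PreclinicalInvasiveCancer(), seed=1))
```

The driver loop in `simulate` knows nothing about the transition rules. Each state class holds its own rules, so adding a new state would not require changes to the loop.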

Internet technologies and Web programming

CHS has an intranet and an external Web site. Both run on Windows Server-based systems using Microsoft Internet Information Services (IIS) 5 and 6. For the intranet, our primary Web content tools are Adobe Dreamweaver and Contribute, with Web applications developed in ASP and ASP.NET. For the external Web site, we are transitioning from FrontPage to Dreamweaver. Our external resources also include listserv mailing lists and project-specific Web sites.

To further our research, CHS makes use of Internet technology and Web programming in projects such as:

Distributed computing solutions

CHS researchers can take advantage of the power of multiple computers located within the Center.

Data warehousing

We have two data warehouses: the CHS Data Warehouse and the Breast Cancer Surveillance Consortium Statistical Coordinating Center (SCC) Data Warehouse.

The CHS Data Warehouse is a repository of SAS data sets. Development of this warehouse began in 1996, and it was in use by 1997. It comprises subsets of the most commonly used clinical and administrative Group Health data sets. Enrollment information is available from 1980, and clinical data date back to the inception of various Group Health automated systems, many originating in the 1980s. Before data are copied to the CHS Data Warehouse, they are reformatted to standardize variables. The data sets are password-protected, and consumer identifiers are encrypted. At present, the CHS Data Warehouse stores approximately a billion records, taking up over 600 GB of space. This system provides efficient access to data and ensures that historical data are retained.

CHS is also home to the Breast Cancer Surveillance Consortium SCC Data Warehouse. Built on SQL Server, this data warehouse stores breast cancer information from seven sites across the United States. Each site sends eight different text files that cover the full sequence of events from mammography to cancer, if diagnosed. The text files are run through an "error check" stored procedure written in SQL to check for invalid values. A "longitudinal" SAS data set is then created that links information across some of the files. This data set is used in most analytical work. To date, the consortium has collected data for more than 2.1 million women and more than 8 million mammography exams that are associated with more than 91,000 breast cancers, both invasive and DCIS.
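
To make the idea of an invalid-value check concrete, here is a minimal sketch in Python. It is not the SQL stored procedure itself. The pipe-delimited layout, the field names (`woman_id`, `exam_year`, `result_code`), and the validation rules are all assumptions made for illustration, not the consortium's actual file specification.

```python
import csv
import io

# Hypothetical rules: field name -> predicate that accepts a valid value.
RULES = {
    "exam_year": lambda v: v.isdigit() and 1900 <= int(v) <= 2100,
    "result_code": lambda v: v in {"0", "1", "2", "3", "4", "5"},
}


def find_invalid_values(text: str, delimiter: str = "|") -> list:
    """Return (line_number, field, value) for every value that fails a rule."""
    errors = []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for field, is_valid in RULES.items():
            value = (row.get(field) or "").strip()
            if not is_valid(value):
                errors.append((line_number, field, value))
    return errors


# Made-up sample data; the second record has two invalid values.
sample = "woman_id|exam_year|result_code\nA1|2004|1\nA2|20x4|9\n"
print(find_invalid_values(sample))
```

Reporting the line number with each failure would let a submitting site find and correct the offending record before resending the file.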

Automated chart review systems and tracking databases

Many of the research projects at CHS require chart reviews and tracking databases to capture patient data. For most studies, we design these databases in Microsoft Access or SQL.

Natural language processing

CHS information technology staff have deployed a pilot version of the Cancer Text Information Extraction System, also known by its acronym caTIES (pronounced kah-TIES). caTIES uses software to identify medical terms, diagnoses, procedures, and the names of organ systems referred to in free-text pathology reports, radiology reports, or chart notes. In the past, such coding has been done well only by human chart reviewers. In the future, this new technology will be used to conduct data mining tasks faster and at less expense, creating opportunities for research previously considered infeasible.
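
As a toy illustration of finding coded terms in free text, the sketch below uses a simple dictionary lookup in Python. It is not how caTIES works internally, and the lexicon, categories, and function name are hypothetical. A production system would also need to handle negation, abbreviations, and spelling variants, none of which this sketch attempts.

```python
import re

# Hypothetical mini-lexicon: surface term -> category.
LEXICON = {
    "adenocarcinoma": "diagnosis",
    "colonoscopy": "procedure",
    "sigmoid colon": "organ",
}


def tag_terms(report: str) -> list:
    """Return (term, category, start_offset) for each lexicon term found."""
    hits = []
    for term, category in LEXICON.items():
        # Whole-word, case-insensitive match of the literal term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, report, flags=re.IGNORECASE):
            hits.append((term, category, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


# Made-up report text for demonstration only.
report = "Colonoscopy showed adenocarcinoma of the sigmoid colon."
for hit in tag_terms(report):
    print(hit)
```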