Press Releases by CGIDir

Reporting Requirements of U.S. Congress Lead the National Institutes of Health to Implement PDFTextStream

September 14, 2006; 06:36 AM
Snowtide Informatics Systems, Inc., the leading provider of enterprise-class PDF text and metadata extraction APIs (application programming interfaces) for Java applications and Web services, announced that the National Institutes of Health (NIH) is successfully using its flagship product, PDFTextStream, to automate text extraction from thousands of archived research grant applications in PDF (Portable Document Format). The process enables the NIH to feed the extracted text into its content analysis systems and, in turn, to generate more accurate funding reports required by Congress.

With an annual budget of over $28 billion and almost 200,000 grant applications reviewed every year, the NIH has embarked on various projects to convert archived paper documents to electronic files in order to enable efficient and accurate report generation. The deployment of PDFTextStream is part of this enormous undertaking. According to the NIH, the Java API has already helped the agency save thousands of man-hours by automating the tedious and complex task of PDF text extraction.

“We needed to be able to categorize NIH grant applications based on their specific content so we could ultimately generate accurate funding reports for Congress,” said LiMing Shen, Lead Integration Engineer at the NIH, and the project lead for the text extraction project.

“This was more challenging than it sounds. Our conversion process to PDF was producing documents that were readable to the human eye, but difficult for our text extraction engines to process accurately. Not only did PDFTextStream do an outstanding job extracting text, but it also accurately identified specific sections within each PDF document, a demanding function that the other PDF extraction tools we evaluated could not provide. This feature alone has already saved us hundreds of hours of costly manual work, and will add up to thousands of hours saved by the time the project is complete.”

“Our focus has always been to provide customers with the highest performing PDF text and metadata extraction API for Java, .NET, and Python,” said Chas Emerick, President, Snowtide Informatics Systems, Inc. “The NIH development team put PDFTextStream through some very rigorous tests. We’re pleased to see that our API performed very well and is having a significant impact on productivity and cost savings for the agency.”

About the National Institutes of Health
The National Institutes of Health (NIH), a part of the U.S. Department of Health and Human Services, is the primary Federal agency tasked with conducting and supporting medical research. Helping to lead the way toward important medical discoveries that improve people's health and save lives, NIH scientists investigate ways to prevent disease as well as the causes, treatments, and even cures for common and rare diseases. With an annual budget of over $28 billion, the NIH awards almost 50,000 competitive grants every year to more than 212,000 researchers at over 2,800 universities, medical schools, and other research institutions in every state and around the world.

About Snowtide Informatics Systems, Inc.
Snowtide Informatics Systems, Inc. is a privately held software company headquartered in Holyoke, Massachusetts. Its high-performance software and custom development services enable large enterprises and government agencies to automate the extraction, conversion, and cataloging of content held in PDF documents. PDFTextStream, Snowtide Informatics' flagship product, is a software component library for Java, Python, and .NET environments that has been built from the ground up to rapidly and accurately extract text and metadata held in PDF documents. For more information about Snowtide Informatics Systems, visit snowtide.com.