At BrightPlanet, we work in the Deep Web to help turn your unstructured web content into structured harvested data. Our data harvest technologies give you greater insight into the data that matters to you, so you can make well-informed business decisions.
What is the Deep Web?
The Deep Web refers to data stored in a website's internal database. This data is typically not accessible via traditional search engines such as Google, Bing, and Yahoo, which only index the Surface Web.
Traditional search engines find results by crawling links across the Web, so Deep Web data can be defined as content on the Web that cannot be reached by clicking on a link.
Deep Web content can be found almost any time you navigate away from Google and search directly within a website. Government databases and libraries are examples of websites that contain huge amounts of Deep Web data, since Google can’t find the pages behind these websites' search boxes.
Consequently, Deep Web data is only accessible through Deep Web search by issuing queries to a website's search form. Once a query is issued, BrightPlanet extracts the search results and categorizes all relevant documents.
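To make the idea of issuing a query to a search form concrete, here is a minimal Python sketch. It is not BrightPlanet's implementation. The function names, the example.gov address, and the "q" field name are all assumptions chosen for illustration, and it assumes a simple form that submits its query as a GET parameter.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


def build_search_url(form_action: str, query_field: str, query: str) -> str:
    """Return the URL a GET search form would request for this query."""
    return f"{form_action}?{urlencode({query_field: query})}"


class ResultLinkParser(HTMLParser):
    """Collect absolute URLs from every <a href="..."> on a results page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the results page URL.
                self.links.append(urljoin(self.base_url, href))


def extract_result_links(html: str, base_url: str) -> list[str]:
    """Parse a results page and return the links it contains, in order."""
    parser = ResultLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical source and query, for illustration only.
search_url = build_search_url("https://example.gov/search", "q", "clinical trials")
sample_page = '<ul><li><a href="/doc/1">One</a></li><li><a href="/doc/2">Two</a></li></ul>'
result_links = extract_result_links(sample_page, search_url)
```

In a real harvest the results page would be fetched over the network rather than supplied as a string, and forms that submit via POST or rely on JavaScript would need different handling.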
How Does Deep Web Search Work?
The Deep Web is estimated to be at least 400 to 500 times the size of the Surface Web. It is continuously growing, which means the number of Deep Web sources that need to be tapped through Deep Web search is also growing.
In order to get access to Deep Web sources, BrightPlanet’s Data Acquisition Engineer team configures each source. This means testing the search forms and filtering out irrelevant links. We also determine which text fields should be extracted in order to guarantee clean results.
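As a rough illustration of what configuring a source could involve, the Python sketch below models a source as a small settings record with link exclusion patterns and a list of text fields to keep. The SourceConfig class, its field names, and the example patterns are hypothetical and are not taken from BrightPlanet's tooling.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SourceConfig:
    """Hypothetical per-source settings; all field names are illustrative."""
    name: str
    search_url: str
    query_field: str
    exclude_patterns: list[str] = field(default_factory=list)
    text_fields: list[str] = field(default_factory=list)


def filter_links(links: list[str], config: SourceConfig) -> list[str]:
    """Drop links that match any exclude pattern (e.g. login or about pages)."""
    compiled = [re.compile(pattern) for pattern in config.exclude_patterns]
    return [url for url in links if not any(rx.search(url) for rx in compiled)]


def select_fields(document: dict[str, str], config: SourceConfig) -> dict[str, str]:
    """Keep only the configured text fields that are present in the document."""
    return {key: document[key] for key in config.text_fields if key in document}


# Example configuration for an imaginary government database.
config = SourceConfig(
    name="example-gov",
    search_url="https://example.gov/search",
    query_field="q",
    exclude_patterns=[r"/login", r"/about"],
    text_fields=["title", "body"],
)
kept = filter_links(
    ["https://example.gov/doc/1", "https://example.gov/login", "https://example.gov/about/us"],
    config,
)
clean = select_fields({"title": "Report", "body": "Full text", "sidebar": "Ads"}, config)
```

Keeping the exclusion rules and field list in data rather than in code means each new source can be added by writing a new configuration, without changing the harvesting logic.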
Configured sources are grouped and categorized into what BrightPlanet calls the Source Repository, a library of Deep Web sources (websites) that BrightPlanet has collected over 10 years of executing web harvests on behalf of clients. BrightPlanet's exclusive Source Repository technology categorizes Deep Web data into groups such as Law, Healthcare, Pharmaceuticals, Social Media, Major Media, Newspapers, Finance & Economics, Politics, and over 50 other groups.
The Source Repository configures websites into Deep Web sources, allowing BrightPlanet’s DeepHarvester to automate queries directly into the search forms of each of the sites. Issuing a query directly to the source lets the Harvester go beyond the surface of the site and pull content that can only be accessed via a query to the site’s search form.
Elevate Your Deep Web Data with BrightPlanet
Not accessing all of your Deep Web resources is like putting together only half of a puzzle. At BrightPlanet, we help customers complete the puzzle and find the data they want on the Deep Web through our process of harvesting, curating, and developing insights.
Interested in learning more about the Deep Web and Deep Web search? Download our white paper, Understanding the Deep Web in 10 Minutes, to learn why you should care about the Deep Web, discover how data is harvested from the Deep Web, and read Deep Web harvest use cases.
Let BrightPlanet help you make sense of your web data through our Data-as-a-Service technology. Schedule a consultation with a Data Acquisition Engineer and begin elevating your data harvest process today.