IronWebScraper – The C# WebScraping Library 188.8.131.52 Retail
Iron WebScraper is a C# web scraping library, allowing developers to simulate & automate human browsing behavior to extract content, files & images from web applications as native .Net objects. Iron Web Scraper manages politeness & multithreading in the background, leaving a developer’s own application easy to understand & maintain.
Iron Web Scraper can be used to migrate content from existing websites as well as build search indexes and monitor website structure & content changes. It’s functionality includes:
» Fast multi threading allows hundreds of simultaneous requests.
» Politely avoid over stalling remote servers using IP/domain level throttling & optionally respecting robots.txt
» Manage multiple identities, DNS, proxies, user agents, request methods, custom headers, cookies & logins.
» Data exported from websites becomes native C# objects which can be stored or used immediately.
» Exceptions managed in all but the developers own code. Errors and captchas auto retried on failure
» Save, pause, resume, autosave scrape jobs.
» Built in web cache allows for action replay, crash recovery, and querying existing web scrape data. Change scrape logic on the fly, then replay job without internet traffic.
Iron WebScraper provides a powerful framework to extract data and files from websites using C# code.
Install IronWebScraper to your Project using Nuget
Create a Class Extending WebScraper
Create an Init method that uses the Request method to parse at least one URL.
Create a Parse method to process the requests, and indeed Request more pages. Use response.Css to work with HTML elements using jQuery style CSS selectors
In your application please create and instance of your web scraping class and call the Start(); method
Read our C# webscraping tutorials to learn who to create advanced web crawlers using IronWebScraper
Whether its product, integration or licensing queries, the Iron product development team are on hand to support all of your questions. Get in touch and start a dialog with Iron to make the most of our library in your project.
IronWebScraper must be programmed to know how to handle each “type” of page it encounters. This is achieved in a very concise manner using CSS Selectors or XPath expressions and can be fully customized in C#. This freedom allows you to decide which pages to scrape within a website, and what to do with the data extracted. Each method can be debugged and watched neatly in Visual Studio.
IronWebScraper deals with multithreading and web-requests to allow for hundreds of concurrent threads without the developer needing to manage them. Politeness can be set to throttle requests, so reducing risk of excessive load on target web servers.
IronWebScraper can use one or multiple “identities” – sessions that simulate real world human requests. Each request may programmatically or randomly assign its own Identity, User Agent, Cookies, Logins and even IP addresses. Requests are set as auto-unique with a combination of URL, parse method and post variables.
IronWebScraper uses advanced caching to allow developers to change their code “on the fly” and replay every previous request without contacting the internet. Every scrape job is autosaved and can be resumed in the event of an exception or power outage.
IronWebScraper puts Web Scraping tools in your own hands quickly with a Visual Studio installer. Whether installing directly from Nuget within visual studio or downloading the DLL, you’ll be setup in no time. Just one DLL and no dependancies.