Dotnet Core Web Scraping




Introduction

Web scraping is a popular term for various methods used to extract web metadata or gather valuable information from across the Internet. Generally, this is done with specialized software that simulates web browsing to gather specific pieces of information.


There are many reasons you may need a website scraper. One of the biggest reasons I use website scrapers is to avoid visiting a site regularly to look for something and losing the time spent there. For instance, when COVID-19 first hit, I visited the stats page of the Pennsylvania Department of Health every day. Another example is watching for a sale item during Amazon's Prime Day.

Getting Started

To get started, we'll want to create an Azure Function. We can do that in a few different ways:

  • Use the command line
  • Use the Azure extension for Visual Studio Code
  • Use the Azure Portal

At this point, use the method you feel most comfortable with. I tend to use the command line or the Azure extension for Visual Studio Code, as they tend to leave the codebase very clean. I'm writing this function in C# so I can use some third-party libraries.

In my case, I’ve called my HttpTrigger function ScrapeSite.

Modifying the Function

Once the function is created, it should look like this:

We'll bring in the HtmlAgilityPack NuGet package so we can grab the appropriate area of our page. To do this, open a command line, navigate to the project folder, and run:

dotnet add package HtmlAgilityPack

In my case, I'm going to connect to Walmart and look at several Xbox products. I'll query the buttons on the page and check that the InnerHtml of each button does not read "Get in-stock alert". If one does, the product is out of stock.

Our first step is to connect to the URL and read the page content. I'll start by creating a sealed class that carries the results back to the function:

In this case, I’ll be returning a boolean value as well as the URL that I’m attempting to scrape from. This will allow me to redirect the user to that location when necessary.
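A minimal sketch of such a result class follows. The class name ProductAvailability comes from later in this post; the property names IsAvailable and Url are hypothetical, chosen only to illustrate the boolean-plus-URL shape described above.

```csharp
// Sketch of the result type returned to the function.
// ProductAvailability is the article's class name; the property
// names IsAvailable and Url are assumptions for illustration.
public sealed class ProductAvailability
{
    // True when the scraped page shows the product as in stock.
    public bool IsAvailable { get; set; }

    // The page that was scraped, so the user can be redirected there.
    public string Url { get; set; }
}
```

Settable properties fit the approach described below, where the scraper fills in the object before returning it.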

Screen scraping websites is not necessarily illegal, but you should make sure you have the appropriate permission before scraping a site. In addition, if you scrape too often, the site may flag you as a bot and block your IP address.
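One common way to avoid scraping too often is to enforce a minimum gap between requests. The sketch below is not part of the original project; the RequestThrottle class and the interval you pass to it are hypothetical, and the right interval depends on the site you are scraping.

```csharp
using System;

// Hypothetical politeness throttle: enforces a minimum interval
// between consecutive requests. Times are passed in explicitly so
// the logic is deterministic and easy to test.
public sealed class RequestThrottle
{
    private readonly TimeSpan _minimumInterval;
    private DateTimeOffset? _lastRequest;

    public RequestThrottle(TimeSpan minimumInterval)
    {
        if (minimumInterval < TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(minimumInterval));
        _minimumInterval = minimumInterval;
    }

    // How long the caller should wait before sending the next request.
    public TimeSpan DelayBeforeNext(DateTimeOffset now)
    {
        if (!_lastRequest.HasValue) return TimeSpan.Zero; // first request: no wait
        TimeSpan elapsed = now - _lastRequest.Value;
        return elapsed >= _minimumInterval ? TimeSpan.Zero : _minimumInterval - elapsed;
    }

    // Records that a request was sent at the given time.
    public void RecordRequest(DateTimeOffset now) => _lastRequest = now;
}
```

Before each request you could await Task.Delay(throttle.DelayBeforeNext(DateTimeOffset.UtcNow)) and then call RecordRequest. Note that in an Azure Function, in-memory state like this only applies within a single host instance.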

Next, I'm going to add a static class called Scraper, which handles most of the scraping process. The class takes advantage of the HtmlWeb.LoadFromWebAsync() method in the HtmlAgilityPack package. The reason for this is that a plain HttpClient request lacks headers that many sites expect from a browser; if we used HttpClient as-is instead, many websites would record us as a bot.
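For comparison, here is a sketch of what adding browser-style headers to an HttpClient request could look like. This is an alternative to the HtmlWeb approach used in this post, not the author's method; the BrowserLikeRequest name, the User-Agent string, and the choice of headers are all assumptions.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper that builds a GET request carrying headers a
// browser would normally send. HttpClient sends no User-Agent by default.
public static class BrowserLikeRequest
{
    // Placeholder User-Agent; replace with one that identifies your scraper.
    public const string UserAgent = "Mozilla/5.0 (compatible; ProductScraper/1.0)";

    public static HttpRequestMessage Create(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.UserAgent.ParseAdd(UserAgent);
        request.Headers.Accept.ParseAdd("text/html");
        request.Headers.AcceptLanguage.ParseAdd("en-US");
        return request;
    }
}
```

The resulting message would be sent with HttpClient.SendAsync, and the response body could then be loaded into an HtmlDocument.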

After we connect to the URL, we'll use a selector to grab all buttons and then use a LINQ query to count how many buttons contain the text "Get in-stock alert". We'll update the ProductAvailability object and return it.
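The counting step can be sketched independently of the network call. In this sketch the button contents arrive as plain strings; with HtmlAgilityPack they could come from doc.DocumentNode.SelectNodes("//button") projected to each node's InnerHtml (SelectNodes returns null when nothing matches). The StockCheck class and its method names are hypothetical, and the case-insensitive comparison is a design choice, not something stated in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the in-stock rule: a product is out of stock
// when any button on the page reads "Get in-stock alert".
public static class StockCheck
{
    public const string OutOfStockText = "Get in-stock alert";

    // Counts buttons whose inner HTML contains the out-of-stock text.
    // A null sequence (no buttons found) counts as zero matches.
    public static int CountOutOfStockButtons(IEnumerable<string> buttonInnerHtml)
    {
        if (buttonInnerHtml == null) return 0;
        return buttonInnerHtml.Count(html =>
            html != null &&
            html.IndexOf(OutOfStockText, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // In stock when no button asks the user for an in-stock alert.
    public static bool IsInStock(IEnumerable<string> buttonInnerHtml) =>
        CountOutOfStockButtons(buttonInnerHtml) == 0;
}
```

The boolean from IsInStock is what would be written into the ProductAvailability object before it is returned.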

Finally, we’ll update our function to call the GetProductAvailability method multiple times:
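One way to structure the repeated calls is a small helper that runs a check once per product URL and collects the results. This is a sketch: the ProductChecks class and CheckAllAsync method are hypothetical, and the check delegate stands in for GetProductAvailability, whose exact signature is an assumption here.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper that runs one availability check per URL.
public static class ProductChecks
{
    // Calls check once per URL, in order, and returns the results.
    // Each call is awaited before the next starts, so requests to the
    // same site are sequential rather than concurrent.
    public static async Task<IReadOnlyList<T>> CheckAllAsync<T>(
        IEnumerable<string> urls, Func<string, Task<T>> check)
    {
        if (urls == null) throw new ArgumentNullException(nameof(urls));
        if (check == null) throw new ArgumentNullException(nameof(check));

        var results = new List<T>();
        foreach (var url in urls)
        {
            results.Add(await check(url).ConfigureAwait(false));
        }
        return results;
    }
}
```

Running the checks sequentially is slower than Task.WhenAll but sends fewer simultaneous requests to one site, which matters given the risk of being blocked.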


Results

Now, we can run our function from within Visual Studio Code by pressing F5. This requires the Azure Functions Core Tools; if you don't have them installed, you'll be prompted to install them. After they're installed and you press F5, you'll be prompted to visit the local URL for your function. If successful, you should see the following results (as of this post) for the above two products:

Conclusion

In this post, we created a new Azure Function, built it using VS Code, and connected to Walmart.com to obtain product information. If you're interested in reviewing the finished product, be sure to check out the repository below:

ScrapySharp: a scraping framework containing:
- a web client able to simulate a web browser
- an HtmlAgilityPack extension to select elements using CSS selectors (like jQuery)


Release Notes

The alpha seems to be stable, so I am releasing it.

Dependencies

  • .NETStandard 2.0

    • FSharp.Core (>= 4.5.2)
    • HtmlAgilityPack (>= 1.7.4)
    • System.Runtime.Caching (>= 4.5.0-preview1-26216-02)

Used By

NuGet packages (16)

Showing the top 5 NuGet packages that depend on ScrapySharp:

  • RealmeyeSharp: Gets user information from Realmeye. To see how to use it, go to the project URL and look at the example.
  • TripleA: an extensible framework for building components for use in test frameworks to target system and deployment verification.
  • InstantQuick.SharePoint.Provisioning: CSOM-based provisioning library for SharePoint 2013, 2016, and SharePoint Online.
  • Smallcode.Net: Fluent HttpWebClient, HTTP parser, and JSON parser.
  • WeiXinApi

GitHub repositories (1)

Showing the one popular GitHub repository that depends on ScrapySharp:

  • ferventdesert/Hawk: visualized crawler & ETL IDE written with C#/WPF

Version History

Version  Downloads  Last updated
3.0.0 87,367 10/2/2018
3.0.0-alpha2 1,636 4/5/2018
3.0.0-alpha1 547 4/5/2018
2.6.2 101,481 7/4/2016
2.6.1 4,583 4/28/2016
2.6.0 781 4/28/2016
2.5.1 665 4/28/2016
2.5.0 7,479 2/22/2016
2.4.0 736 2/19/2016
2.4.0-beta1 563 2/5/2016
2.3.0 1,047 1/29/2016
2.2.63 17,496 11/21/2013
2.2.62 770 11/21/2013
2.2.61 1,685 10/4/2013
2.2.60 761 10/4/2013
2.2.59 800 10/2/2013
2.2.57 1,176 9/26/2013
2.2.56 1,110 9/12/2013
2.2.0 945 9/9/2013
2.1.55 1,646 7/24/2013
2.1.54 776 7/24/2013
2.1.53 1,538 6/20/2013
2.0.57-beta 1,279 5/17/2013
2.0.56-beta 718 5/6/2013
2.0.55-beta 753 4/26/2013
2.0.54-beta 739 4/24/2013
2.0.53-beta 1,082 4/5/2013
2.0.52 921 6/20/2013
2.0.52-beta 789 3/25/2013
2.0.51-beta 745 3/22/2013
2.0.50-beta 722 3/21/2013
2.0.49-beta 713 3/21/2013
2.0.48-beta 706 3/20/2013
2.0.47 805 6/20/2013
2.0.47-beta 692 3/20/2013
2.0.46-beta 784 3/20/2013
2.0.45-beta 707 3/20/2013
2.0.44-beta 716 3/19/2013
2.0.43-beta 747 3/14/2013
2.0.42-beta 866 2/13/2013
2.0.41-beta 786 2/6/2013
2.0.40-beta 764 2/6/2013
2.0.39-beta 739 2/6/2013
2.0.38-beta 709 2/4/2013
2.0.37-beta 748 2/4/2013
2.0.36-beta 735 1/22/2013
2.0.35-beta 730 1/15/2013
2.0.34-beta 705 1/4/2013
2.0.33-beta 720 1/4/2013
2.0.32-beta 781 1/4/2013
2.0.31-beta 696 1/4/2013
2.0.30-beta 765 1/4/2013
2.0.29-beta 711 1/4/2013
2.0.28-beta 749 1/4/2013
2.0.27-beta 720 1/2/2013
1.5.0 3,308 12/25/2012
1.4.3.1 1,034 12/11/2012
1.4.3 2,044 8/24/2012
1.4.2 839 8/17/2012
1.4.1 754 8/16/2012
1.4.0 788 8/16/2012
1.3.2 2,098 4/10/2012
1.3.0 950 4/3/2012
1.2.2 1,010 3/5/2012
1.2.1 923 2/20/2012
1.2.0 903 2/16/2012
1.1.0 1,148 12/7/2011
1.0.0 1,295 9/29/2011
