The HTML Agility Pack

For a current project I needed to perform a simple screen scrape action. The resulting solution was functional but a bit rough and ready. Luckily I stumbled upon this open-source HTML library project: The HTML Agility Pack, hosted on CodePlex at http://htmlagilitypack.codeplex.com.

It is an excellent little library that makes dealing with HTML a breeze, whether you are screen scraping or just manipulating HTML documents locally. It is very forgiving of malformed HTML documents and supports loading pages directly from the web. You can simply parse the HTML or modify it, and it even supports LINQ. A key benefit of this library is that it doesn’t force you to learn a new object model but instead mirrors the System.Xml object model, which is a huge help for getting up and running quickly and makes coding against it feel natural.

Download HTML directly via a URL:

// HtmlWeb fetches the page and returns it as a parsed HtmlDocument
HtmlWeb webGet = new HtmlWeb();
HtmlDocument htmlDoc = webGet.Load(url);

Or parse an HTML string:

HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(htmlString);

Then you can use XPath to query the HTML document as you would an XML document:

// select the first <li> that has a child <b> element with a value of "Name:"
string name = null;
var nameItem = htmlDoc.DocumentNode.SelectSingleNode("//li[b='Name:']");
if (nameItem != null && nameItem.ChildNodes.Count > 1)
{
    // assuming markup like <li><b>Name:</b> value</li>, ChildNodes[1] is the text after the <b>
    name = nameItem.ChildNodes[1].InnerText;
}
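
The LINQ support mentioned earlier can be illustrated with a minimal sketch. It is not taken from a real project: the sample markup and the LinkLister class name are made up for illustration. It relies on the Descendants, GetAttributeValue and InnerText members of the library's HtmlNode class, and needs the HtmlAgilityPack NuGet package to be referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class LinkLister
{
    static void Main()
    {
        // Hypothetical sample markup, used only for this illustration.
        const string html =
            "<ul><li><a href='/one'>First</a></li><li><a href='/two'>Second</a></li></ul>";

        HtmlDocument htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // Descendants("a") returns an IEnumerable<HtmlNode>,
        // so the standard LINQ operators can be applied to it.
        var links = htmlDoc.DocumentNode
            .Descendants("a")
            .Select(a => new
            {
                // The second argument is the default returned when the attribute is missing.
                Href = a.GetAttributeValue("href", string.Empty),
                Text = a.InnerText.Trim()
            })
            .Where(link => link.Href.Length > 0);

        foreach (var link in links)
        {
            Console.WriteLine(link.Text + " -> " + link.Href);
        }
    }
}
```

One reason to prefer this style over XPath for simple queries is that Descendants yields an empty sequence when nothing matches, so the loop simply does not run and no null check is needed.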

You can download it via NuGet here: http://nuget.org/packages/HtmlAgilityPack.

For more examples of its use, check out these posts: Parsing HTML Documents with the Html Agility Pack and Crawling a web sites with HtmlAgilityPack.

Enjoy.

Full Trust For Applications Running On Remote Share

A large number of .NET applications in enterprise environments are run directly from a file share on a server within the local corporate intranet. Previously this was usually only possible after editing the client machine’s registry. However, as of .NET 3.5 SP1 this is no longer an issue, as assemblies accessed from a local intranet share are granted full trust. There are some restrictions; for example, it only applies to assemblies loaded from the same directory as the target executable. Apparently this restriction has been removed in .NET 4, though. Although this has been around since the beta of 3.5 SP1, I wasn’t aware of it and thought it was worth sharing. Read more about it here.
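
If you want to confirm what trust level an application actually receives when launched from a share, one possible approach is to demand an unrestricted permission set at start-up. This is only a sketch under assumptions: it targets the .NET Framework's code access security model, the TrustCheck class name is hypothetical, and it is not taken from the article linked above.

```csharp
using System;
using System.Security;
using System.Security.Permissions;

class TrustCheck
{
    static void Main()
    {
        try
        {
            // Demanding an unrestricted permission set succeeds only if
            // every caller on the stack has been granted full trust.
            new PermissionSet(PermissionState.Unrestricted).Demand();
            Console.WriteLine("Running with full trust.");
        }
        catch (SecurityException)
        {
            Console.WriteLine("Running with partial trust.");
        }
    }
}
```

Running the same executable once from a local drive and once from the intranet share should show whether the share is being treated differently on a given machine.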

Live Mesh : Specify Folder Locations

Windows Live Mesh is a nifty tool for sharing data across various devices. One problem with it, though, is that by default new Mesh folders are added to the desktop of every device in your mesh. There is a way to prevent this.

Creating a new Mesh folder and setting it to synchronise with all your other devices means that it appears on the desktop of those machines. To avoid this, when you set up a new folder (from either the Live Desktop or directly on any of the devices), make sure that the ‘Synchronise Files’ setting is set to ‘Never with this device’ for all devices (as below):

[Screenshot: ChgSynch2]

From each individual device you will now be able to see the new folder in the Live Mesh ‘Manage Folders’ view. From there, select the new folder, right-click and select ‘Change Synch Settings’. This shows the Change Synchronization dialog (as above), but the Name and Location fields are now also enabled, allowing you to change the location of the folder on the current device. Specify a folder location and then change the ‘Synchronise Files’ setting to ‘When files are added or modified’ for the local machine (as below). The files will now synchronise as normal, but to the chosen target folder and not your desktop.

[Screenshot: ChgSynch3]

Repeat this process for each device in your Mesh.

Windows Azure: Links

A selection of the best links for Microsoft Azure information:

Windows Azure: Storage Service Initialization Problems

The first time the Development Storage service is started on a machine, an initialization takes place. On my box this first happened when I attempted to debug my first Cloud project in Visual Studio. A successful initialization should look like this:

[Screenshot: Dev Storage Init]

However, in my case this initialization failed with a generic error suggesting a timeout whilst connecting to the SQL Express database.

Investigation showed it was a database permissions issue: I needed to grant admin rights to my user (via the SQL Server Express – Surface Area Configuration app), as admin users are not automatically given this permission in SQL Express. Once this was set, the Storage Service could access the database, but the installation was corrupt, resulting in an invalid object db.Accounts message on service start-up.

To resolve this I deleted the DevelopmentStorageDb database and then re-ran the Initialization program, which is found at:

~Program Files\Windows Azure SDK\v1.0\bin\DSInit.exe

This connected to SQL Express and created a fresh database without any problems.

Missing Windows

If you usually use two monitors in a dual screen setup you may find that when you move back to one (such as when you undock your laptop to go mobile) occasionally some windows disappear. On launching an application it may not appear despite evidence of it running fine (icon in the toolbar for example). This is because the application’s window has opened up in the area of desktop previously visible on your second monitor.

To resolve the problem:

1) Give the application focus by selecting it in the Windows Taskbar or selecting “Switch To” from Task Manager.
2) Press “ALT” and “SPACE” together, which pops up the window context menu.
3) Select “Move”, and then use the arrow keys to move the window into view.

This happened to me again this week and I always forget the keyboard shortcut to launch the current window context menu. Here is a useful link I found that lists useful keyboard shortcuts.

SCSF Recipes Not Working – Check Solution File

I recently opened an SCSF project that was having build problems, and I found that once the build problems had been corrected the GAT recipes were not working. An error was being reported when I tried to run the Add Module recipe:

Microsoft.Practices.RecipeFramework.RecipeExecutionException: An exception occurred during the binding of reference or execution of recipe CreateBusinessModuleCS. Error was: The following arguments are required and don’t have values: CommonProject. Can’t continue execution..

Eventually I found that the required attributes were not set in the solution file (.sln), so the solution was unaware of the SCSF responsibilities it had. I checked this link…

http://www.ideablade.com/forum/forum_posts.asp?TID=357

It details the solution data required (example below), which I confirmed with other SCSF solutions I had.

GlobalSection(ExtensibilityGlobals) = postSolution
RootNamespace = MySolutionName.MyProjectName
CommonProjectGuid = 7432c860-3226-49fa-a9f4-2dd27d1229b8
ShellProjectGuid = 6ee16e85-57a7-4a00-9018-43eca17194cb
EndGlobalSection

I updated the GUIDs to be the ones from my solution (i.e. the GUIDs of the Infrastructure.Interface and the Shell projects) and that sorted the problem.

This could happen if the developer who created the original SCSF project checked in a new solution file instead of the original one generated by the SCSF.

SCSF Problems with VS2008 SP1

There is an issue with the 2008 version of the Smart Client Software Factory and Visual Studio 2008 SP1. The guidance recipes don’t work correctly and won’t provide the “Add View” options etc. This is an SP1 issue only. There is a workaround, but it involves changing the Guidance Package source code.

For more information see: http://www.codeplex.com/smartclient/Wiki/View.aspx?title=Known%20Issues%3a%20SC-SF%20April%202008%20with%20Visual%20Studio%202008%20and%20SP1%20Beta#RecipesNotDisplayed

The link mentions VB solutions but it is also affecting my C# solution (a VS 2008 SCSF App).

See who’s got what checked out in TFS

Run this command to see who has something checked out in TFS:

"C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\tf.exe" status /server:SERVERNAME /user:* $/TFSPROJECTBRANCH /recursive /login:DOMAIN\YOURUSERNAME,YOURPASSWORD

To query a different branch, replace TFSPROJECTBRANCH with the name of that branch.

For VS2005-only machines, change Visual Studio 9.0 to Visual Studio 8.

For best results, redirect the output to a file (for example, by appending > followed by a file name of your choice to the command), as the results are wide and hard to read in a command window.