Thread: Parsing HTML: is ClanLib right for the job?

  1. #1
    Lesser Knight (joined Sep 2006, 41 posts)


    I need to parse a series of web pages for specific information. Structurally, there's one master index page, which contains readily identifiable links to sub-indices. From these sub-indices I get the locations of the final pages, which are the ones that need to be parsed for information. Would ClanLib's networking and XML tools be appropriate for this?
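The index, sub-index, final-page structure is a two-level link walk. A minimal sketch in Python rather than ClanLib, assuming the pages have already been downloaded and stored in a dict keyed by URL; the `pages` dict, the example.com URLs and the `is_sub_index` predicate are hypothetical. It uses only the standard library's `html.parser` and `urllib.parse.urljoin`:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_in(html, base_url):
    """Return all link targets in html, resolved against base_url."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(base_url, href) for href in collector.links]


def final_page_urls(pages, index_url, is_sub_index):
    """Walk master index -> sub-indices -> final pages.

    pages: hypothetical dict mapping URL -> already-downloaded HTML.
    is_sub_index: caller-supplied predicate that picks out sub-index links.
    """
    found = []
    for sub_url in links_in(pages[index_url], index_url):
        if not is_sub_index(sub_url) or sub_url not in pages:
            continue  # not a sub-index, or not downloaded
        for final_url in links_in(pages[sub_url], sub_url):
            if final_url not in found:  # keep first occurrence only
                found.append(final_url)
    return found


pages = {
    "http://example.com/index.html":
        '<a href="sub/a.html">A</a> <a href="about.html">About</a>',
    "http://example.com/sub/a.html":
        '<a href="../final/1.html">1</a> <a href="../final/2.html">2</a>',
}
print(final_page_urls(pages, "http://example.com/index.html",
                      lambda url: "/sub/" in url))
# ['http://example.com/final/1.html', 'http://example.com/final/2.html']
```

Resolving every href with `urljoin` matters here because index pages usually use relative links such as `../final/1.html`.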

    And does anyone know of some good tutorials that could help me get started in this area?

  2. #2

    Python

    I'm sure the XML facilities in ClanLib could be used for this, but there are MUCH easier ways of doing it.

    For one, try Python's HTML-parsing facilities. They are very robust and feature-rich. Plus you get the added bonus of programming in Python.
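To give a feel for the Python route, here is a minimal sketch using the standard library's `html.parser` module (Python 3). It collects the text inside every element that carries a given CSS class; the `ClassTextExtractor` name and the `td`/`name` markup are hypothetical, standing in for whatever identifies the wanted information on the final pages:

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collects the text of every <tag> whose class list contains target_class."""

    def __init__(self, tag, target_class):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.tag = tag
        self.target_class = target_class
        self.depth = 0     # nesting depth of self.tag inside the current match
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.depth:  # same tag nested inside a match: just track depth
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:  # closed the matching element
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


html = ('<table><tr><td class="name">Foo &amp; Bar</td>'
        '<td class="price">3.50</td></tr>'
        '<tr><td class="name">Baz</td></tr></table>')
extractor = ClassTextExtractor("td", "name")
extractor.feed(html)
extractor.close()
print(extractor.results)  # ['Foo & Bar', 'Baz']
```

Text is only recorded when the matching end tag arrives, so an element whose closing tag is missing is not reported; that case would need extra handling.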

    If you MUST use C++ for this, then maybe libwww is your cup of tea.

  3. #3
    ClanLib Developer (Bergen, Norway; joined Sep 2006, 588 posts)


    If you don't need any visuals and want to use ClanLib, I'd look at the 0.9 branch. The XML support is better, and Magnus has written quite a lot of HTML helper classes. I'm not sure they'll help you, but they're worth checking out.

  4. #4
    Lesser Knight (joined Sep 2006, 41 posts)


    Well, it's not that I *must* use C++, it's just that I've never gotten around to learning Python.

    I figure I'll just shell out to wget to grab the files, so all I really have to do is parse the HTML. Thanks, Sphair, I'll try the 0.9 branch to see what's there.

  5. #5
    ClanLib Developer (Denmark; joined Sep 2006, 554 posts)


    The web stuff in 0.9 is mostly about sending HTTP requests and handling SOAP messages. The DOM classes in 0.8 and 0.9 are not well suited to parsing HTML, since HTML is not XML and allows unclosed tags.
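The unclosed-tag problem is easy to demonstrate. A small Python sketch (standard library only; the snippet and the `StartTagLogger` name are made up for illustration) feeds the same markup to an XML parser, which rejects it, and to the tolerant `html.parser` tokenizer, which accepts it:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Acceptable HTML (</li> may be omitted, <br> has no end tag),
# but not well-formed XML.
HTML_SNIPPET = "<ul><li>first<li>second<br></ul>"


def parses_as_xml(markup):
    """True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class StartTagLogger(HTMLParser):
    """Records start tags; html.parser does not require them to be closed."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


logger = StartTagLogger()
logger.feed(HTML_SNIPPET)
logger.close()

print(parses_as_xml(HTML_SNIPPET))  # False: XML rejects the unclosed tags
print(logger.start_tags)            # ['ul', 'li', 'li', 'br']
```

Note that `html.parser` is only a tokenizer: it reports the tags as they appear and does not infer the missing `</li>` end tags or build a tree.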

    Generally, there are languages that handle tasks like parsing HTML much better (Perl, for example). If you must parse HTML in C++, the best method is most likely to use a regular expression library such as PCRE. ClanLib 0.9 has classes that wrap this library, but 0.9 isn't complete yet, so if you choose that path you may end up having to fix things in ClanLib.
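The regular-expression approach looks roughly like this. The sketch uses Python's `re` module as a stand-in for PCRE (it is not ClanLib's wrapper), though the pattern itself sticks to syntax PCRE also accepts. The "links into a `/final/` directory" rule is a hypothetical example of a readily identifiable link:

```python
import re

# Hypothetical target: links whose href points into a "/final/" directory.
# Matches <a ... href="..."> or href='...', case-insensitively.
FINAL_LINK = re.compile(
    r"""<a\s[^>]*?href\s*=\s*["']([^"']*/final/[^"']*)["']""",
    re.IGNORECASE,
)


def final_links(html):
    """Return the href of every matching <a> tag, in document order."""
    return FINAL_LINK.findall(html)


sample = (
    '<A HREF="/final/1.html">one</A> '
    "<a class='x' href='/final/2.html'>two</a> "
    '<a href="/other.html">other</a>'
)
print(final_links(sample))  # ['/final/1.html', '/final/2.html']
```

This works well when the target markup is predictable, but it is brittle: unquoted attribute values, tags inside comments or scripts, and layout changes on the site can all break or fool the pattern.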
