On Application of Source Code Analysis Techniques to HTML Pages Data Extraction

Dmitry Orlov
Web scraping is becoming more important as the amount of data on the Internet grows. Many algorithms have been developed, but most of them require human assistance. The proposed approach uses source-code analysis techniques to extract data from HTML pages. The extracted information is divided into fields, which are the parameters of the extracted entities.