{"id":985,"date":"2003-10-15T09:40:45","date_gmt":"2003-10-15T09:40:45","guid":{"rendered":"http:\/\/beta.robertprice.co.uk\/robblog\/2003\/10\/xml_parser_problem_fixed-shtml\/"},"modified":"2003-10-15T09:40:45","modified_gmt":"2003-10-15T09:40:45","slug":"xml_parser_problem_fixed-shtml","status":"publish","type":"post","link":"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/","title":{"rendered":"XML::Parser Problem Fixed"},"content":{"rendered":"<p>\nI was having serious problems using the <a href=\"http:\/\/www.perl.com\/\">Perl<\/a> module <a href=\"http:\/\/search.cpan.org\/dist\/XML-Parser\/\">XML::Parser<\/a> to extract data from an <a href=\"http:\/\/www.xml.com\/\">XML<\/a> file.\n<\/p>\n<p>\nThe file was several megabytes in size, but every so often the content of one element would not extract correctly. I was convinced the problem must of been with <a href=\"http:\/\/search.cpan.org\/dist\/XML-Parser\/\">expat<\/a>, the C parser behind <a href=\"http:\/\/search.cpan.org\/dist\/XML-Parser\/\">XML::Parser<\/a>.\n<\/p>\n<p>\nIt turns out I should of read the manual a little better.\n<\/p>\n<p>\nI was resetting a variable in my Char handler each time the handler was called. I was assuming this would only be called once per element. However, it can be called multple times per element and this is what was happening.\n<\/p>\n<p>\nFor my small test file, the original code was fine. However, resetting the variable in the Start handler and making sure that the Char handler now appends data each call has fixed the problem for all files, including very large ones.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was having serious problems using the Perl module XML::Parser to extract data from an XML file. The file was several megabytes in size, but every so often the content of one element would not extract correctly. I was convinced the problem must of been with expat, the C parser behind XML::Parser. It turns out &hellip; <a href=\"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;XML::Parser Problem Fixed&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[6],"tags":[],"class_list":["post-985","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>XML::Parser Problem Fixed - Robert Price<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"XML::Parser Problem Fixed - Robert Price\" \/>\n<meta property=\"og:description\" content=\"I was having serious problems using the Perl module XML::Parser to extract data from an XML file. The file was several megabytes in size, but every so often the content of one element would not extract correctly. I was convinced the problem must of been with expat, the C parser behind XML::Parser. It turns out &hellip; Continue reading &quot;XML::Parser Problem Fixed&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/\" \/>\n<meta property=\"og:site_name\" content=\"Robert Price\" \/>\n<meta property=\"article:published_time\" content=\"2003-10-15T09:40:45+00:00\" \/>\n<meta name=\"author\" content=\"rob\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"rob\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/xml_parser_problem_fixed-shtml\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/xml_parser_problem_fixed-shtml\\\/\"},\"author\":{\"name\":\"rob\",\"@id\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/#\\\/schema\\\/person\\\/fac6d5b076e0e14e1fb13e15b542a6c5\"},\"headline\":\"XML::Parser Problem Fixed\",\"datePublished\":\"2003-10-15T09:40:45+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/xml_parser_problem_fixed-shtml\\\/\"},\"wordCount\":152,\"articleSection\":[\"Uncategorized\"],\"inLanguage\":\"en-GB\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/xml_parser_problem_fixed-shtml\\\/\",\"url\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/xml_parser_problem_fixed-shtml\\\/\",\"name\":\"XML::Parser Problem Fixed - Robert Price\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/#website\"},\"datePublished\":\"2003-10-15T09:40:45+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/#\\\/schema\\\/person\\\/fac6d5b076e0e14e1fb13e15b542a6c5\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/xml_parser_problem_fixed-shtml\\\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/xml_parser_problem_fixed-shtml\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/xml_parser_problem_fixed-shtml\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"XML::Parser Problem Fixed\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/#website\",\"url\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/\",\"name\":\"Robert Price\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.robertprice.co.uk\\\/robblog\\\/#\\\/schema\\\/person\\\/fac6d5b076e0e14e1fb13e15b542a6c5\",\"name\":\"rob\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/6f0eb511179100a4e968abc70403e33686e6ab3e992e392bedd2ccac01da666c?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/6f0eb511179100a4e968abc70403e33686e6ab3e992e392bedd2ccac01da666c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/6f0eb511179100a4e968abc70403e33686e6ab3e992e392bedd2ccac01da666c?s=96&d=mm&r=g\",\"caption\":\"rob\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"XML::Parser Problem Fixed - Robert Price","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/","og_locale":"en_GB","og_type":"article","og_title":"XML::Parser Problem Fixed - Robert Price","og_description":"I was having serious problems using the Perl module XML::Parser to extract data from an XML file. The file was several megabytes in size, but every so often the content of one element would not extract correctly. I was convinced the problem must of been with expat, the C parser behind XML::Parser. It turns out &hellip; Continue reading \"XML::Parser Problem Fixed\"","og_url":"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/","og_site_name":"Robert Price","article_published_time":"2003-10-15T09:40:45+00:00","author":"rob","twitter_card":"summary_large_image","twitter_misc":{"Written by":"rob","Estimated reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/#article","isPartOf":{"@id":"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/"},"author":{"name":"rob","@id":"https:\/\/www.robertprice.co.uk\/robblog\/#\/schema\/person\/fac6d5b076e0e14e1fb13e15b542a6c5"},"headline":"XML::Parser Problem Fixed","datePublished":"2003-10-15T09:40:45+00:00","mainEntityOfPage":{"@id":"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/"},"wordCount":152,"articleSection":["Uncategorized"],"inLanguage":"en-GB"},{"@type":"WebPage","@id":"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/","url":"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/","name":"XML::Parser Problem Fixed - Robert Price","isPartOf":{"@id":"https:\/\/www.robertprice.co.uk\/robblog\/#website"},"datePublished":"2003-10-15T09:40:45+00:00","author":{"@id":"https:\/\/www.robertprice.co.uk\/robblog\/#\/schema\/person\/fac6d5b076e0e14e1fb13e15b542a6c5"},"breadcrumb":{"@id":"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.robertprice.co.uk\/robblog\/xml_parser_problem_fixed-shtml\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.robertprice.co.uk\/robblog\/"},{"@type":"ListItem","position":2,"name":"XML::Parser Problem Fixed"}]},{"@type":"WebSite","@id":"https:\/\/www.robertprice.co.uk\/robblog\/#website","url":"https:\/\/www.robertprice.co.uk\/robblog\/","name":"Robert Price","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.robertprice.co.uk\/robblog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/www.robertprice.co.uk\/robblog\/#\/schema\/person\/fac6d5b076e0e14e1fb13e15b542a6c5","name":"rob","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/secure.gravatar.com\/avatar\/6f0eb511179100a4e968abc70403e33686e6ab3e992e392bedd2ccac01da666c?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/6f0eb511179100a4e968abc70403e33686e6ab3e992e392bedd2ccac01da666c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/6f0eb511179100a4e968abc70403e33686e6ab3e992e392bedd2ccac01da666c?s=96&d=mm&r=g","caption":"rob"}}]}},"_links":{"self":[{"href":"https:\/\/www.robertprice.co.uk\/robblog\/wp-json\/wp\/v2\/posts\/985","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.robertprice.co.uk\/robblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.robertprice.co.uk\/robblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.robertprice.co.uk\/robblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.robertprice.co.uk\/robblog\/wp-json\/wp\/v2\/comments?post=985"}],"version-history":[{"count":0,"href":"https:\/\/www.robertprice.co.uk\/robblog\/wp-json\/wp\/v2\/posts\/985\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.robertprice.co.uk\/robblog\/wp-json\/wp\/v2\/media?parent=985"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.robertprice.co.uk\/robblog\/wp-json\/wp\/v2\/categories?post=985"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.robertprice.co.uk\/robblog\/wp-json\/wp\/v2\/tags?post=985"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}