I am trying to parse https://api.lever.co/v0/postings/matchgroup?mode=xml, but I am getting the error lxml.etree.XMLSyntaxError: CData section not finished. It seems like the issue is caused by the data containing Korean characters.

    import io

    import lxml.etree
    import requests

    url = "https://api.lever.co/v0/postings/matchgroup?mode=xml"
    r = requests.get(url)
    f = io.BytesIO(r.content)
    parser = lxml.etree.XMLParser(recover=False)
    tree = lxml.etree.parse(f, parser)  # Raises lxml.etree.XMLSyntaxError

I can change recover to True, but then some of the entries would be missing.
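
To test my guess about the Korean characters, here is a minimal sketch of a check that could be run on the raw response. XML 1.0 permits Hangul, so another possibility (an assumption on my part, not something I have confirmed for this feed) is that the data contains a control character that XML forbids. The helper name `find_invalid_xml_chars` is hypothetical, and it uses only the standard library.

```python
import re

# Matches anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(data: bytes, encoding: str = "utf-8"):
    """Return (character offset, repr of character) for each forbidden character.

    Offsets index into the decoded text, not the raw bytes. Undecodable
    bytes become U+FFFD, which XML allows, so they are not reported here.
    """
    text = data.decode(encoding, errors="replace")
    return [(m.start(), repr(m.group())) for m in _INVALID_XML_CHARS.finditer(text)]


# Small local sample: Hangul inside CDATA plus one backspace control character.
sample = "<a><![CDATA[한국어 \x08 text]]></a>".encode("utf-8")
print(find_invalid_xml_chars(sample))  # only the \x08 is reported, not the Hangul
```

Against the real feed this would be called as `find_invalid_xml_chars(r.content)`. An empty result would rule out forbidden characters, assuming the response is UTF-8.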