Using Python to convert Markdown to JSON

First, a disclosure: I am not a programmer. I need a Markdown-to-JSON pipeline because I am preparing documents to fine-tune an LLM. I thought I could get GPT-4o to write it for me, but so far it has been to no avail: four straight days of GPT generating new code and me testing it locally in VS Code. Writing an extraction script seems pretty straightforward, but it’s not happening, so I thought I’d ask here for help. I have a lot of documents to process, and doing this manually is not possible. I have clear requirements, a script that worked OK but not optimally, a test .md file, the expected JSON output, and logs. Here’s the Python code, followed by the expected JSON output and the Markdown test file.