Bad Code

The XML Formatter I wrote in Python is horribly inefficient. Not only does it pass the whole list of XML tags to every XMLTag instance, but it also explicitly lists rules for the content both before and after each tag. This was a “clever hack” when I wrote it, but it occurs to me that I should never have to check what comes before a tag, only what comes after, and that would greatly simplify my code.

I am reading Clean Code. It’s kicking my butt. But I am excited, because learning these principles is a huge step forward in my career.

Scott vs. Regex

Several years ago I built another blog to record my lessons learned in programming. I built this one to record my findings on computer security, but I’m still in the business of programming and am still finding a lot to say about it. That other blog is way behind, and I want to consolidate, so I’ll be talking about lessons in programming here as well.

One of my ambitions with programming is to become a regex master. Text parsing is one of the most fun things about programming. You may find that strange, but I know I am not alone.

Anyway, I was working on my XML Formatter and adding a check for self-contained tags, such as <br />. I was getting very frustrated because my expression simply was not matching any self-contained tags. They always ended up with the status of ‘text’ and had their tag markings stripped out. (I have four classifications: open tags, closed tags, self-contained tags, and text.)
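To make the four classifications concrete, here is a minimal sketch of how a single token could be sorted into one of them. The patterns and the classify function are hypothetical stand-ins for illustration, not the ones in my formatter, and they ignore edge cases such as a slash or angle bracket inside an attribute value.

```python
import re

# Hypothetical patterns, for illustration only.
# Self-contained is tested first, because the open-tag pattern
# below would otherwise also accept something like <br />.
PATTERNS = [
    ("self-contained", r"<[A-Za-z][^<>]*/>"),
    ("closed", r"</[A-Za-z][^<>]*>"),
    ("open", r"<[A-Za-z][^<>]*>"),
]


def classify(token):
    """Return the classification of one token; anything that is not a tag is text."""
    for label, pattern in PATTERNS:
        if re.fullmatch(pattern, token):
            return label
    return "text"


for token in ["<p>", "Hello", "<br />", "</p>"]:
    print(token, "->", classify(token))
```

The order of the checks matters here: with the open-tag pattern first, <br /> would be labeled as an open tag.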

My regex worked fine on some of the JavaScript regex testing websites, so I figured there must be some subtle difference between how Python and JavaScript parse regular expressions. But then I ran some tests at the Python command line, and those appeared to be working too! I decided it must be because I was using findall to match several expressions at once, and something was getting lost from one match to the next.

I ran the gamut of looking online, and kept finding that regex is not a great way to build an XML parser. But I was not building this to parse XML. I was building it to format XML, because my inner designer hates seeing poorly formatted XML and HTML. Still, maybe those people were right, and I was going to find myself in a forest dark where the straightforward path had been lost. I think I put off fixing it for a whole week.

Until this morning. After about an hour, I decided that something must be correct about my expression, so maybe the problem was further up.

Then I saw it.

breakdown = (re.findall(OPENSIG + '|' + CLOSESIG + '|' + 'CONTSIG' + '|' + TEXTSIG, testString))


I had put CONTSIG in quotes. Making it a string literal. So that branch of the pattern could only ever match the literal text CONTSIG, never a self-contained tag in my test string.
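Here is a small, self-contained reproduction of the quoted-name mistake next to the fix. The four pattern strings are hypothetical stand-ins, since the real ones are not shown in this post, and test_string is made up for the demonstration.

```python
import re

# Hypothetical patterns; assumptions for illustration only.
OPENSIG = r"<[A-Za-z][^<>/]*>"    # open tag, e.g. <p>
CLOSESIG = r"</[A-Za-z][^<>]*>"   # closed tag, e.g. </p>
CONTSIG = r"<[A-Za-z][^<>]*/>"    # self-contained tag, e.g. <br />
TEXTSIG = r"[^<>]+"               # anything between tags

test_string = "<p>Hello<br />world</p>"

# The bug: 'CONTSIG' in quotes is the seven-character literal, not the variable.
buggy = re.findall(OPENSIG + "|" + CLOSESIG + "|" + "CONTSIG" + "|" + TEXTSIG,
                   test_string)

# The fix: joining a list of the variables leaves no room for a stray quoted name.
fixed = re.findall("|".join([OPENSIG, CLOSESIG, CONTSIG, TEXTSIG]), test_string)

print(buggy)  # ['<p>', 'Hello', 'br /', 'world', '</p>']
print(fixed)  # ['<p>', 'Hello', '<br />', 'world', '</p>']
```

With these stand-in patterns the buggy version shows the same symptom I was chasing: the self-contained tag falls through to the text pattern and loses its angle brackets.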

Well, here’s my sign.

I’m hoping today to finish up the last piece, allowing for file input. But here’s to nearing the very end of a project!