Categories
Uncategorized

K-neighbors Counties: Finding and Mapping Similarity

If you want to go straight to the interactive maps, here they are: Population vs median income and PCA & k-neighbors. If you’re reading this on a mobile device, in their current state these maps work a lot better on desktop.

One of the nice things about visualizing data on a map is that it comes with built-in familiarity. With a graph, you might not know where you fit in, but with a map you can see where you are in the dataset.

Usually you see data mapped with a choropleth map. A US county choropleth map might look like the image below.

This method is great for finding regional patterns in a one-dimensional dataset. It highlights the upper and lower extremes, but it can be hard to tell the differences among the middle buckets.

A closer comparison with other places is valuable because you can apply the familiarity you have with your own area to other places.

By mapping the k-neighbors for each county, you can find which counties are most similar to each other in the dataset, without giving up the useful context that maps provide.

The scatter plot below shows US counties by population and median income. Hover or tap on the circles to see which county they represent:

US Counties by Population and Median Income

The closest neighbor to Los Angeles County, California is Cook County, Illinois, followed by Harris County, Texas. All three contain massive cities (LA, Chicago, and Houston, respectively), so it makes sense that they are near each other in the graph above.

To turn this into a map, I scaled the above data and ran a k-neighbors model to get the nearest neighbors for each county. I used that data to create this interactive county map, where you can click the counties to see their nearest 50 neighbors in terms of population and median income. When you click a county its neighbors in the dataset are highlighted.
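The scale-then-query step can be sketched in plain Python. The county names and figures below are made up for illustration (the real version used every US county), but the idea is the same: standardize each feature so population and income are comparable, then rank the other counties by Euclidean distance.

```python
# Made-up (population, median income) values standing in for the real data.
counties = {
    "County A": (10_000_000, 72_000),
    "County B": (5_200_000, 68_000),
    "County C": (4_700_000, 61_000),
    "County D": (25_000, 45_000),
}

def standardize(values):
    # Convert raw values to z-scores so each feature has equal weight.
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]

names = list(counties)
pop_z = standardize([counties[n][0] for n in names])
inc_z = standardize([counties[n][1] for n in names])
scaled = {n: (p, i) for n, p, i in zip(names, pop_z, inc_z)}

def k_neighbors(name, k=2):
    # Rank every other county by squared Euclidean distance in scaled space.
    px, py = scaled[name]
    others = [n for n in names if n != name]
    others.sort(key=lambda n: (scaled[n][0] - px) ** 2 + (scaled[n][1] - py) ** 2)
    return others[:k]

print(k_neighbors("County A"))  # → ['County B', 'County C']
```

Without the standardization step, population (in the millions) would swamp income (in the tens of thousands), so every county's "neighbors" would just be the counties with the closest head count.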

This kind of map gives a bottom-up view, as opposed to the top-down view a choropleth map provides. It highlights places relative to each other, rather than the entire dataset at once. You could use a scatter plot like the one above, but a map provides the context of physical space.

Use Cases

This kind of data can inform decisions. With remote work becoming the standard in many fields, some people are thinking about relocating. If you like your current location, you might find this kind of map useful for finding similar places.

Another application could be for a marketing or political campaign. If certain messaging had a positive effect in one county, you might want to see which counties cluster with that one demographically when choosing the next ad buy.

Marin County and North Slope Borough

Marin County, California is an affluent area on the San Francisco Bay. If you select it on the map above, the nearest neighbors include other affluent counties with mid-size populations, like the well-to-do suburbs of Northern Virginia, New Jersey, and Connecticut.

But so does North Slope Borough, Alaska. On the far northern edge of the state, it isn’t what you think of when you think of Marin County.

It turns out that there aren’t many people in North Slope Borough, but a relatively high number of them work in oil and gas extraction, so the median income is comparable to that of the Bay Area and Northern Virginia.

This shows that using only two variables leaves plenty of room for outliers to mess up the results. I could add election data as a third variable, but for the purpose of demonstration let’s see what happens if I add not just one but six more variables.

The next map includes population and median income like the first map, but I’ve added:

  • Gini Index (a measure of income inequality)
  • Percent of population that lived in the same area last year
  • Percent of population currently in an undergraduate college
  • Percent of population that has a Master’s degree
  • Percent of population in the military
  • Percent of voters that voted Republican in the 2020 general election

I’m only using those data points to show that this technique works with whatever you want to throw at it. After reducing those 8 columns down to 3 using principal component analysis, I ran k-neighbors over the resulting values.
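The reduce-then-query pipeline can be sketched with numpy. Random numbers stand in for the real 8 census/election columns here, and the exact model details are my illustration, not necessarily what was used:

```python
import numpy as np

# 200 hypothetical counties with 8 features; random data stands in for
# the real census and election columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))

# Standardize, then project onto the top 3 principal components via SVD.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
components = Vt[:3]            # top 3 principal directions
reduced = Xs @ components.T    # each county is now a 3-d point

# Share of the original variance the 3 components retain (the post quotes
# 61.5% for the real dataset; random data will differ).
explained = (S[:3] ** 2).sum() / (S ** 2).sum()

def neighbors(i, k=10):
    # The 10 nearest counties to county i in the reduced 3-d space.
    d = np.linalg.norm(reduced - reduced[i], axis=1)
    order = np.argsort(d)
    return order[order != i][:k]

print(neighbors(0))
```

The `explained` value is worth checking before trusting the neighbors: if the 3 components retain too little variance, counties that sit close together in the reduced space may not actually be similar on the original columns.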

This results in a different and perhaps better grouped k-neighbors county map. I also reduced the nearest neighbors from fifty to ten, but even with fifty, North Slope Borough was no longer included in Marin County’s set of neighbors.

Now when you select Marin County, you see other counties around San Francisco (pictured above), as well as the counties outside of New York City, Boston, DC, and Detroit.

If you select Cumberland County, North Carolina, you see the other counties with a high percentage of people in the military. And if you select Oktibbeha County, Mississippi, the other counties with a high percentage of college undergrads are highlighted.

Again, I’m using somewhat random data points just to show how it works, so they won’t correlate as much as a dataset that is more focused on a single topic. In fact, only 61.5% of the original dataset’s variance was maintained across the 3 dimensions in my PCA model, which is at the lower end of what I would use for something like this.

In real life you would probably use a more correlated dataset, for example if you were looking for similarities using only health or economic data. In those cases you can most likely retain more of the variance of the original set.

You can take a dataset to a relatively abstract space and bring it back to something more concrete by mapping it.

About the Data

Most of the data I used for this is from the US Census Bureau’s American Community Survey 5 Year dataset, which is ideal if your research includes geographies with smaller populations, as many US counties do.

For election data, I used the MIT Elections Dataset. One issue I ran into here is that Alaska does not count votes by county, so it is not possible to get the exact figure at the county level. I used K-nearest neighbors on the rest of the dataset to impute the percentage of Republican voters in each Alaskan county.
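The imputation idea can be sketched by hand (the real version used a k-nearest-neighbors model over the rest of the dataset): estimate the missing Republican vote share from the k most similar counties on the features we do have. All numbers below are made up for illustration.

```python
# Each known county: (feature vector, republican vote share).
known = [
    ((0.1, 0.9), 0.70),
    ((0.2, 0.8), 0.65),
    ((0.9, 0.1), 0.30),
    ((0.8, 0.2), 0.35),
]

def impute_gop_share(features, k=2):
    # Sort known counties by squared distance to the target's features,
    # then average the vote share of the k closest ones.
    by_distance = sorted(
        known,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )
    return sum(share for _, share in by_distance[:k]) / k

# A hypothetical Alaskan county that resembles the first two rows:
print(impute_gop_share((0.15, 0.85)))  # → 0.675
```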

Here is the raw data if anyone is interested: US county k-neighbors map data.

I used several other tools for this project as well.


Three Programming Mistakes I Made, and What I Learned

We all make mistakes, and sometimes they have real consequences. Here are three mistakes I made in programming that had consequences, and what I learned.

Publishing the wrong code

I originally published URList, a free Chrome extension, in 2017. It had been working fine for years, accumulating 75 weekly active users without me promoting it. I made it for myself, to make some tasks at work easier, but it is cool that other people are using it.

75 weekly active users is not a lot, but it is enough to get a one-star review as soon as a broken version goes live, which is what happened.

The only reason I updated it at all was because I received a warning message from the Chrome store about how URList was using permissions that it didn’t need, namely localStorage and activeTab. Without really thinking about it, I pulled the repo and edited the extension’s manifest file so that it would no longer request those two permissions.

Silly me. The repo wasn’t up to date with the live version, so publishing from it rolled the extension back to a previous version that lacked some features of the latest one. The live version of the code lived only on a laptop that would no longer turn on. And worse, it turned out the Chrome store’s automated message was wrong. I actually did need the localStorage permission, so removing it caused the extension to not work at all.

As is so often the case, this wasn’t just one mistake but a series of mistakes that compounded on each other to earn me my first one star review. I hadn’t been in the code for URList in years and instead of checking it to verify, I just went with what the Chrome store said. I also wasn’t as meticulous in 2017 about keeping repos up to date as I should have been.

After I realized the extension wasn’t working, I fixed it and republished it to the Chrome store. If I had taken the time to test the out of date version, and to test it without localStorage, I would have seen these problems and avoided giving people a bad experience.

The lessons I take from this are:

  1. Don’t overreact to automated warnings.
  2. Take a moment to relearn what your code is doing if you haven’t looked at it in years.
  3. Never blindly publish from a repo, assuming it is up to date. Test it locally first.

Letting bots wreak havoc

A little over six years ago, I built a website full stack for the first time. I wanted to build a site with entirely user-generated content. Over time, people started creating pages on the site and it was really cool to see what they created.

After the site had been up for about a year, one day I noticed a massive number of pages being created at a rate of one per minute. They were all in Japanese. I ran the text through a translator and, without going into detail, it was further confirmation that I was dealing with web spam.

This is a very easily avoidable mistake, but at that time I was brand new to full stack development. The only spam protections I had used up to this point were built-in tools like Akismet. I was familiar with techniques to prevent spam, but I knew my site didn’t have a huge audience and so wasn’t concerned.

My first response to the spambot was the most hacky and unscalable approach possible. I went into my server file and added a line of code that blocked that user’s account specifically. Oh yeah, I also hadn’t coded up my own admin dashboard at that point, which would have allowed me to block a user account without having to make edits on the backend.

This worked for a couple of hours. The bot came back with a new account and started spamming again with a vengeance. Since it took so long to return, I wondered if I was dealing with a person on the other end making edits to their bot in real time.

Next, I checked my log files to see whether the IP address was the same for all these requests to my site, and it was. So, I again did the most hacky and unscalable thing and added a line of code to my server file that blocked that IP address.

Again, this worked for some time. Later that morning the bot was back. It took a little longer to come back this time, which makes me think the person on the other end needed to acquire a pool of IP addresses they could cycle through, and edited their bot to try different IPs in the pool.

Finally I added a captcha, specifically Google’s reCAPTCHA, each time someone tries to make a new account. That did the trick, but a lot of avoidable damage was already done.

Since it had taken me over a day to notice the spam, there were now hundreds of spam pages on my site. The search engines had picked up some of them. For about a week my site was showing up in Google for the Japanese words for online casinos and erectile dysfunction.

The takeaways from this experience were:

  1. User generated content is a double edged sword. Your site’s quality depends not only on the quality of the content people are creating, but also on how well you ensure they are actually people.
  2. If you don’t follow best practices, spambots will eventually find a way in.
  3. Even if you don’t expect much traffic, your authentication workflow should be built as if you do.

Captchas, by the way, are not the only method to prevent spam. It was just the simplest solution in this case. Automated IP blocks, user moderation, and rate limiting, for example, are other ways to prevent this from happening.
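As a sketch of the rate-limiting idea, here is a minimal sliding-window limiter: allow at most `limit` signups per IP per `window` seconds. This is my illustration, not the code from that site; a real app would put this in middleware and back it with a shared store rather than an in-process dict.

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = {}  # ip -> deque of request timestamps

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(ip, deque())
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: reject this signup
        q.append(now)
        return True

limiter = RateLimiter(limit=3, window=60.0)
print([limiter.allow("1.2.3.4", now=t) for t in (0, 1, 2, 3)])
# → [True, True, True, False]
```

Even a crude limiter like this would have capped that one-page-per-minute spambot long before hundreds of pages piled up.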

I have to admit it was fun hacking against somebody out there in the world. It made my day interesting. I hope that person is doing okay.

Crawling impolitely

Treating mistakes as a necessary, even harmonious, part of business is cliché now, e.g. “fail faster” and “move fast and break things.”

That mindset, while useful, is a privilege for those who are the ones building a product.

Those who are working on someone else’s product, on the other hand, may find it harder to take their own mistakes in stride because it could result in getting fired. This next mistake didn’t get me fired, but it could have.

I’ve built plenty of web crawlers, and I’ve come up against plenty of automated systems designed to prevent crawling websites. Using a VPN, switching between a pool of proxy IPs like my friend above, spoofing the User-Agent, using a headless browser: these are all useful for avoiding getting your bot blocked.

But the best way to not get blocked is the simplest: slow it down.

Or better yet, slow it down for varied lengths of time, e.g.:

from random import uniform
from time import sleep

def humanize(x, y):
	# Sleep for a randomized duration, usually between x and y seconds
	# but occasionally much longer, the way a human browses.
	n = round(uniform(1, 10))
	if n <= 4:
		sleep(uniform(x, y))
	elif n <= 7:
		sleep(uniform(x, y * 2))
	elif n < 10:
		sleep(uniform(x * 2, y * 3))
	else:
		sleep(uniform(x * 3, y * 10))

Slowing your crawlers down not only makes them appear more human, which helps you go undetected, but also reduces the load on the target site’s servers. By reducing the server load you are, as they say, crawling “politely.”

Generally speaking you can make more requests per second for sites that have high traffic, because they have the servers available to handle more requests. They still may use automated IP blocks, however, so even though their servers have plenty of capacity, they may still block you if you don’t use proxies and rotate them.

All that to say, when I made this mistake, I was well aware that it is a best practice to crawl politely and that I shouldn’t crawl from my real IP address.

I did neither of those things.

I wanted to get to the deliverable as soon as possible, and I knew that my client’s website was massive (in both the number of pages and traffic to those pages) so I erred on the side of more requests per second.

Lo and behold, my client’s servers stopped responding to my requests. I figured this was just a temporary block. I would simply slow my crawler down, restart it later in the day, and there would be no issue.

Nope. And not only that, there were other people in my company working on the client’s site and they couldn’t access it on their laptops. Oops.

They would have to go through a VPN to view their own client’s website. The wifi was shared among the whole building, so probably nobody in the building could access this site either, which theoretically could hurt my client’s sales. But realistically I doubt it had any financial impact.

So at that point I was hoping it was a 24-hour block, but really I had no way of knowing that. I was just thinking positive. I told people it was my fault and that we would either come back in tomorrow and it would be like nothing happened, or I would need to get on a call and have them manually remove our IP address from their blacklist.

The next day, still nothing. So, hat in hand, I got on a call with their dev team. I explained my mistake, gave them our IP address, and they whitelisted us.

The things I take away from this experience:

  1. If you can, have your clients whitelist your IP address and/or user-agent (and use a unique UA that identifies you) if you plan to crawl their sites.
  2. Err on the side of being too polite to web servers, even if you’re in a hurry.
  3. Never use your real IP address when crawling. Always use a proxy, or just use a VPN.

I’m sure I’ve made other mistakes, and I’m sure I will make more. The important thing is that we learn from our own mistakes, and that we share our experiences so we can learn from each other. What are some mistakes you’ve made, and what did you learn?


SEO in 2092

OK so you’ve made it through the hydrogen wars of the 2080s, life is an anarcho-libertarian fever dream space opera, and you’re wondering, “do title tags still matter?”

I’ll be frank here, manually labelling and categorizing information is so 2064. Not only has the user’s decision about what to search been outsourced with proactive contextual inferencing, so has the content producer’s decision about how their site is structured/interlinked.

That means that in a given user session, the only constant is the corpus, and everything else reflows. The corpus here means the entirety of information you have – not just text, but also images, video, even your iframe holograms. On each interaction with your user, you need your corpus to reflow, automatically structured via your lightweight improvisational context tree, or LICT.

Atwood’s Law, which states that any application that can be written in Javascript eventually will be written in Javascript, is still being proven correct, and your Lictionary is no exception.

The above chart is old news to most SEOs but it still holds true today.

So you’re thinking — that’s all interesting, but what action items can I recommend for my client’s crypto-law website? Ok, let’s talk conversion rates.

I’d be remiss if I didn’t mention the people of Mars. Studies show that many of them make up the 1% of Internet Explorer users. Make sure you’re using polyfills so your automated sensory conversions work correctly for them. I noticed a 20% increase in Martian conversion rate after optimizing my client’s terraform kits website for IE.

If you need help with polyfills, you can find plenty over on PolyfillOverflow. Yes, it’s a pain to have to significantly expand your codebase just to cater to a few IE users, but the fact that everyone else is skipping it, thinking IE will die, is exactly why you should do it.

The corrective inference array provides users results that mix what they think they want with what they should want in order to get by with a hardscrabble life in the plasma mines.

Let’s face it, 80% of your users are mining plasma 16 hours per day. They don’t need to waste their precious free time deciding which phase reverter site provides the best product specs, they just need phase reverters that don’t crap out in the middle of the rebellion.

In my opinion, too many SEOs are over-focused on getting those corrective inference results. No disrespect to the mighty Algo, hallowed be thy name, but there are still conversion opportunities in those search results that aren’t correctively inferred.

Yes, the correction array will catch those users eventually, and they will be arrested and brought before the Emperor’s Tribunal for their thought crimes, but that doesn’t mean you can’t make your phase reverter client some money along the way. Just sayin!

Anyway these are my thoughts for what it means to be an SEO in 2092. If you found this useful, please share.

Hey, thanks for reading. No, I was not on any drugs while writing the above. It was just a bit of fun. I was inspired by sportswriter Jon Bois’ delightfully disorienting speculative fiction, Football 17776. I encourage you to check it out if you like this kind of thing.


How to Mail Merge with Python

Let’s break mail merge down into its component parts:

  • a text document which contains the body of the email, and is populated with data
  • a spreadsheet that contains the data to be populated into the email body
  • email, specifically, one email sent for each row of the spreadsheet data

Those are all things you can code up on your own with Python efficiently, especially if you lean on a couple of libraries.

Email Body

To craft your messaging, use a triple-quoted string so that spacing will be preserved. Each part that will be populated with data can be replaced with curly brackets. You’ll use Python’s str.format() method to fill in the fields from the dataset.

message = """Good morning {},

Thanks so much for your help with {}. If there's anything we can do, let us know.

Thanks,
  Jake

"""

Populate the Data

You can import the csv library and use it to pull your data from a .csv file.

import csv
def get_from_csv(file_name):
    with open(file_name, 'r') as f:
        reader = csv.reader(f)
        data = list(reader)
    return data

data = get_from_csv("your_spreadsheet_file_name.csv")

Sending the Email

For this we’ll import a Python library called smtplib, which stands for Simple Mail Transfer Protocol library. In our setup we’ll use a Gmail account, and SMTP will take the data we give it and send the emails. For this to work, you will need to temporarily adjust your Gmail settings to allow less secure apps access.

For each row in the spreadsheet, you can use a function like this:

def send(user, pwd, recipient, subject, body):
	FROM = user
	TO = recipient if type(recipient) is list else [recipient]
	SUBJECT = subject
	TEXT = body

	message = """From: %s\nTo: %s\nSubject: %s\n\n%s""" % (FROM, ", ".join(TO), SUBJECT, TEXT)
	try:
		server = smtplib.SMTP("smtp.gmail.com", 587)
		server.ehlo()
		server.starttls()
		server.login(user, pwd)
		server.sendmail(FROM, TO, message)
		server.close()
		print('successfully sent the email')
	except Exception as e:
		print('failed to send mail, because of exception:')
		print(e)

Put It All Together

The full code is below. The comments will help you understand what each line is doing. I haven’t modularized anything here, but I recommend modularizing the send function so you can import it into other Python scripts.

from time import sleep
import random
import csv
import smtplib

message = """Good morning {},

Thanks so much for your help with {}. If there's anything we can do, let us know.

Thanks,
  Jake

"""

def get_from_csv(file_name):
    with open(file_name, 'r') as f:
        reader = csv.reader(f)
        data = list(reader)
    return data

data = get_from_csv("your_spreadsheet_file_name.csv")

def send(user, pwd, recipient, subject, body):
	FROM = user
	TO = recipient if type(recipient) is list else [recipient]
	SUBJECT = subject
	TEXT = body

	message = """From: %s\nTo: %s\nSubject: %s\n\n%s""" % (FROM, ", ".join(TO), SUBJECT, TEXT)
	try:
		server = smtplib.SMTP("smtp.gmail.com", 587)
		server.ehlo()
		server.starttls()
		server.login(user, pwd)
		server.sendmail(FROM, TO, message)
		server.close()
		print('successfully sent the email')
	except Exception as e:
		print('failed to send mail, because of exception:')
		print(e)

for row in data:
	if row is data[0]:
		continue
	#this depends on which columns the first name, email, and project info are in. In this case, columns B, C, and D:
	first_name = row[1]
	email_address = row[2]
	project = row[3]
	#now fill the name and project into your email body:
	message_with_inputs = message.format(first_name, project)
	print("emailing", email_address)
	#send the email:
	send('your_email_address@gmail.com', 'your_gmail_password', email_address, 'Thanks ' + first_name, message_with_inputs)
	#sleep for variable amounts of time if needed:
	sleep(random.randint(3,6))

Teaching My First Improv 101 Class

Last Friday the Improv 101 class I’ve been teaching had their class show. They absolutely crushed it. I’m so proud of them. I’m lucky to have taught such a fun, smart, and just plain nice group of folks. Thank you Brandon, Frank, Allie, Jennifer, Jeff, Thaddeus, Rose, McNeil, Maggie, and Olivia. You’re all improvisers.

If you’re in the Durham area, check out an improv class!


Navigate a List of URLs with URList

I recently published two new Chrome extensions, one of which is URList. It allows Chrome users to navigate their own list of URLs one at a time.

What is this for?

At work lately I’ve had a few tasks that involved manually checking a list of URLs from an Excel file. I was copying and pasting them one by one from Excel into the URL bar, which is just as lame as it sounds. I figured there must be some Chrome extension I can use to speed things along. I wanted something where I could paste all the URLs at once and navigate to each by clicking a button.

The closest solutions I found were extensions that opened a new tab for each URL, but that is a massive waste of RAM. If I need to peruse a dozen (let alone several dozen or several hundred) URLs, it would be ridiculous to use all that memory up front. I wanted a “just in time” solution.

How it Works

So I set to work on URList. Once you’ve added it to Chrome, you can type or paste your list of URLs into the text box, click “Start” to save them, and then click “Next URL” to navigate to each web page in the order you added them.

While that is the basic function of it, I added a couple more enhancements. First, I wanted a little bit of data persistence. To that end, the extension saves the URLs to localStorage so that you can exit Chrome and come back to your list later. It will pick up right where it left off in the list.

I also added a “Hide List” button, which allows users to view only the navigational buttons, and made it so that users can enter their URLs in multiple ways.

Use Cases

Use cases for this extension go beyond what I designed it for. I could see folks using it to quickly navigate through their favorite news sites, for example. The URLs are saved in localStorage, so they will already be there. I could also see it being used for a quick, browser-based slide show.

End Notes

The nice thing about this project is that it provided me a refresher for core Javascript. Lately at work I’ve been using Python quite a bit and not nearly as much Javascript as I was a few months ago.

Try it out! Give it a good review if you like it. You can find it in the Chrome store here.

The other Chrome extension I published recently is WikiPik, which displays images for Wikipedia articles that don’t have any.


Quizzly Updates

I’ve been working on other projects, including a dating app, but I recently decided to make improvements to the site I launched last year.

Last year I wanted to make a website that does one thing very well. With that in mind, I launched Quizzly – a site that people can use to make their own multiple choice quizzes. I knew that there were already a lot of quiz sites out there, but in my opinion they were all aimed at personality quizzes. Fun, if you’re into that kind of thing, but highly subjective.

I wanted to make something that teachers, students, and anyone else can use as a study aid. More in that realm lie sites like Sporcle, which are fantastic for trivia. They offer a wide variety of quiz types, but the interface is a little crowded in my opinion.

I wanted something clean and minimal, so there would be no distraction while taking quizzes. And again, I wanted something that did one thing very well, which to me meant having only one quiz type – multiple choice.

Last Fall, I believed I had a product that wasn’t perfect, but it worked well. Users could make their own multiple choice quizzes and users could take them, get a score, and see which answers they missed.

I then completely ignored the site for about six months. I worked on other projects, one of which is a dating app that matches based on the bands you like. More on that project soon.

As Quizzly started to get more traffic and people made their own awesome quizzes, I took another look at it and felt it needed some updates. So I made a list, started the local server, and got to work. The changes include, but aren’t limited to:

Switching to HTTPS

Making the site more secure was the top priority. While HTTPS is not necessary for static sites (this blog, for instance), Quizzly has user logins and user-generated content. Furthermore, most browsers flag pages delivered without HTTPS and give users scary-sounding messages. You don’t want people to be scared of your website. To fix this, I purchased an SSL certificate, changed the domain name servers, and, in my application server, redirected all requests for HTTP to HTTPS.

Switching to non-WWW

This was purely an aesthetic choice. Most site visitors probably wouldn’t notice whether the URL starts with “www” or not, but to me it looked ugly. The TLD is “.co”, which looks like the site is going for the whole brevity thing. Having the “www” countermanded that, so I decided to switch to having a non-www, or “naked,” domain.

Style Improvements

While I wanted something minimal, I didn’t want it to look boring. The style improvements were pretty minimal, and in fact this is an area that will get more attention. The trick here is for the site to look like something that users can interact with.

User Management

Stormpath was a good user management service, and Quizzly relied on their API for user authentication and management. They shut down their API yesterday because they joined Okta. I checked out Okta but it looked pricey. I ended up choosing Firebase, a Google product, for user authentication and management. User management and authentication is an area where you shouldn’t reinvent the wheel if you don’t have to.

SEO Edits

I took a tip from the big ecommerce sites and added canonical link tags on my sort pages. Every quiz topic, i.e. Astronomy or Film, on my site has 10 URLs associated with it, each of which renders the quizzes in that topic using a different sort method. Now, all 10 of those topic pages reference a single sort page. That gives the search engines a better idea of which sort page to index.

Quiz Images

The original way users added an image to their quiz was via an image URL. I stored the URL in my database and rendered the quiz with that exact URL. That was a bad idea. I admit it was just a quick hack because I didn’t yet want to spend time figuring out where to host images. Now, users upload images from their own device, the images are stored in the cloud, and they are delivered via CDN. I’m using Cloudinary for image storage and couldn’t be happier with it.

Slugs

Spaces are fine in URLs, but they do render as “%20”. In order to make the links to my site more human-readable, I wanted to make sure that all spaces were replaced with a dash. I couldn’t just write a redirect, because my database queries come from the URL. A query for a quiz with the spaces changed to dashes would fail.
I likewise couldn’t change the URL’s dashes to spaces just for the query because of the cases in which a quiz title is supposed to have a dash in it. As in, a real dash. Not a slug dash.
First, I wrote a function that turns each user’s quiz title into a slug. A quiz titled “Hemingway’s Novels”, for example, would have the slug “hemingways-novels”. I did the same for all user topic tags. For the tags, I added a “tag” collection in my database so that tags can be queried faster, without needing to use any aggregation pipeline to dig through the “quizzes” database. The tag collection consists only of tag names, e.g. “American Authors”, and their corresponding slugs, e.g. “american-authors”.
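A slug function along those lines can be sketched in a few lines (this is my illustration of the approach, not the exact function from the site): lowercase the title, drop punctuation, and collapse runs of whitespace into a single dash, while leaving real dashes alone.

```python
import re

def slugify(title):
    # Lowercase, strip punctuation (apostrophes etc.), keep letters,
    # digits, whitespace, and real dashes.
    title = re.sub(r"[^a-z0-9\s-]", "", title.lower())
    # Collapse any run of whitespace into a single dash.
    return re.sub(r"\s+", "-", title).strip("-")

print(slugify("Hemingway's Novels"))  # → hemingways-novels
print(slugify("American Authors"))    # → american-authors
```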


Westworld Podcast – New Episode!

I’ve been a frequent guest on my friend Craig Carter’s Westworld podcast over the past year or so. He started this project with Heather Barefoot and Jonathan Yeomans, and Jonathan is the guest in the episode that came out today.

They talk about the three new Westworld cast members and the fact that Jimmi Simpson will return next season. That means more scenes from the past, which is intriguing. They also give a preview of next week’s discussion of Jurassic Park, which has a lot in common with Westworld, namely that it is about a theme park that goes haywire, causing the visitors to become prey.

I love this podcast, and not just because I’m on it. It has some great interviews with actors from the show, original player piano music from Alex Thompson, and reviews of other titles in the Western and Sci-Fi genres. It’s been super fun working with Craig on this project, as well as the other guests of the show (Brian Sutorius, Heather Barefoot, Wil Heflin, and Jonathan Yeomans).

If you’re looking to sate your Westworld fix for the long haul between now and season 2, then this is the podcast for you.


Palindrometer: The Twitter Bot that Finds Palindromes

I wrote a bot recently that searches tweets for palindromes. A palindrome is any word, phrase, or sequence of numbers that is the same when written backwards. The number 101, the town of Wassamassaw, SC, the word “madam”, and the band name ABBA are all palindromes. The most famous one is, “A man, a plan, a canal – Panama.”

This isn’t the first Twitter bot I’ve written, but it is the first one that I feel is interesting enough to share. You can take a look at it on Twitter to see what it is up to.

I set it up so that it only finds multi-word palindromes (so “Hannah,” “Anna,” “mom”, and “dad” are all out unless they are part of a larger palindromic phrase) and they must be 9 characters or longer, excluding spaces. That way its activity is somewhat throttled and the quality of palindromes found is higher. Theoretically. This is Twitter we’re talking about.

Why is this something that exists?

Purely for fun. Given enough time, the bot could find the next, “A man, a plan, a canal – Panama.” That would be pretty cool. Since I last checked it this morning it has retweeted tweets that include:

  • “forever of”
  • “never even”
  • “did it, I did”
  • and my favorite, “dammit I’m mad”

For now I hardcoded those into the bot so that it doesn’t repeat them, but when I get to it I will hook a database up to the bot so that it can add found phrases to the database and check new candidates against that set, so it doesn’t repeat itself.

How it works

The fun part for me was writing the code that parses tweets and then finds symmetry across multiple words in the tweet. First, the bot parses each Tweet it can get (it can’t get all Tweets) by removing any punctuation, multiple spaces, and capital letters. That leaves it with just the words and numbers in the tweet.

Next it puts each word or number into an array, and from that array creates a new array of every possible combination of two or more sequential words or numbers. For example the 4 word tweet “hey what is new,” would be broken up into these 6 segments: “hey what,” “hey what is,” “hey what is new,” “what is,” “what is new,” and “is new.”

The bot then runs a function on each segment that looks for symmetry. That function, as you might have guessed, starts with the first and last character of each segment and works its way to the middle character (or pair of characters if the segment contains an even number of total characters) checking for matches. If they all match, then there is symmetry in that segment and the bot has found a palindrome.
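The two steps described above can be sketched in Python (the bot's actual code may differ; this is just the idea): generate every contiguous run of two or more words, then test each run for character symmetry with the spaces removed.

```python
def segments(words):
    # Every contiguous run of two or more words, joined back into a phrase.
    return [
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 2, len(words) + 1)
    ]

def is_palindrome(segment):
    # Symmetric when read backwards, ignoring spaces.
    chars = segment.replace(" ", "")
    return chars == chars[::-1]

print(segments("hey what is new".split()))
# → the same 6 segments as in the example above
print([s for s in segments("dammit i m mad".split()) if is_palindrome(s)])
# → ['dammit i m mad']
```

Comparing `chars == chars[::-1]` checks the whole string at once; the character-by-character walk toward the middle that the bot uses is equivalent, and lets you bail out early on long non-palindromes.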


Embeddable Quizzes

I made a small update* to Quizzly today that will make it much easier for bloggers to embed quizzes into their site. Let’s say I have a blog about coffee. I can embed a quiz into my blog post, which will make it more interactive and thereby increase the amount of time visitors spend on my site – an important engagement metric. After a few paragraphs of content about coffee, I might place the quiz here:


At this point in your blog post it is a good idea to engage with your visitors about their score. Encourage them to post their scores or discuss the quiz in the comments. You can create quizzes specifically for your blog post on my quiz maker site.

*Specifically, I added a few lines of JavaScript to the quiz pages that check whether the page is in an iframe and, if it is, remove all other page elements except the quiz itself. I also added a “get embed code” button at the end of each quiz.