Brace Yourself! — Product Querying with Python Sets

Imagine you are car shopping and want a vehicle that is an extended-cab pickup truck with less than 100,000 miles and is either black or red.

Obviously not every vehicle in a dealer’s lot (the “collection”) is going to fit this description, but we can define a list of requirements that determine whether any vehicle is a match.

Filtering — in the context of programming — is the common need to reduce a collection of things to a smaller collection of things based upon some condition(s).

This is similar to applying discounts to everyday items we buy, tracking a package that is out for delivery, deciding what TV shows to recommend to a customer, and countless other daily examples.

Formally, we call the list of all items a collection, the list of requirements a query, and the group of items that match the query a subset. We often think about queries as relating to databases, but we can also create a query system in Python because, after all, we are data scientists and this is 2019.

The Python Set

Python’s set provides a really nice blank canvas for creating a product filtering system, as it is an efficient container for storing unique elements inside curly braces {}. Unique elements are important here because we don’t want to have duplicates in our results.

In addition, sets support basic mathematical set operations. For example, the intersection of two sets A and B corresponds to their overlapping elements.

Similarly, the union gives us everything from A or B.
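
To make the two operations concrete, here is a minimal sketch. The contents of A and B are made up purely for illustration.

```python
# Hypothetical example sets; only the names A and B come from the text.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Intersection: elements present in both A and B.
both = A & B      # equivalent to A.intersection(B)

# Union: elements present in A or B (or both), without duplicates.
either = A | B    # equivalent to A.union(B)

print(both)
print(either)
```

The operator forms (& and |) require both operands to be sets, while the method forms accept any iterable.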

A full list of the set operations Python supports can be found in the Python documentation. These make it really easy to combine the results of simple building blocks into complex queries.

Building a Product

First we define what a product is — in this case just a Plain Old Data (POD) class that stores product attributes.

I’ll use a tire as an example because this is a blog created by a company with the word “Tire” in the name that makes money by selling tires. Also, I have the creativity of a potato.

Our simple product has just three attributes: brand, style, and winter designation. If you are unfamiliar with the so-called “magic” methods that are surrounded by double underscores, just know that they add special functionality to your custom classes to make them easier to use. Defining an __eq__ method allows you to compare two objects with ==, __hash__ provides a necessary mechanism for creating a set of tires (sets are implemented as hash tables, similar to Python dictionaries), and __repr__ provides a readable display for our Tire objects when they are printed. That display is much more useful than the default, and it can be copied and pasted to create a new, identical object.
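
A minimal sketch of such a class might look like the following. The attribute names (brand, style, winter) and the example values are assumptions made for illustration, not necessarily the ones used in the original code.

```python
class Tire:
    """Plain data holder for a tire's attributes (attribute names are assumed)."""

    def __init__(self, brand, style, winter):
        self.brand = brand      # e.g. "Michelin"
        self.style = style      # e.g. "all-season"
        self.winter = winter    # True if the tire carries a winter designation

    def _key(self):
        # Single source of truth for equality and hashing.
        return (self.brand, self.style, self.winter)

    def __eq__(self, other):
        if not isinstance(other, Tire):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Equal tires must hash equally so a set can de-duplicate them.
        return hash(self._key())

    def __repr__(self):
        return (f"Tire(brand={self.brand!r}, style={self.style!r}, "
                f"winter={self.winter!r})")


tire = Tire("Michelin", "all-season", False)
print(tire)  # Tire(brand='Michelin', style='all-season', winter=False)
```

Because the printed form is a valid constructor call, pasting it back into Python produces an equal object.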

Now let’s create our collection of tires that will be used for filtering.
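
As a sketch, the collection can simply be a set of Tire objects. To keep this example short and self-contained, it uses a frozen dataclass as a stand-in for the hand-written class, since that generates __eq__, __hash__, and __repr__ automatically. The specific tires listed are invented sample data, not a real catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen + default eq=True makes instances hashable
class Tire:
    brand: str
    style: str
    winter: bool


# Invented sample data, for illustration only.
products = {
    Tire("Michelin", "all-season", False),
    Tire("Michelin", "performance", False),
    Tire("Michelin", "all-season", True),
    Tire("Hercules", "all-season", False),
    Tire("Hercules", "truck", True),
    Tire("Continental", "performance", False),
    Tire("Michelin", "all-season", False),  # duplicate: the set drops it
}

print(len(products))  # 6
```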

The query system will be built based on a template. This template has the same attributes as a Tire, but can also contain multiple values as well as wildcards, and it implements a method for determining if a Tire matches the template.
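
Here is one way such a template could be sketched. The class name TireTemplate, the matches method name, and the "*" wildcard marker are all assumptions chosen for this illustration.

```python
from dataclasses import dataclass

WILDCARD = "*"  # hypothetical marker meaning "any value is acceptable"


@dataclass(frozen=True)
class Tire:  # compact stand-in for the Tire class
    brand: str
    style: str
    winter: bool


class TireTemplate:
    """Same attributes as Tire, but each holds an iterable of allowed values."""

    def __init__(self, brand=WILDCARD, style=WILDCARD, winter=WILDCARD):
        self.brand = brand
        self.style = style
        self.winter = winter

    @staticmethod
    def _accepts(allowed, value):
        # A wildcard accepts anything; otherwise the value must be listed.
        return allowed == WILDCARD or value in allowed

    def matches(self, tire):
        # A tire matches only if every attribute is accepted.
        return (self._accepts(self.brand, tire.brand)
                and self._accepts(self.style, tire.style)
                and self._accepts(self.winter, tire.winter))


template = TireTemplate(brand=["Michelin", "Hercules"], winter=[True])
print(template.matches(Tire("Michelin", "all-season", True)))   # True
print(template.matches(Tire("Michelin", "all-season", False)))  # False
print(template.matches(Tire("Continental", "truck", True)))     # False
```

Leaving style unset means it stays a wildcard, so any style is accepted.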

TIRE-d yet? (Sorry, I couldn’t help myself.)

Before going further, I’d like to point out that this code has been simplified: I’ve skipped things like input validation and using an enumerated type for attribute values, which are all steps that should be taken in a production system.

Now let’s build three helper functions that will look up tires based upon brand, style, or winter designation. The template expects an iterable for each of its properties, but that’s annoying to have to remember if you only want to filter by one brand.

So here I will take the time to add helpful input validation by converting the inputs to a list if they are not already.

Set comprehensions work the same way as list comprehensions, so the code is nice and clean and reads almost like English:

“Make a template and return each unique product in ‘Products’ if it matches that template.”
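
The helpers described above could be sketched roughly as follows. The function names, the argument order, and the as_list helper are assumptions for illustration, and the template class is a condensed version of the idea rather than the original code.

```python
from dataclasses import dataclass

WILDCARD = "*"  # hypothetical "match anything" marker


@dataclass(frozen=True)
class Tire:  # compact stand-in for the Tire class
    brand: str
    style: str
    winter: bool


class TireTemplate:
    def __init__(self, brand=WILDCARD, style=WILDCARD, winter=WILDCARD):
        self.brand, self.style, self.winter = brand, style, winter

    def matches(self, tire):
        pairs = ((self.brand, tire.brand),
                 (self.style, tire.style),
                 (self.winter, tire.winter))
        return all(allowed == WILDCARD or value in allowed
                   for allowed, value in pairs)


def as_list(value):
    """Wrap a single value in a list; leave lists untouched."""
    return value if isinstance(value, list) else [value]


def tires_by_brand(brands, products):
    template = TireTemplate(brand=as_list(brands))
    return {tire for tire in products if template.matches(tire)}


def tires_by_style(styles, products):
    template = TireTemplate(style=as_list(styles))
    return {tire for tire in products if template.matches(tire)}


def tires_by_winter(winter, products):
    template = TireTemplate(winter=as_list(winter))
    return {tire for tire in products if template.matches(tire)}


# Invented sample data, for illustration only.
products = {
    Tire("Michelin", "all-season", False),
    Tire("Michelin", "all-season", True),
    Tire("Hercules", "truck", True),
    Tire("Continental", "performance", False),
}

print(len(tires_by_brand("Michelin", products)))                # 2
print(len(tires_by_brand(["Michelin", "Hercules"], products)))  # 3
print(len(tires_by_winter(True, products)))                     # 2
```

Wrapping a lone value matters more than it looks: without it, a bare string such as "Michelin" would be treated as an iterable of characters and substring checks would give surprising matches.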

Using the filter system

Now we can filter products efficiently by combining these three helper functions. If, say, we want to find all of the Michelin products, we can do that with a single call to the brand helper.

To get multiple brands, just pass multiple arguments to the filter.

A more complex example would be to search for Michelin winter tires.

Using the functions we have, we can either create a set intersection with the & operator or pass the result of one function call as the collection to the second. The former is easier to read (especially if there are many similar operations chained together); however, the latter has the performance advantage that each subsequent filtering is only performed on the (comparatively smaller) result from the previous filter rather than the entire collection.
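
The two approaches can be compared side by side. In this sketch the helper functions are simplified stand-ins that filter with a plain set comprehension rather than a template, and their names and the sample tires are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tire:  # compact stand-in for the Tire class
    brand: str
    style: str
    winter: bool


def tires_by_brand(brands, products):
    # Simplified stand-in: no template, just a membership test.
    return {tire for tire in products if tire.brand in brands}


def tires_by_winter(winter, products):
    return {tire for tire in products if tire.winter == winter}


# Invented sample data, for illustration only.
products = {
    Tire("Michelin", "all-season", False),
    Tire("Michelin", "all-season", True),
    Tire("Hercules", "truck", True),
}

# Option 1: filter the full collection twice, then intersect the results.
intersected = tires_by_brand(["Michelin"], products) & tires_by_winter(True, products)

# Option 2: feed the first result into the second filter,
# so the second pass only scans the smaller set.
chained = tires_by_winter(True, tires_by_brand(["Michelin"], products))

print(intersected == chained)  # True
```

Both options return the same set here; they differ only in how much of the collection each filter has to scan.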

The clever reader might have realized I could have also gotten all winter Michelin tires at once by creating another template that matched both of those simultaneously, and you would be absolutely right.

However, in a complex system you might be combining results from several different moving pieces layered on top of each other.

For example, imagine you are building a results screen for your product search tool and have a dropdown for the winter designation. There may be a performance advantage to calculating each piece separately because it makes it easy to toggle the result sets, whereas if you calculated everything at once you would have to redo the full calculation whenever there is a change.
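
One way to picture the dropdown scenario is to compute each piece once and only redo the cheap intersection when the selection changes. Everything in this sketch (the dropdown labels, the function names, and the sample tires) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tire:  # compact stand-in for the Tire class
    brand: str
    style: str
    winter: bool


def tires_by_brand(brands, products):
    return {tire for tire in products if tire.brand in brands}


def tires_by_winter(winter, products):
    return {tire for tire in products if tire.winter == winter}


# Invented sample data, for illustration only.
products = {
    Tire("Michelin", "all-season", False),
    Tire("Michelin", "all-season", True),
    Tire("Hercules", "truck", True),
}

# Computed once, when the search is submitted.
brand_results = tires_by_brand(["Michelin"], products)

# One precomputed result set per dropdown choice ("any" keeps everything).
winter_options = {
    "any": products,
    "winter": tires_by_winter(True, products),
    "non-winter": tires_by_winter(False, products),
}


def results_for(dropdown_choice):
    """Toggling the dropdown costs one set intersection, not a full re-filter."""
    return brand_results & winter_options[dropdown_choice]


print(len(results_for("any")))     # 2
print(len(results_for("winter")))  # 1
```

The trade-off is memory: each precomputed piece is kept around so that switching between them stays cheap.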

Both ways are valid in different circumstances, and part of being a good developer is understanding these types of implications so you can make the right decision for your project.


This was a fun example of using Python to build a query system. Extensions of the basic idea demonstrated here are the same ones used by data scientists at ATD’s CoE for complicated tasks like matching reward program logic to dealer inventories. Best of all, we got to use the Python set, a highly underrated tool to integrate into your workflow.
