
Gordonator Precision Search Engine (GPSE)
How does the Gordonator Precision Search Engine (GPSE) work?

Before I touch on this, I have to explain the nature of the problem:

Right now, when you're searching for a new book, if you know the title or author, searching is easy, whether you're in a bookstore or online. But what if you're searching for a new book to read but don't have a particular book in mind?

If you're in the bookstore, you go to the appropriate section, and read the back of one book after another. But this can take a lot of time, and the descriptions on the back of books may not be, well, very descriptive. After a long search, you may end up buying a book that you never read.

If you're online, you can search by keywords for kinds of books. Unfortunately, this is not a very good method either: (1) keywords are very broad and general, and cannot express complex concepts such as "dating someone on a gag or bet" or "saving a kingdom from overthrow"; and (2) users may not use the same keywords as the people who categorize the books. For example, someone may search for a book with "giant monsters" and miss books that are categorized as having "large monsters". Without seeing a list of all possible keywords, users may not know what options are available.

What is needed, then, is a system that can express concepts and express them in a finite way where one can see all the available options. That would, for the first time, truly let you browse by kinds of plots, characters, settings, and styles.

Now, back to the original question.

How does the Gordonator Precision Search Engine (GPSE) work?

The Gordonator Precision Search Engine (GPSE) seems merely complex, but is actually extremely complex. The basic idea is to define a finite set of categories that describe the plot, character, setting, and style of a book or movie, and to present those categories to users and reviewers in a logically organized hierarchy. Because the categories are finite and shown to the user, the user sees the same list that the reviewer used when categorizing the book. The list of options may also give the user search ideas that the user hadn't considered before.
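
As a rough illustration of what a finite, hierarchical category list could look like, here is a minimal Python sketch. It is not the actual GPSE data structure; the tree contents and the names CATEGORY_TREE and list_options are assumptions made up for this example.

```python
# Hypothetical category tree; the entries are illustrative, not GPSE's real list.
CATEGORY_TREE = {
    "Plot": {
        "Hunting for monsters": {},
        "Saving a kingdom from overthrow": {},
    },
    "Character": {"Robot detective": {}},
    "Setting": {"Outer space": {}},
    "Style": {"Lots of action": {}},
}


def list_options(tree, prefix=()):
    """Return every category as a 'Parent > Child' path, in tree order."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        paths.append(" > ".join(path))
        # Recurse so that subcategories appear right under their parent.
        paths.extend(list_options(children, path))
    return paths


# Users and reviewers both browse this same finite list.
for option in list_options(CATEGORY_TREE):
    print(option)
```

Because both the reviewer and the searcher pick from the same printed list, the "giant monsters" versus "large monsters" mismatch cannot occur.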

So, then, all the system requires is for one to figure out every attribute of plot, character, setting, and style and make them available for search, right?

Not quite. There is some additional complexity involved.

Category level of detail needs to be proportional to the number of reviews in that particular area: The description of categories is not black and white. For example, should "hunting for monsters" be a category, or should "hunting for water monsters", or should "hunting for sharks"? All could be considered plot descriptions, but which one should the engine offer?

The correct answer depends on the volume of reviews for these particular kinds of plots. If there are very many reviews about hunting monsters, then it would be appropriate to have specific subcategories for sharks, squids, etc. That way the many books on this subject can be properly differentiated from each other. If there are only a few books on the subject of hunting monsters, then there is little need to have subcategories beyond the general "hunting for monsters".

This also implies that the level of detail of categorization needs to change constantly, based on close monitoring of the number and kind of reviews entered into the system. You may start out with only a few "hunting for monsters" books, but if you later get reviews of many such books, then you need to add subcategories.
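
The rule above can be sketched in a few lines of Python. This is only an assumption about how such a rule might be written: the function name categories_to_offer, the min_reviews threshold of 20, and the review counts are all hypothetical.

```python
def categories_to_offer(parent, reviews_by_subcategory, min_reviews=20):
    """Decide which category names to show for one area of the hierarchy.

    parent: the general category, e.g. "hunting for monsters".
    reviews_by_subcategory: review counts keyed by the more specific category.
    min_reviews: hypothetical threshold below which subcategories are hidden.
    """
    total = sum(reviews_by_subcategory.values())
    if total < min_reviews:
        # Too few reviews to be worth differentiating: offer only the parent.
        return [parent]
    # Enough reviews: offer each subcategory that has at least one review.
    return sorted(name for name, n in reviews_by_subcategory.items() if n > 0)


early = {"hunting for sharks": 2, "hunting for squids": 1}
later = {"hunting for sharks": 30, "hunting for squids": 12}

print(categories_to_offer("hunting for monsters", early))
print(categories_to_offer("hunting for monsters", later))
```

Re-running the function as review counts grow is what makes the level of detail change over time: the first call offers only the general category, the second offers the subcategories.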

Overlap and coverage: An ideal system of categorization would have categories that cover most plot, setting, style, and character categories with as little overlap as possible. The more overlap you have, the greater the likelihood that a user will search for a book in category A while being unaware of similar books in category B.

Review weights: Here's where the complexity starts to increase even further. It's not enough for a reviewer to say "This book has some spaceships and romance". After all, a great many books and movies have some spaceships and romance. If one wants to match similar books and movies to each other, many of them will end up matching many others unless one resorts to weighting. That is, each book or movie is categorized not only by its plot, setting, character, and style attributes, but by how dominant each attribute is, with the more dominant attributes weighted more highly than the less dominant ones.

For example, Star Wars has spaceships and it has some romance. The movie Starman has a spaceship and it has romance as well. Under a normal keyword system, these movies might match each other if the keywords "spaceships" and "romance" had been applied to both. But we know that Star Wars is much more about spaceships than it is about romance. We also know that Starman is a lot more about romance than it is about spaceships. So what you need is a system where you can say Star Wars is a lot about spaceships and a little about romance, so it can match to the same kind of movies, and Starman is a lot about romance and a little about spaceships, so it can do the same.
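
To make the difference concrete, here is a small Python sketch comparing plain keywords with weighted attributes. Cosine similarity is used here only as a convenient stand-in for a matching score; it is an assumption of this example, not necessarily what GPSE uses, and the weights 0.9 and 0.1 are invented.

```python
import math


def cosine_similarity(a, b):
    """Similarity between two attribute-weight dicts (1.0 = same emphasis)."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Plain keywords: an attribute is either present (1) or absent.
star_wars_keywords = {"spaceships": 1, "romance": 1}
starman_keywords = {"spaceships": 1, "romance": 1}

# Weighted attributes: hypothetical reviewer weights from 0 to 1.
star_wars_weighted = {"spaceships": 0.9, "romance": 0.1}
starman_weighted = {"spaceships": 0.1, "romance": 0.9}

keyword_score = cosine_similarity(star_wars_keywords, starman_keywords)
weighted_score = cosine_similarity(star_wars_weighted, starman_weighted)

print(f"keyword match:  {keyword_score:.2f}")
print(f"weighted match: {weighted_score:.2f}")
```

With keywords alone the two movies look identical; with these made-up weights the score drops to about 0.22, which reflects that they emphasize opposite things.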

You also need the proper weight ratio. If the spread between the weights of the most dominant plot, setting, character, and style attributes is too high relative to lesser plot, setting, character, and style attributes, the lesser ones will be rendered useless in search results. If the spread is too small, lesser attributes will assume too much importance, relative to greater ones, resulting in less appropriate search results.
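
The effect of the spread can be seen in a small experiment. The sketch below is an illustration under stated assumptions: cosine similarity stands in for the matching score, and the three spreads (100:1, 3:1, 1.1:1) and the movie profiles are invented, not GPSE's actual values.

```python
import math


def similarity(a, b):
    """Cosine similarity between two attribute-weight dicts."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def profile(dominant, minor, spread):
    """Build a two-attribute profile where dominant:minor = spread:1."""
    return {dominant: spread, minor: 1.0}


def compare(spread):
    query = profile("spaceships", "romance", spread)
    # Same dominant attribute, different minor attribute.
    different_minor = profile("spaceships", "comedy", spread)
    # Same two attributes, but with the emphasis reversed.
    reversed_emphasis = profile("romance", "spaceships", spread)
    return similarity(query, different_minor), similarity(query, reversed_emphasis)


for spread in (100.0, 3.0, 1.1):
    diff_minor, reversed_score = compare(spread)
    print(f"spread {spread:>5}: different minor {diff_minor:.2f}, "
          f"reversed emphasis {reversed_score:.2f}")
```

At 100:1 a movie with a different minor attribute scores almost as a perfect match, so the minor attribute contributes nothing. At 1.1:1 a movie with the emphasis reversed scores almost as a perfect match, so the minor attribute counts for nearly as much as the dominant one. A middle value such as 3:1 keeps both distinctions visible.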

What are the proper weight ratios for each literary element? Well, that took some time to figure out!

Category weights: Another important element is category weights. Wait, didn't we just talk about weighting categories above? Well yes, we talked about weighting them in the sense of how important a particular category is to a particular book or movie. But there is a second and equally important way that categories should be weighted, and that is in terms of their importance relative to each other.

Some categories are more important and tell you more about what a book or movie is about. Here's an easy call: Which is a more important category, how much action a movie has, or the hair color of the main character? Obviously the former, so in matching movies (or books) the former should receive a lot more weight.

Here's a harder call: Which is more important, a story about time travel, or a story about romance? Through observation and trial and error, I have discovered that weighting time travel more heavily tends to produce better matches, but it wasn't an easy call.

What about stories about exploring a planet versus stories about clones? Or stories about robots versus stories about spaceship battles?

Each category must be given an independent weight relative to the others, or else the whole weighting system will be meaningless. Now do you see where some of the additional complexity comes into play?

This weighting must then be combined with the weighting that an individual reviewer gives for a particular book or movie, and then this combined weight can be compared with other books or movies to find similar matches.

If a search system doesn't make any sort of category differentiations like this, then books where the main character has the same hair color will match to each other just as often as books where the main character is a robot detective.
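
Here is a minimal Python sketch of combining the two kinds of weight. It assumes the combination is a simple multiplication and that matches are scored with cosine similarity; both are assumptions of this example rather than the documented GPSE method, and every number in CATEGORY_WEIGHTS and in the book profiles is invented.

```python
import math

# Hypothetical category weights: how much each category says about a story.
CATEGORY_WEIGHTS = {
    "robot detective": 10.0,
    "time travel": 8.0,
    "red-haired lead": 1.0,
    "blond lead": 1.0,
}


def combine(review_weights, category_weights=CATEGORY_WEIGHTS):
    """Multiply each reviewer weight by its category weight (default 1.0)."""
    return {c: w * category_weights.get(c, 1.0) for c, w in review_weights.items()}


def similarity(a, b):
    """Cosine similarity between two attribute-weight dicts."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Reviewer weights for three hypothetical books.
query = {"robot detective": 0.8, "red-haired lead": 0.8}
shares_hair = {"time travel": 0.8, "red-haired lead": 0.8}
shares_robot = {"robot detective": 0.8, "blond lead": 0.8}

# Reviewer weights alone: both candidates match the query equally well.
raw_hair = similarity(query, shares_hair)
raw_robot = similarity(query, shares_robot)

# With category weights applied, the robot detective book wins clearly.
weighted_hair = similarity(combine(query), combine(shares_hair))
weighted_robot = similarity(combine(query), combine(shares_robot))

print(f"reviewer weights only: hair {raw_hair:.2f}, robot {raw_robot:.2f}")
print(f"with category weights: hair {weighted_hair:.2f}, robot {weighted_robot:.2f}")
```

Without category weights the book that shares only a hair color ties with the book that shares a robot detective. Once the category weights are applied, the hair-color match falls close to zero while the robot detective match stays close to one.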

What is the proper relative weight for each of the hundreds of kinds of plot, setting, character, and style categories? That's the million-dollar question, and I have worked the answer out over years of research.

Comments? Questions? Write me, Steve Gordon, here