Big data is currently a buzzword in the business world, as shown by the Google Trends search below:
Not surprisingly, big data has also been discussed in the world of patents. But is big data just another buzzword, or does it have real application to patent analysis?
To start answering this question, it might help to first define ‘big data’. According to Wikipedia, Big Data has the following definition:
Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate.
This seems to be a reasonable definition to me – but – does this apply to patent data?
I would say yes. There are around 90 million patents published to date. The data sets are complex, with patents falling into different jurisdictions, families, languages, kinds etc. Unlike other forms of big data, patent data is largely available from single sources (e.g. public and commercial databases), but it can still be complex.
Are traditional data processing applications inadequate?
For most people, a traditional patent search, for example in the freely available database Espacenet or many others, might look something along the lines of:
Show me all patents that include the keyword ‘hybrid car’ OR ‘electric car’ – AND were filed between Jan 2001 and Jan 2010 – AND possibly include the following class code limitations…
We could get more specific – for example, limit the results to certain jurisdictions, patent owners, inventors, etc. The search could also include exclusions, such as ‘exclude all patents that refer to “toy”’.
This would produce a list of results, sometimes very long, and not always sorted in terms of relevance to your search query. This is what we could call a ‘filtering search’ – because it returns all patents that meet the filtering conditions.
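To make the idea of a filtering search concrete, here is a minimal sketch in Python. It is not how Espacenet or any other database is implemented; the `Patent` record, the `filtering_search` function and the four sample patents are all hypothetical and exist only to illustrate the keyword, date, class code and exclusion filters described above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Patent:
    # Hypothetical, heavily simplified patent record.
    number: str
    text: str
    filed: date
    class_codes: frozenset = frozenset()


def filtering_search(patents, any_keywords, filed_from, filed_to,
                     class_codes=None, exclude_keywords=()):
    """Return every patent that passes all filters, in input order (unranked)."""
    results = []
    for p in patents:
        text = p.text.lower()
        # Keyword filter: at least one phrase must appear (the OR part).
        if not any(k.lower() in text for k in any_keywords):
            continue
        # Date filter (the AND part).
        if not (filed_from <= p.filed <= filed_to):
            continue
        # Optional class code limitation.
        if class_codes and not (p.class_codes & set(class_codes)):
            continue
        # Exclusions, such as 'toy'.
        if any(k.lower() in text for k in exclude_keywords):
            continue
        results.append(p)
    return results


patents = [
    Patent("P1", "A hybrid car drivetrain", date(2005, 3, 1), frozenset({"B60K"})),
    Patent("P2", "A toy electric car", date(2006, 7, 9), frozenset({"A63H"})),
    Patent("P3", "An electric car battery pack", date(2012, 1, 5), frozenset({"H01M"})),
    Patent("P4", "A plug-in electric vehicle charger", date(2008, 2, 2), frozenset({"B60L"})),
]

hits = filtering_search(
    patents,
    any_keywords=["hybrid car", "electric car"],
    filed_from=date(2001, 1, 1),
    filed_to=date(2010, 1, 31),
    exclude_keywords=["toy"],
)
print([p.number for p in hits])  # ['P1']
```

Note that P4 is missed simply because it says ‘electric vehicle’ rather than ‘electric car’, and that the results come back in no particular order of relevance.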
There is nothing wrong per se with such searches, and we all do them all of the time. But there are a couple of very real limitations:
- What if you get the filters wrong? So often relevant patents use keywords or have class codes that fall outside of what you expect. You can try to get around this by using as many keywords or class codes as possible, but this just exacerbates the second big problem.
- Results sets can be large and unordered. It can take many hours to work your way through them.
So, in other words, while traditional data processing applications can work well, in some cases they are inadequate as well.
Luckily, big data tools for patent searching are available.
Big data tools for patent search
The most obvious one is Google Patent. I don’t claim to know exactly how Google Patent works, but it appears to match the keywords you enter against a variety of fields (title, description etc). While Google Patent can return a large number of results, they do appear to be sorted into some sort of order of relevance, and generally you can find what you are looking for within, say, the first or second screen of results.
Google Patent also includes a ‘prior art finder’.
A number of patent examiners I have spoken to have happily admitted using Google Patent as well as more conventional patent search databases, and I myself am a frequent user.
Semantic search engines
There are a number of semantic patent search engines out there, including some that suggest that they can build up a patent search based on a block of text, for example a patent claim. I have not used them enough to comment, but I am sure that they could do a reasonable job in some circumstances (and possibly Google uses semantic search as part of its black box search). Results are returned in a usefully ranked order.
There is a question about inconsistent use of technical terms in patent claims, for example ‘box’ vs ‘carton’. The vendors of these engines claim to have built thesauruses to manage this issue, and I am sure that they have. But I do have a concern.
Imagine you were examining the following hypothetical claim
“A box with four sides and two lids”.
Let’s say there were 200 similar patents, where 100 used the keyword ‘box’ and 100 used the keyword ‘carton’. Thanks to the thesaurus, the semantic search engine might return all 200 patents. But would the patents that used the keyword ‘box’ end up ranked above the patents that used the keyword ‘carton’ – even though a carton is just another name for a box?
You might ask the same about Google Patent as well.
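The concern can be illustrated with a toy scoring function. This is only a sketch under stated assumptions: the `THESAURUS` mapping, the `tokens` and `score` helpers and the simple term-overlap score are all hypothetical, and real semantic engines almost certainly rank in more sophisticated ways that I cannot see.

```python
# Hypothetical thesaurus that maps synonyms to one canonical term.
THESAURUS = {"carton": "box"}


def tokens(text, use_thesaurus):
    """Split text into a set of lower-case words, optionally mapping synonyms."""
    words = text.lower().split()
    if use_thesaurus:
        words = [THESAURUS.get(w, w) for w in words]
    return set(words)


def score(query, document, use_thesaurus):
    """Toy relevance score: the number of distinct query terms found in the document."""
    return len(tokens(query, use_thesaurus) & tokens(document, use_thesaurus))


query = "box four sides two lids"
box_patent = "a box with four sides and two lids"
carton_patent = "a carton with four sides and two lids"

# Literal matching: the 'carton' patent scores lower and would rank below the 'box' patent.
print(score(query, box_patent, False), score(query, carton_patent, False))  # 5 4

# Synonyms mapped before scoring: the two patents tie.
print(score(query, box_patent, True), score(query, carton_patent, True))  # 5 5
```

Whether a given engine behaves like the first case or the second depends on whether its thesaurus is applied before ranking or only when deciding which patents to return, which is exactly what a user cannot tell from the outside.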
Patent citation based engines
Most patent search engines now list the forward and backward citations for individual patents. But these are simple listings of citations.
Ambercite is the proud developer of the Cluster Searching and closely related AmberScope patent search engines. Both tools require one or more patent numbers to start a search, but working from these they can quickly identify similar patents based on citation analytics, in both the forward and backward directions, and over several generations in many cases.
The advantage of citation searching is that every citation is an opinion by a patent examiner or applicant that two patents are similar – regardless of the keywords or class codes used. Of course not all citations are perfectly relevant, but neither are all of the search results returned by traditional searching, Google Patent or semantic searching. In addition, Ambercite tools can predict which of the patent citation links are most likely to be relevant, and use this as part of the process of returning similar patents – in a usefully ranked order.
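For readers who like to see the general idea in code, the sketch below walks a citation network in both the forward and backward directions for a limited number of generations. It is emphatically not Ambercite's algorithm, which also weighs how relevant each citation link is likely to be; the `citation_search` function, the ranking by generation alone and the patent labels are assumptions made purely for illustration.

```python
from collections import defaultdict, deque


def build_links(citations):
    """Turn (citing, cited) pairs into a two-way lookup of citation links."""
    links = defaultdict(set)
    for citing, cited in citations:
        links[citing].add(cited)   # backward citation
        links[cited].add(citing)   # forward citation
    return links


def citation_search(citations, start, max_generations=2):
    """Breadth-first walk from `start`, following citations in both directions.

    Returns (patent, generation) pairs, nearest generation first.
    """
    links = build_links(citations)
    generation = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if generation[current] == max_generations:
            continue  # do not expand beyond the requested number of generations
        for neighbour in sorted(links[current]):
            if neighbour not in generation:
                generation[neighbour] = generation[current] + 1
                queue.append(neighbour)
    del generation[start]
    return sorted(generation.items(), key=lambda item: (item[1], item[0]))


# Hypothetical citation pairs: (citing patent, cited patent).
citations = [("B", "A"), ("C", "A"), ("D", "B"), ("E", "D"), ("A", "Z")]

similar = citation_search(citations, "A", max_generations=2)
print(similar)  # [('B', 1), ('C', 1), ('Z', 1), ('D', 2)]
```

Patent E is three generations away from A, so it is left out when the search stops at two. No keywords or class codes are involved at any point, which is the appeal of the approach.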
Case studies of Cluster Searching are found here and here. Ambercite offers free trials to qualified applicants – please contact us for more details – or ask about our offer of free search reports for patent litigators.
What patent search engines should you use? Do big data engines have a place?
In my opinion, you should use as many as you can – they all have their strengths and weaknesses, and the results of patent searches can have large commercial consequences. So yes, big data search engines do have a place.
In my daily life I use about half a dozen search engines, free and subscription based, on a regular basis, including traditional search engines – and also use Cluster Searching for its uncanny ability to find similar patents to patents I already know about. For the times in which I do not know the starting patents, obviously a more traditional search is required.
Searching in different engines can be an iterative process – a keyword search can lead to patents that you can run a citation search with – which in turn can suggest new keywords for a traditional search. In this way, a second patent search engine acts as a second (or third!) opinion to the first patent search database you used.
Whatever works, basically – patent searching can be a very non-linear process, and the best and most confident searchers are happy to try a variety of approaches. A good colleague of mine has used YouTube to successfully find prior art – again, whatever works.