Clusters (Advanced Clustering)
PatSeer Pro X lets you run full-text clustering on the records in your project. Text clustering helps identify important topics or concepts (clusters) in a set of documents.
Clusters can be built from key patent text fields (Title & Abstract, Full Claims, Independent Claims, or a combination of these fields with Description) and then used in tools such as the co-occurrence analyzer, VizMAP or any other analysis tool to bring out otherwise hidden insights within patents. Analyzing relationships between generated clusters, or between patent classifications and clusters, is a popular approach among researchers, especially those in a competitive intelligence role.
The option to generate clusters is located beside the Quick Stats tab. To generate clusters, click the Generate Clusters button.

You can create clusters using Title & Abstract, Full Claims, Independent Claims and Description. When you click Generate Clusters, the application runs through all the records in the project, analyzes the text in the fields you selected, and organizes the patents into three types of clusters: keywords, topics and themes.

Click on View to see the types of clusters generated. These are presented as separate tabs. You can also click on any group to see its records in the right pane.
• Keywords are the raw set of technology terms that occur in the data set. You can click on any keyword to view its records in the right pane.

• Topics are a higher level of clustering in which keywords are ranked by how often they occur, and the top-ranked ones are elevated to concepts called topics. A ranking algorithm decides which keywords become topics, and for each topic a set of related subtopics is also extracted. Topics are hierarchical, i.e., each has a main topic and subtopics.

• Themes are an advanced version of topics in which topics are matched against each other, and topics that frequently occur together are grouped into a common theme.
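
The three levels described above can be pictured with a small Python sketch. This is not PatSeer's algorithm: the toy `records`, the helper names `top_topics` and `group_themes`, and the `min_shared` threshold are all assumptions made for illustration, and subtopic extraction is left out. It only shows the general idea of counting keywords, promoting the most frequent ones to topics, and grouping topics that often appear together into themes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: the keywords found in each of four records.
records = [
    {"battery", "electrode", "lithium"},
    {"battery", "charger", "lithium"},
    {"antenna", "signal"},
    {"antenna", "battery", "signal"},
]

# Keywords: every term in the data set, counted by records containing it.
keyword_counts = Counter(term for rec in records for term in rec)


def top_topics(counts, n):
    """Promote the n most frequent keywords to topics (ties broken A-Z)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def group_themes(records, topics, min_shared=2):
    """Group topics that share at least min_shared records into themes."""
    topic_set = set(topics)
    pair_counts = Counter()
    for rec in records:
        for a, b in combinations(sorted(rec & topic_set), 2):
            pair_counts[(a, b)] += 1
    # Merge topics linked by a frequent pair (simple connected components).
    theme_of = {t: {t} for t in topics}
    for (a, b), n in pair_counts.items():
        if n >= min_shared and theme_of[a] is not theme_of[b]:
            merged = theme_of[a] | theme_of[b]
            for t in merged:
                theme_of[t] = merged
    unique = {frozenset(members) for members in theme_of.values()}
    return sorted(sorted(members) for members in unique)


topics = top_topics(keyword_counts, 4)
themes = group_themes(records, topics)
print(topics)
print(themes)
```

In this toy data, "battery" and "lithium" share two records and so do "antenna" and "signal", so each pair ends up in its own theme.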

Working with generated clusters
You can view the generated clusters as filters on your results page. Within Cluster Fields, you can see the list of keywords, topics and themes and use them as filters.

You can also analyze patent fields along with clusters using the co-occurrence analyzer. For example, you can generate a co-occurrence matrix for Tech Domain vs. any of the cluster fields.
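
As a rough illustration of what such a matrix holds, the Python sketch below counts how many records fall into each (Tech Domain, topic) cell. The record layout, the field names `tech_domain` and `topics`, and the helper `cooccurrence` are hypothetical and do not reflect PatSeer's export format or internals.

```python
from collections import Counter

# Hypothetical records: one Tech Domain value and a list of topics each.
records = [
    {"tech_domain": "Energy storage", "topics": ["battery", "lithium"]},
    {"tech_domain": "Energy storage", "topics": ["battery"]},
    {"tech_domain": "Telecom", "topics": ["antenna", "battery"]},
]


def cooccurrence(records, row_field, col_field):
    """Count records for every (row value, column value) combination."""
    counts = Counter()
    for rec in records:
        # set() makes sure a record is counted once per column value.
        for col in set(rec[col_field]):
            counts[(rec[row_field], col)] += 1
    return counts


matrix = cooccurrence(records, "tech_domain", "topics")
for (domain, topic), n in sorted(matrix.items()):
    print(f"{domain:15} {topic:10} {n}")
```

Cells that never occur, such as ("Energy storage", "antenna") here, simply read as zero.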

You can further use the generated clusters (Topics/Themes) in the Landscape mode of VizMAP.

Adding custom stop words to advanced clustering
You can submit a list of stop words that you want the system to ignore. Stop words (noise words) are commonly occurring words in a language that the analyzer ignores; they refine the keyword set by removing such common words. Applying a customized stop word list is beneficial. In most cases, your custom stop words come from reviewing the generated set of keywords: just as stop words are common words in a particular language, there may be common words that occur across patents in your specific domain.
This option is available under Advanced Clustering -> Generate -> Cluster Settings.
Under Cluster Settings, you will see options to create a new stop word list.
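
One way to shortlist candidates for a custom list, sketched here in Python, is to look for generated keywords that appear in a large share of the records. The helper `candidate_stop_words`, the `min_share` threshold and the sample keyword sets are assumptions for illustration only; PatSeer does not necessarily offer such a function, and the final list should still be reviewed by hand.

```python
from collections import Counter


def candidate_stop_words(keyword_sets, min_share=0.8):
    """Return keywords present in at least min_share of the records."""
    total = len(keyword_sets)
    doc_freq = Counter(term for kws in keyword_sets for term in set(kws))
    return sorted(t for t, n in doc_freq.items() if n / total >= min_share)


# Hypothetical keywords generated for five records in one domain.
keyword_sets = [
    {"method", "battery", "lithium"},
    {"method", "antenna", "signal"},
    {"method", "apparatus", "battery"},
    {"method", "apparatus", "valve"},
    {"method", "apparatus", "sensor"},
]

print(candidate_stop_words(keyword_sets))        # strict threshold
print(candidate_stop_words(keyword_sets, 0.5))   # looser threshold
```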

Stop word Formats
The following table gives examples of stop-words you can use and how they match the cluster names:
Stop-word | Matching Clusters | Non-matching Clusters |
---|---|---|
more information | more information; More information; MORE INFORMATION | more informations; more information about; some more information |
more information* | more information; More information about; More information about a | more informations; more informations about; some more information |
* information * | information; more information; information about; a lot more information on | informations; more informations about; some more informations |
Programm * | Programmer; Programming; Programming Language | |
*cycl* | Monocyclic acid; Cycle | |
* | Illegal pattern, there must be at least one non-wildcard word. | |
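
The word-level behavior in the first three rows of the table can be approximated with the Python sketch below. It is an assumption-based illustration, not PatSeer's matcher: `*` is treated as "zero or more whole words", comparison is case-insensitive, and the function names are hypothetical. It does not attempt the character-level matching shown in the `Programm *` and `*cycl*` rows.

```python
def parse_pattern(pattern):
    """Split a stop-word pattern into lowercase words and '*' wildcards."""
    tokens = pattern.lower().replace("*", " * ").split()
    if all(tok == "*" for tok in tokens):
        raise ValueError("illegal pattern: needs at least one non-wildcard word")
    return tokens


def _match(tokens, words):
    """Match pattern tokens against label words; '*' absorbs 0+ words."""
    if not tokens:
        return not words
    head, rest = tokens[0], tokens[1:]
    if head == "*":
        return any(_match(rest, words[i:]) for i in range(len(words) + 1))
    return bool(words) and head == words[0] and _match(rest, words[1:])


def is_stop_label(pattern, label):
    """True if the cluster label is matched by the stop-word pattern."""
    return _match(parse_pattern(pattern), label.lower().split())


print(is_stop_label("more information", "MORE INFORMATION"))
print(is_stop_label("more information*", "More information about a"))
print(is_stop_label("* information *", "some more informations"))
```

Because matching is done on whole words, "informations" is never matched by a pattern that contains "information", which agrees with the non-matching column for those rows.
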
Viewing Generated Clusters in Form of Visualizations
Once the clusters are generated, click on View to see them as visualizations. By default, Keywords and Themes are shown as treemaps and Topics as circles.

You can further switch between the different charting options.

Keywords and Themes can be visualized as below
