How to Analyze a Patent
by John Morgan (with some help from Philip Greenspun)
Perhaps the most important thing for a software expert witness to
understand in a patent case is "what does the patent actually say?"
Oftentimes it will not be obvious to a person of ordinary skill in the
art what a patent covers simply by reading its claims. As such it is
important to review the patent's specification to understand
what infringes the patent and what does not.
With software patents in particular, sometimes the language of the
claims gives little hint as to what the patent might cover. Here's an excerpt from
U.S. Patent
7,464,087, "Method and system of unifying data":
31. A method for unifying a plurality of data sources storing data related to an industry, each of the plurality of data sources being a data source instance of a data source type, the method including:
a) storing information in a first plurality of nodes that defines a corresponding plurality of business context dimensions for the industry and interconnecting the first plurality of nodes in a manner that represents relationships between the corresponding business context dimensions;
...
What's a "dimension"? Is that the "dimension" of dimensional data warehousing? A
dimension of an array data structure in a standard programming
language? What's a "data source"? What's a "business context"?
Back in June 2011, we were asked to look at this patent by a law firm
representing a company that had been sued for infringement. We decided
that before offering even preliminary opinions or suggesting a method
for analyzing the accused systems we first needed to understand the
patent. The document below is what John Morgan wrote.
Summary of US Patent 7,464,087
The patent
describes a system that unifies data from a plurality of data sources
(each of which would typically be a database management system holding
a collection of tables) by abstracting away the details of accessing
each data source. The most important abstraction is of the data model
(specific layout of tables and columns) used by each source to
represent its information. In theory this allows data to be accessed
by reference to industry-specific logical concepts without knowledge
of the underlying data models within the database management systems
from which they are retrieved.
The system accomplishes this transformation by allowing administrators to define and store a set of mappings between the underlying data and the business concepts they represent. The system then references these mappings to translate user requests for data in the business context into concrete data access requests.
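As a rough illustration of the mapping idea only (not the patent's implementation), the sketch below stores a mapping from a business concept to each data source's table and column, then translates a business-level request into one concrete SQL string per source. The `MAPPINGS` structure and the `translate` function are hypothetical; the table and column names are borrowed from the hospital example at the end of this document.

```python
# Hypothetical mapping store: (dimension, attribute) -> where each data
# source keeps that information. The layout is an assumption made for
# illustration; the patent stores its mappings in database tables.
MAPPINGS = {
    ("patient", "last visit"): [
        {"source": "hospital_one", "table": "patient",
         "column": "last_visit", "key": "ssn"},
        {"source": "hospital_two", "table": "patient_info",
         "column": "most_recent_visit", "key": "patient_id"},
    ],
}


def translate(dimension, attribute):
    """Turn a business-level request into (source, parameterized SQL) pairs."""
    return [
        (m["source"],
         f'SELECT {m["column"]} FROM {m["table"]} WHERE {m["key"]} = ?')
        for m in MAPPINGS[(dimension, attribute)]
    ]


requests = translate("patient", "last visit")
```

The caller asks for "last visit" of a "patient" without knowing that one source calls the column last_visit and the other most_recent_visit.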
Definitions in the Specification
The authors of the patent describe the invention using their own
lexicography, which must be understood before the claims can be
understood. The following definitions provide context for
understanding the analysis of claims and the example that follows:
- dimension: a specific logical concept within an industry. In the context of the pharmaceutical industry, which is used as an example throughout the patent, a dimension might represent a sponsor, study, site, or patient. (Note that this has nothing to do with the conventional use of the term "dimension" in "dimensional data warehousing" or as part of a star schema; it is much closer to the concept of an "entity" in a conventional entity-relationship model (see http://en.wikipedia.org/wiki/Entity-relationship_model).)
- dimension instance: a record of the type represented by the dimension. Again drawing on the pharmaceutical industry example, the patient dimension may include a dimension instance such as "Joe Smith." (In standard DBMS parlance, this would simply be a row within a view.)
- industry business context: a set of dimensions that define the data pertinent to a specific industry.
- data source: a source of structured information. An example of a data source may be a Relational Database Management System (RDBMS).
The components that make up the implementation of the system itself are described as:
- UniDim: a component that represents a dimension (a logical concept for a given industry). Within an RDBMS a UniDim is implemented as a table.
- DataSourceDim: a component that represents a data source specific dimension. Each DataSourceDim is related to at least one UniDim. Returning to the pharmaceutical example, there may be a UniDim representing the concept of a patient and one or more DataSourceDims, each corresponding to a specific representation of patient data in one data source. Just like a UniDim, within an RDBMS a DataSourceDim is represented as a table.
- node: an object such as a UniDim or DataSourceDim. For RDBMS based implementations a node corresponds to a table. (see separate section below)
- UniDimNet: a collection of related UniDims and DataSourceDims. This is analogous to an industry business context. Within an RDBMS the UniDimNet is the complete set of UniDim and DataSourceDim tables and the relationships defined between them.
- UniView: a template created to query a data source. It is described as containing the specific question for a specific dimension designed for a specific data source. In the case of an RDBMS the UniView can be represented by a view defined using Structured Query Language (SQL).
- UniViewer: a user interface that allows a user to query the data sources by identifying an industry business context dimension, a dimension instance, and at least one UniView (although multiple UniViews can be combined, cached, and saved to facilitate complex queries). The UniViewer is implemented as a separate application such as desktop software or a web page.
- UniBase: the database used to store the UniDimNet, UniView definitions, and cached UniView results.
- UniServer: a central server that coordinates the system and facilitates use of the system through an interface (the UniViewer).
Nodes and network
The patent does not disclose using a 1960s-style network database (see
http://en.wikipedia.org/wiki/Network_model_(database)). The
idea of a network and nodes is not native to a standard
RDBMS. However, it is possible to represent a network structure in an
RDBMS. A common technique for doing this is to create a table that
represents the network, with one row for each node (assume the primary
key is NODE_ID). Connections between nodes can then be implemented by
adding entries to a mapping table with at least two columns, one to
store the NODE_ID of the from node and the other to store the NODE_ID
of the to node (in this way an edge in the network is represented and
it can be directional or not, depending on the application).
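The conventional technique just described (which, as discussed next, is not what the patent discloses) can be sketched with Python's standard sqlite3 module. This is a minimal illustration under assumptions: the table names `node` and `edge`, the `label` column, and the `neighbors` helper are hypothetical, and the sample edges between the pharmaceutical concepts are invented for the example.

```python
import sqlite3

# One table holds every node; a second two-column table holds the edges.
# Only NODE_ID comes from the text above; all other names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (
        node_id INTEGER PRIMARY KEY,
        label   TEXT NOT NULL
    );
    CREATE TABLE edge (
        from_node_id INTEGER NOT NULL REFERENCES node(node_id),
        to_node_id   INTEGER NOT NULL REFERENCES node(node_id),
        PRIMARY KEY (from_node_id, to_node_id)
    );
""")
conn.executemany(
    "INSERT INTO node (node_id, label) VALUES (?, ?)",
    [(1, "sponsor"), (2, "study"), (3, "site"), (4, "patient")],
)
# Invented sample edges: sponsor -> study -> site -> patient.
conn.executemany(
    "INSERT INTO edge (from_node_id, to_node_id) VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4)],
)


def neighbors(conn, node_id):
    """Return the labels of nodes one directed edge away from node_id."""
    rows = conn.execute(
        "SELECT n.label FROM edge e "
        "JOIN node n ON n.node_id = e.to_node_id "
        "WHERE e.from_node_id = ? ORDER BY n.node_id",
        (node_id,),
    )
    return [label for (label,) in rows]


study_neighbors = neighbors(conn, 2)
```

Treating the network as undirected would only require also matching on to_node_id in the WHERE clause; the storage layout stays the same.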
The patent does not follow this model and instead suggests that each
node be represented by a table. As described in 10:28 the "UniDimNet
is a series of interrelated tables, with each node in the UniDimNet
being represented by a table. As such, each UniDim and each
DataSourceDim in the UniDimNet are represented by a table. Each
dimension instance in a data source is represented by a row in at
least one UniDim of the UniDimNet and at least one DataSourceDim of
the UniDimNet." An example of this network structure is shown at the
end of this document. For an illustration of the tables and their
links see Fig. 8 in the patent itself.
Apparatus Claim
1. A system for unifying data relating to an industry having a plurality of industry business context dimensions which define logical groupings of data related to the industry, the system comprising:
one or more computer systems, comprising:
a plurality of data sources, at least two data sources having a physical or logical structure differing from at least one other data source, each data source having data which is capable of a logical contextual grouping into at least one data source specific dimension which contains data related to at least one industry business context dimension, and each data source having a data access mechanism for facilitating querying thereof, wherein each dimension has at least one dimension instance and each of the at least two data sources have data relating to a dimension instance;
a database having a first and a second plurality of nodes, each of the first plurality of nodes representing an industry business context dimension, each of the second plurality of nodes representing a data source specific dimension of at least one of the data sources, each of the first plurality of nodes related to at least one other of the first plurality of nodes, and each of the second plurality of nodes related to at least one of the first plurality of nodes, wherein the database is stored in at least one of the data sources;
a plurality of data source query function calls, each query function call querying a single data source regarding a single data source specific dimension, and each query function call using the data access mechanism of the single data source to facilitate access to the single data source; and
a complex query comprising a plurality of data source query function calls, the complex query querying the at least two data sources for data relating to the dimension instance, the complex query calling the plurality of data source query function calls to perform the querying of the at least two data sources for the data relating to the dimension instance, and wherein the data relating to the dimension instance is retrieved from each of the at least two data sources.
In claim 1 the system is described as comprising dissimilar data sources containing data that can be classified into DataSourceDims on a per data source basis. The system should be able to map these related DataSourceDims to a UniDim. There should be records within each data source containing information about specific entities that can be unified to form a common UniDim instance.
The system also includes a database representing two groups of nodes known as the UniDimNet. The first group is a collection of UniDims pertinent to the industry. The second group is a collection of DataSourceDims which correspond to those UniDims.
Additionally the system includes data source query functions (UniViews) that allow data to be retrieved from a given data source based on a DataSourceDim. These UniViews are specific to the data sources they query.
As an extension of these low-level queries the system also supports complex queries that are composed of multiple data source query functions (complex UniViews). A complex query calls multiple data-source query functions that may be from different data sources but are related to the same dimension instance and returns the results.
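One way to picture the relationship between the low-level query functions and a complex query is the sketch below. It is an assumption-laden illustration rather than the patent's code: two separate in-memory SQLite connections stand in for two data sources, the function names are hypothetical, and the stored values (a weight and a doctor's name) are invented sample data.

```python
import sqlite3

# Two independent connections stand in for two dissimilar data sources.
source_a = sqlite3.connect(":memory:")
source_a.executescript("""
    CREATE TABLE patient (ssn CHAR(9) PRIMARY KEY, weight INT);
    INSERT INTO patient VALUES ('123456789', 180);  -- invented sample row
""")
source_b = sqlite3.connect(":memory:")
source_b.executescript("""
    CREATE TABLE patient_info (patient_id INT PRIMARY KEY,
                               primary_doctor VARCHAR(50));
    INSERT INTO patient_info VALUES (98765, 'Dr. Jones');  -- invented
""")


def query_source_a(ssn):
    """Data source query function: one source, one data source dimension."""
    row = source_a.execute(
        "SELECT weight FROM patient WHERE ssn = ?", (ssn,)).fetchone()
    return row[0] if row else None


def query_source_b(patient_id):
    """Data source query function for the second source."""
    row = source_b.execute(
        "SELECT primary_doctor FROM patient_info WHERE patient_id = ?",
        (patient_id,)).fetchone()
    return row[0] if row else None


def complex_query(instance_keys):
    """Call every per-source query function for one dimension instance.

    instance_keys maps a source name to that source's own key for the
    same real-world entity (here, the same patient).
    """
    query_functions = {"source_a": query_source_a,
                       "source_b": query_source_b}
    return {name: fn(instance_keys[name])
            for name, fn in query_functions.items()}


combined = complex_query({"source_a": "123456789", "source_b": 98765})
```

Each low-level function knows only its own source's data model; the complex query knows only which functions to call and which key to hand to each.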
Method Claim
31. A method for unifying a plurality of data sources storing data related to an industry, each of the plurality of data sources being a data source instance of a data source type, the method including:
a) storing information in a first plurality of nodes that defines a corresponding plurality of business context dimensions for the industry and interconnecting the first plurality of nodes in a manner that represents relationships between the corresponding business context dimensions;
b) storing information in a second plurality of nodes that indicates data stored within the plurality of data sources corresponds to at least a portion of the plurality of business context dimensions for at least one of i) each data source type represented within the plurality of data sources or ii) each data source instance represented within the plurality of data sources, and mapping the second plurality of nodes to the first plurality of nodes based on corresponding business context dimensions;
c) storing information in each node of the second plurality of nodes that defines at least one business context dimension instance from the plurality of data sources, each business context dimension instance relating to an instance of stored data within the corresponding data source instance associated with the corresponding business context dimension;
d) initiating a request for desired information from the plurality of data sources based at least in part on selection of one or more of the plurality of business context dimensions defined by the first plurality of nodes; and
e) identifying data source instances and business context dimension instances associated with the at least one selected business context dimension based at least in part on information stored in the first and second pluralities of nodes.
Claim 31 can be restated in the language of the description as:
- Creating a set of UniDims.
- Creating a set of DataSourceDims and mapping them to the UniDims.
- Using the DataSourceDim to UniDim mapping to map DataSourceDim instances to UniDim instances.
- Allowing for queries of the data sources based, in part, on the selection of a UniDim.
- Identifying data sources and dimension instances based on the selection of a UniDim and the mapping defined in the third step (part c of the claim).
The following example shows how this method might be implemented in an RDBMS:
Hospital Patients Example
The following is an example of how a system embodying these claims might look if tasked with unifying data from RDBMS instances at two different hospitals. For this example the dimension of interest is the patient. The first data source is the first hospital's RDBMS, which contains a patient table:
CREATE TABLE hospital_one.patient(
ssn char(9) not null primary key,
name varchar(50),
address varchar(80),
birthdate date,
gender char(1),
weight int,
last_visit date,
notes varchar(100)
)
The second data source is the second hospital's RDBMS, which contains a patient_info table:
CREATE TABLE hospital_two.patient_info(
patient_id int not null primary key,
lastname varchar(35),
firstname varchar(35),
gender char(1),
address varchar(50),
city varchar(50),
state char(2),
zipcode char(9),
primary_doctor varchar(50),
most_recent_visit date
)
Now that the data sources have been defined, it's time for the system administrator to create the UniDim and DataSourceDim definitions that will live within the system's local database (as opposed to within one of the hospital databases). The patient is the business concept we care about so the UniDim will keep track of each patient. In this implementation patients are tracked by name at the UniDim level:
CREATE TABLE UniDimNet.UniDim_Patient(
global_id int not null primary key,
patient_name varchar(40),
data_source_dim varchar(50) /*contains name of corresponding DataSourceDim table*/
)
[Note that this is different from the conventional way of doing things in an RDBMS in which a single view would be created with all of the rows from all of the data sources. The patent actually discloses something that is more cumbersome.]
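For comparison, the conventional single-view approach mentioned in the note might look like the following sketch, written with Python's standard sqlite3 module. Several things here are assumptions: both hospital tables are placed in one SQLite database under flattened names (hospital_one_patient, hospital_two_patient_info) so that a single view can reach them, the view name all_patients is hypothetical, and the two sample rows reuse the patient names and keys from the sample data later in this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced versions of the two hospital tables (assumed flattened names).
    CREATE TABLE hospital_one_patient (
        ssn CHAR(9) PRIMARY KEY, name VARCHAR(50));
    CREATE TABLE hospital_two_patient_info (
        patient_id INT PRIMARY KEY,
        lastname VARCHAR(35), firstname VARCHAR(35));
    INSERT INTO hospital_one_patient VALUES ('123456789', 'Joe Smith');
    INSERT INTO hospital_two_patient_info VALUES (98765, 'Smith', 'Jane');

    -- One view presents every patient from every source in a common shape.
    CREATE VIEW all_patients AS
        SELECT 'hospital_one' AS data_source,
               ssn AS source_key,
               name AS patient_name
        FROM hospital_one_patient
        UNION ALL
        SELECT 'hospital_two',
               CAST(patient_id AS TEXT),
               firstname || ' ' || lastname
        FROM hospital_two_patient_info;
""")

rows = conn.execute(
    "SELECT data_source, source_key, patient_name "
    "FROM all_patients ORDER BY data_source"
).fetchall()
```

With this approach there is no separate bookkeeping table per data source; the view itself records which source each row came from.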
Now that the UniDim structure has been defined we'll create a DataSourceDim for each data source. Since a person's name is not a unique identifier we'll have to find something in each hospital's data model that we can use to be sure we're identifying the right patient. Hospital One's data model includes a social security number. We know this to be a unique identifier so for hospital one we will keep track of patients by SSN:
CREATE TABLE UniDimNet.DataSourceDim_HospitalOnePatient(
global_id int not null primary key,
data_source int,
patient_ssn char(9) unique
)
Hospital Two does not store patient SSNs in its database (perhaps as a measure to protect patient privacy), so its DataSourceDim must be defined differently. While SSN isn't available, the database does store a unique patient_id, which we can use to identify a specific patient:
CREATE TABLE UniDimNet.DataSourceDim_HospitalTwoPatient(
global_id int not null primary key,
data_source int,
patient_id int unique
)
Finally we create a table to help us reference each data source:
CREATE TABLE DataSources(
id int not null primary key,
name varchar(50) /* name of the source, used to determine where to get the data */
)
Now that our example data model has been defined, these tables can be populated with instance information:
INSERT INTO DataSources (id, name) VALUES (1, 'hospital_one');
INSERT INTO DataSources (id, name) VALUES (2, 'hospital_two');
INSERT INTO UniDimNet.DataSourceDim_HospitalOnePatient (global_id, data_source, patient_ssn) VALUES (1, 1, '123456789'); /* key (patient SSN): 123-45-6789 */
INSERT INTO UniDimNet.DataSourceDim_HospitalTwoPatient (global_id, data_source, patient_id) VALUES (2, 2, 98765); /* key (patient id): 98765 */
INSERT INTO UniDimNet.UniDim_Patient (global_id, patient_name, data_source_dim) VALUES (1, 'Joe Smith', 'DataSourceDim_HospitalOnePatient');
INSERT INTO UniDimNet.UniDim_Patient (global_id, patient_name, data_source_dim) VALUES (2, 'Jane Smith', 'DataSourceDim_HospitalTwoPatient');
To find out the last time a patient visited Hospital One a simple UniView can be defined that grabs this information from the hospital's RDBMS:
CREATE VIEW last_visit AS SELECT ssn AS instance_key, last_visit AS result FROM hospital_one.patient
sample query:
SELECT result FROM last_visit WHERE instance_key='123456789'
sample data returned from hospital one's RDBMS:
Another simple UniView pulls information on a patient's last visit to Hospital Two from Hospital Two's RDBMS (note that the UniViews are defined to abstract the differences in data model):
CREATE VIEW last_visit AS SELECT patient_id AS instance_key, most_recent_visit AS result FROM hospital_two.patient_info
sample query:
SELECT result FROM last_visit WHERE instance_key=98765
sample data returned:
Now that we have a UniView corresponding to each data source we can use a UniViewer to query the system. The following is an example interaction flow:
- User begins UniViewer session by selecting a UniDim of interest. In this case that UniDim would be "Patient"
- The user is now presented with a list of patient names corresponding to each dimension instance. The output would be: "Joe Smith," "Jane Smith"
- The user selects "Jane Smith" and the last_visit UniView and submits the query.
- The user waits as the system queries the UniDimNet to determine that "Jane Smith" is a patient of Hospital Two and can be identified by the patient_id 98765. The system then executes the Hospital Two version of the last_visit UniView to retrieve the date of her last hospital visit.
- The user observes the output "2010-03-19" on the screen.
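The interaction flow above can be approximated end to end with Python's standard sqlite3 module. This is a sketch under several assumptions, not the patent's implementation: everything lives in one in-memory SQLite database with schema prefixes flattened into table names, the hospital tables are reduced to the columns needed here, the `UNIVIEWS` dictionary and the Python function `last_visit` are hypothetical stand-ins for the UniView and UniViewer, and no visit date is stored for Joe Smith because this example does not give one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Data sources (reduced; flattened names are an assumption).
    CREATE TABLE hospital_one_patient (
        ssn CHAR(9) PRIMARY KEY, last_visit DATE);
    CREATE TABLE hospital_two_patient_info (
        patient_id INT PRIMARY KEY, most_recent_visit DATE);
    INSERT INTO hospital_one_patient VALUES ('123456789', NULL);
    INSERT INTO hospital_two_patient_info VALUES (98765, '2010-03-19');

    -- UniDimNet tables, populated as in the example.
    CREATE TABLE UniDim_Patient (
        global_id INT PRIMARY KEY, patient_name VARCHAR(40),
        data_source_dim VARCHAR(50));
    CREATE TABLE DataSourceDim_HospitalOnePatient (
        global_id INT PRIMARY KEY, data_source INT,
        patient_ssn CHAR(9) UNIQUE);
    CREATE TABLE DataSourceDim_HospitalTwoPatient (
        global_id INT PRIMARY KEY, data_source INT, patient_id INT UNIQUE);
    INSERT INTO DataSourceDim_HospitalOnePatient VALUES (1, 1, '123456789');
    INSERT INTO DataSourceDim_HospitalTwoPatient VALUES (2, 2, 98765);
    INSERT INTO UniDim_Patient
        VALUES (1, 'Joe Smith', 'DataSourceDim_HospitalOnePatient');
    INSERT INTO UniDim_Patient
        VALUES (2, 'Jane Smith', 'DataSourceDim_HospitalTwoPatient');
""")

# Per-source version of the last_visit UniView, keyed by DataSourceDim
# table: (key column in that DataSourceDim, query against the source).
UNIVIEWS = {
    "DataSourceDim_HospitalOnePatient": (
        "patient_ssn",
        "SELECT last_visit FROM hospital_one_patient WHERE ssn = ?"),
    "DataSourceDim_HospitalTwoPatient": (
        "patient_id",
        "SELECT most_recent_visit FROM hospital_two_patient_info "
        "WHERE patient_id = ?"),
}

# Steps 1-2: the user picks the Patient UniDim and sees its instances.
patient_names = [name for (name,) in conn.execute(
    "SELECT patient_name FROM UniDim_Patient ORDER BY global_id")]


def last_visit(patient_name):
    """Steps 3-5: resolve the instance via the UniDimNet, then query."""
    row = conn.execute(
        "SELECT global_id, data_source_dim FROM UniDim_Patient "
        "WHERE patient_name = ?", (patient_name,)).fetchone()
    if row is None:
        return None
    global_id, dim_table = row
    # dim_table is only used after a lookup in the fixed UNIVIEWS dict,
    # so interpolating it into SQL below cannot inject arbitrary text.
    key_column, uniview_sql = UNIVIEWS[dim_table]
    key_row = conn.execute(
        f"SELECT {key_column} FROM {dim_table} WHERE global_id = ?",
        (global_id,)).fetchone()
    if key_row is None:
        return None
    result = conn.execute(uniview_sql, key_row).fetchone()
    return result[0] if result else None


jane_last_visit = last_visit("Jane Smith")
```

Note how much indirection is involved: the UniDim row names a DataSourceDim table, that table yields the source-specific key, and only then is the source itself queried.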
johnpatrickmorgan@gmail.com