maplib
r'''
# Overview

'''

__all__ = [
    "Mapping",
    "a",
    "Triple",
    "SolutionMappings",
    "IndexingOptions",
    "ValidationReport",
    "Instance",
    "Template",
    "Argument",
    "Parameter",
    "Variable",
    "RDFType",
    "XSD",
    "IRI",
    "Literal",
    "Prefix",
    "BlankNode",
    "explore",
    "add_triples",
    "MaplibException",
]

import pathlib
from .maplib import *
from .add_triples import add_triples

if (pathlib.Path(__file__).parent.resolve() / "graph_explorer").exists():
    from .graph_explorer import explore
else:
    async def explore(
        m: "Mapping",
        host: str = "localhost",
        port: int = 8000,
        bind: str = "localhost",
        popup=True,
        fts=True,
    ):
        """Starts a graph explorer session.
        To run from Jupyter Notebook use:

        >>> from maplib import explore
        >>>
        >>> await explore(m)

        This will block further execution of the notebook until you stop the cell.

        :param m: The Mapping to explore
        :param host: The hostname that we will point the browser to.
        :param port: The port where the graph explorer webserver listens on.
        :param bind: Bind to the following host / ip.
        :param popup: Pop up the browser window.
        :param fts: Enable full text search indexing
        """
        print("Contact Data Treehouse to try!")
A mapping session allowing:
- Iterative mapping using OTTR templates
- Interactive SPARQL querying and enrichment
- SHACL validation
Usage:
>>> from maplib import Mapping
... doc = '''
... @prefix ex:<http://example.net/ns#>.
... ex:ExampleTemplate [?MyValue] :: {
... ottr:Triple(ex:myObject, ex:hasValue, ?MyValue)
... } .'''
... m = Mapping(doc)
Parameters
- documents: a stOTTR document or a list of these
- indexing_options: options for indexing
Add a template to the mapping. Overwrites any existing template with the same IRI.
Parameters
- template: The template to add, as a stOTTR string or as a programmatically constructed Template.
A sprout is a simplified way of dealing with multiple graphs.
See also Mapping.insert_sprout
and Mapping.detach_sprout
Expand a template using a DataFrame. Usage:
>>> m.expand("ex:ExampleTemplate", df)
If the template has no arguments, the df argument is not necessary.
Parameters
- template: Template, IRI, IRI string or prefixed template name.
- df: DataFrame where the columns have the same names as the template arguments
- graph: The IRI of the graph to add triples to.
- types: The types of the columns.
- validate_iris: Validate any IRI-columns.
- delay_index: Delay index construction - reduces write amplification when doing many expansions
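Before calling expand it can be useful to confirm that the DataFrame's columns match the template's parameter names. The following stdlib-only helper is an illustrative sketch (not part of maplib) that extracts the parameter names from a stOTTR template header:

```python
import re

# Illustrative helper (not part of maplib): check that the column names
# you are about to pass to Mapping.expand line up with the parameter
# list in a stOTTR template header.
def template_arguments(stottr_doc: str) -> list:
    # The parameter list is the bracketed [...] part of the template head;
    # parameters may carry type annotations, so we only keep the ?names.
    params = re.search(r"\[([^\]]*)\]", stottr_doc).group(1)
    return re.findall(r"\?(\w+)", params)

doc = """
@prefix ex:<http://example.net/ns#>.
ex:ExampleTemplate [?MyValue] :: {
    ottr:Triple(ex:myObject, ex:hasValue, ?MyValue)
} .
"""
columns = {"MyValue": [1, 2, 3]}  # stand-in for the DataFrame's columns
assert set(columns) == set(template_arguments(doc))
```

A mismatch between column names and template arguments is a common source of expansion errors, so a check like this can fail fast before the actual expansion.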
Expand a template using a DataFrame with columns subject, object and verb. The verb column can also be supplied as a string if it is the same for all rows. Usage:
>>> m.expand_triples(df)
Parameters
- df: DataFrame where the columns are named subject and object. May also contain a verb-column.
- verb: The uri of the verb.
- graph: The IRI of the graph to add triples to.
- types: The types of the columns.
- validate_iris: Validate any IRI-columns.
- delay_index: Delay index construction - reduces write amplification when doing many expansions
Create a default template and expand it based on a dataframe. Usage:
>>> template_string = m.expand_default(df, "myKeyCol")
... print(template_string)
Parameters
- df: DataFrame where the columns have the same names as the template arguments
- primary_key_column: This column will be the subject of all triples in the generated template.
- dry_run: Do not expand the template, only return the string.
- graph: The IRI of the graph to add triples to.
- types: The types of the columns.
- validate_iris: Validate any IRI-columns.
- delay_index: Delay index construction - reduces write amplification when doing many expansions
Returns
The generated template
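To make the shape of a generated default template concrete, here is a hypothetical sketch of such a generator. It is not maplib's actual implementation; the template and predicate IRIs (urn:example:...) are made up for illustration, and maplib's real output will differ:

```python
# Hypothetical sketch (not maplib's generator): build a stOTTR-style
# default template from DataFrame column names, with the primary key
# column as the subject of every triple. All urn:example IRIs are
# invented for illustration only.
def sketch_default_template(columns, primary_key_column):
    others = [c for c in columns if c != primary_key_column]
    params = ", ".join(f"?{c}" for c in columns)
    body = "\n".join(
        f"  ottr:Triple(?{primary_key_column}, <urn:example:{c}>, ?{c}) ."
        for c in others
    )
    return f"<urn:example:Template> [{params}] :: {{\n{body}\n}} ."

print(sketch_default_template(["myKeyCol", "name", "height"], "myKeyCol"))
```

Each non-key column becomes one triple pattern with the primary key column as its subject, which is the behavior the parameters above describe.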
Query the contained knowledge graph using SPARQL. Currently, SELECT, CONSTRUCT and INSERT queries are supported. Usage:
>>> df = mapping.query('''
... PREFIX ex:<http://example.net/ns#>
... SELECT ?obj1 ?obj2 WHERE {
... ?obj1 ex:hasObj ?obj2
... }''')
... print(df)
Parameters
- query: The SPARQL query string
- parameters: PVALUES Parameters, a DataFrame containing the value bindings in the custom PVALUES construction.
- native_dataframe: Return columns with maplib-native formatting. Useful for round-trips.
- include_datatypes: Datatypes are not returned by default, set to true to return a dict with the solution mappings and the datatypes.
- graph: The IRI of the graph to query.
- streaming: Use Polars streaming
- return_json: Return JSON string.
- include_transient: Include transient triples when querying.
Returns
A DataFrame containing results (SELECT), a list of DataFrames (CONSTRUCT), or None (INSERT)
Parameters
- options: Indexing options
- all: Apply to all existing and new graphs
- graph: The graph where indexes should be added
Validate the contained knowledge graph using SHACL. Assumes that the contained knowledge graph also contains SHACL Shapes.
Parameters
- shape_graph: The IRI of the Shape Graph.
- include_details: Include details of SHACL evaluation alongside the report. Currently uses a lot of memory.
- include_conforms: Include those results that conformed. Also applies to details.
- include_shape_graph: Include the shape graph in the report, useful when creating the graph from the report.
- include_datatypes: Return the datatypes of the validation report (and details).
- streaming: Use Polars streaming
- max_shape_results: Maximum number of results per shape. Reduces the size of the result set.
- result_storage: Where to store validation results. Can reduce memory use for large result sets.
- only_shapes: Validate only these shapes, None means all shapes are validated (must be IRI, cannot be used with deactivate_shapes).
- deactivate_shapes: Disable validation of these shapes (must be IRI, cannot be used with only_shapes).
- dry_run: Only find targets of shapes, but do not validate them.
Returns
Validation report containing a report (report.df) and whether the graph conforms (report.conforms)
Insert the results of a Construct query in the graph. Useful for being able to use the same query for inspecting what will be inserted and actually inserting. Usage:
>>> m = Mapping(doc)
... # Omitted
... hpizzas = '''
... PREFIX pizza:<https://github.com/magbak/maplib/pizza#>
... PREFIX ing:<https://github.com/magbak/maplib/pizza/ingredients#>
... CONSTRUCT { ?p a pizza:HeterodoxPizza }
... WHERE {
... ?p a pizza:Pizza .
... ?p pizza:hasIngredient ing:Pineapple .
... }'''
... m.insert(hpizzas)
Parameters
- query: The SPARQL Insert query string
- parameters: PVALUES Parameters, a DataFrame containing the value bindings in the custom PVALUES construction.
- native_dataframe: Return columns with maplib-native formatting. Useful for round-trips.
- include_datatypes: Datatypes are not returned by default, set to true to return a dict with the solution mappings and the datatypes.
- transient: Should the inserted triples be transient?
- source_graph: The IRI of the source graph to execute the construct query.
- target_graph: The IRI of the target graph to insert into.
- streaming: Use Polars streaming
- delay_index: Delay indexing, use when making multiple inserts of the same predicate.
- include_transient: Include transient triples when querying (but see "transient" above).
Returns
None
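The inspect-then-insert workflow described above can be captured in a small helper. This is an illustrative pattern only; `m` stands for any object exposing maplib's query()/insert() methods:

```python
# Illustrative pattern: run the same CONSTRUCT string first through
# query() to inspect the triples it would produce, then through insert()
# to actually materialize them. `m` is any object exposing maplib's
# query()/insert() methods.
def preview_then_insert(m, construct_query: str):
    preview = m.query(construct_query)  # list of DataFrames for CONSTRUCT
    m.insert(construct_query)           # insert the same triples
    return preview
```

Keeping a single query string for both steps avoids the preview and the insert drifting apart.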
Insert the results of a Construct query in a sprouted graph, which is created if no sprout is active.
Sprouts are a simplified way of dealing with multiple graphs.
Useful for being able to use the same query for inspecting what will be inserted and actually inserting.
See also Mapping.detach_sprout
Usage:
>>> m = Mapping(doc)
... m.create_sprout()
... # Omitted
... hpizzas = '''
... PREFIX pizza:<https://github.com/magbak/maplib/pizza#>
... PREFIX ing:<https://github.com/magbak/maplib/pizza/ingredients#>
... CONSTRUCT { ?p a pizza:HeterodoxPizza }
... WHERE {
... ?p a pizza:Pizza .
... ?p pizza:hasIngredient ing:Pineapple .
... }'''
... m.insert_sprout(hpizzas)
Parameters
- query: The SPARQL Insert query string
- parameters: PVALUES Parameters, a DataFrame containing the value bindings in the custom PVALUES construction.
- native_dataframe: Return columns with maplib-native formatting. Useful for round-trips.
- include_datatypes: Datatypes are not returned by default, set to true to return a dict with the solution mappings and the datatypes.
- transient: Should the inserted triples be transient?
- source_graph: The IRI of the source graph to execute the construct query.
- target_graph: The IRI of the target graph to insert into.
- streaming: Use Polars streaming
- delay_index: Delay indexing, use when making multiple inserts of the same predicate to improve performance.
- include_transient: Include transient triples when querying (see also "transient" above).
Returns
None
Reads triples from a file path. You can specify the format, or it will be derived from the file extension, e.g. filename.ttl or filename.nt. Specify transient if you only want the triples to be available for further querying and validation, but not persisted by the write-methods.
Usage:
>>> m.read_triples("my_triples.ttl")
Parameters
- file_path: The path of the file containing triples
- format: One of "ntriples", "turtle", "rdf/xml", otherwise it is inferred from the file extension.
- base_iri: Base iri
- transient: Should these triples be included when writing the graph to the file system?
- parallel: Parse triples in parallel; currently only N-Triples. Assumes all prefixes are at the beginning of the document.
- checked: Check IRIs etc.
- graph: The IRI of the graph to read the triples into, if None, it will be the default graph.
- replace_graph: Replace the graph with these triples? Will replace the default graph if no graph is specified.
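The extension-based format inference can be pictured as a simple lookup. This is a stdlib-only sketch of the behavior described above; maplib's actual rules may recognize more (or different) extensions:

```python
from pathlib import Path

# Sketch of the extension-based format inference described above;
# maplib's actual rules may recognize additional extensions.
EXTENSION_FORMATS = {
    ".nt": "ntriples",
    ".ttl": "turtle",
    ".rdf": "rdf/xml",
    ".xml": "rdf/xml",
}

def guess_format(file_path: str) -> str:
    return EXTENSION_FORMATS[Path(file_path).suffix.lower()]

assert guess_format("my_triples.ttl") == "turtle"
assert guess_format("filename.nt") == "ntriples"
```

Passing format explicitly sidesteps the inference entirely, which is useful for files with nonstandard extensions.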
Reads triples from a string. Specify transient if you only want the triples to be available for further querying and validation, but not persisted using write-methods.
Usage:
>>> m.read_triples_string(my_ntriples_string, format="ntriples")
Parameters
- s: String containing serialized triples.
- format: One of "ntriples", "turtle", "rdf/xml".
- base_iri: Base iri
- transient: Should these triples be included when writing the graph to the file system?
- parallel: Parse triples in parallel; currently only N-Triples. Assumes all prefixes are at the beginning of the document.
- checked: Check IRIs etc.
- graph: The IRI of the graph to read the triples into.
- replace_graph: Replace the graph with these triples? Will replace the default graph if no graph is specified.
DEPRECATED: use write_triples with format="ntriples". Write the non-transient triples to the specified file path in the NTriples format.
Usage:
>>> m.write_ntriples("my_triples.nt")
Parameters
- file_path: The path of the file containing triples
- graph: The IRI of the graph to write.
Write the non-transient triples to the file path specified in the NTriples format.
Usage:
>>> m.write_triples("my_triples.nt", format="ntriples")
Parameters
- file_path: The path of the file containing triples
- format: One of "ntriples", "turtle", "rdf/xml".
- graph: The IRI of the graph to write.
Write the legacy CIM XML format. Usage:
>>> PROFILE_GRAPH = "urn:graph:profiles"
>>> m = Mapping()
>>> m.read_triples(model_path, base_iri=publicID, format="rdf/xml")
>>> m.read_triples("61970-600-2_Equipment-AP-Voc-RDFS2020_v3-0-0.rdf", graph=PROFILE_GRAPH, format="rdf/xml")
>>> m.read_triples("61970-600-2_Operation-AP-Voc-RDFS2020_v3-0-0.rdf", graph=PROFILE_GRAPH, format="rdf/xml")
>>> m.write_cim_xml(
...     "model.xml",
...     profile_graph=PROFILE_GRAPH,
...     description="MyModel",
...     created="2023-09-14T20:27:41",
...     scenario_time="2023-09-14T02:44:43",
...     modeling_authority_set="www.westernpower.co.uk",
...     version="22",
... )
Parameters
- file_path: The path of the file containing triples
- profile_graph: The IRI of the graph containing the ontology of the CIM profile to write.
- model_iri: The IRI of the model; the triple model_iri a md:FullModel is written. Generated if not provided.
- version: Written as model_iri md:Model.version version.
- description: Written as model_iri md:Model.description description.
- created: Written as model_iri md:Model.created created.
- scenario_time: Written as model_iri md:Model.scenarioTime scenario_time.
- modeling_authority_set: Written as model_iri md:Model.modelingAuthoritySet modeling_authority_set.
- prefixes: Prefixes to be used in XML export.
- graph: The graph to write, defaults to the default graph.
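The md:FullModel header that the parameters above describe as triples can be sketched with the standard library. This is a hypothetical illustration, not maplib's actual serialization; the md namespace IRI is the conventional CIM ModelDescription one, and details of maplib's output will differ:

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of the md:FullModel header described by the
# parameters above; maplib's actual CIM XML output will differ in
# detail. Namespace IRIs are the conventional CIM/RDF ones.
MD = "http://iec.ch/TC57/61970-552/ModelDescription/1#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("md", MD)
ET.register_namespace("rdf", RDF)

def full_model_header(model_iri, version, description, created,
                      scenario_time, modeling_authority_set):
    fm = ET.Element(f"{{{MD}}}FullModel", {f"{{{RDF}}}about": model_iri})
    for local_name, value in [
        ("Model.version", version),
        ("Model.description", description),
        ("Model.created", created),
        ("Model.scenarioTime", scenario_time),
        ("Model.modelingAuthoritySet", modeling_authority_set),
    ]:
        ET.SubElement(fm, f"{{{MD}}}{local_name}").text = value
    return ET.tostring(fm, encoding="unicode")
```

Each keyword argument of write_cim_xml maps one-to-one onto a child element of the md:FullModel header, which is why they are documented above as triples on model_iri.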
DEPRECATED: use write_triples_string with format="ntriples". Write the non-transient triples to a string in memory.
Usage:
>>> s = m.write_ntriples_string()
Parameters
- graph: The IRI of the graph to write.
Returns
Triples in the mapping in the NTriples format (potentially a large string)
Write the non-transient triples to a string in memory.
Usage:
>>> s = m.write_triples_string(format="turtle")
Parameters
- format: One of "ntriples", "turtle", "rdf/xml".
- graph: The IRI of the graph to write.
Returns
Triples in the mapping, serialized in the given format (potentially a large string)
Write non-transient triples using the internal native Parquet format.
Usage:
>>> m.write_native_parquet("output_folder")
Parameters
- folder_path: The path of the folder to write triples in the native format.
- graph: The IRI of the graph to write.
Parameters
- graph: The graph to get the predicate iris from.
- include_transient: Should predicates that only occur in transient triples be included?
Returns
The IRIs of the predicates currently in the given graph.
Parameters
- iri: The predicate IRI
- graph: The graph to get the predicate from.
- include_transient: Should we include transient triples?
Returns
A list of the underlying tables that store a given predicate.
Add a Datalog ruleset to the mapping, concatenating it with any existing ruleset.
Parameters
- ruleset: The ruleset to add
Run the inference rules
Parameters
- insert: Will the resulting triples be inserted into the triplestore, or returned?
- native_dataframe: Return columns with maplib-native formatting. Useful for round-trips.
- include_datatypes: Datatypes are not returned by default, set to true to return a dict with the solution mappings and the datatypes.
Returns
The inferred triples, if they are not inserted.
Returns
IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
An OTTR Triple Pattern used for creating templates. This is the base pattern into which all template instances are rewritten. Equivalent to:
>>> ottr = Prefix("http://ns.ottr.xyz/0.4/")
... Instance(ottr.suf("Triple"), subject, predicate, object, list_expander)
Parameters
- subject:
- predicate:
- object:
- list_expander:
Detailed information about the solution mappings and the types of the variables.
Options for indexing
Defaults to indexing on subjects and objects for a selected set of predicates (e.g. rdf:type and rdfs:label)
Parameters
- object_sort_all: Enable object-indexing for all suitable predicates (doubles memory requirement).
- object_sort_some: Enable object-indexing for a selected list of predicates.
- fts_path: Enable full text search, stored at the path
SHACL Validation report. Only constructed by maplib.
Return the results of the validation report, if they exist.
Parameters
- native_dataframe: Return columns with maplib-native formatting. Useful for round-trips.
- include_datatypes: Return datatypes of the results DataFrame (returns SolutionMappings instead of DataFrame).
- streaming: Use the Polars streaming functionality.
Returns
The SHACL validation report, as a DataFrame
Returns the details of the validation report. Only available if validation was called with include_details=True.
Parameters
- native_dataframe: Return columns with maplib-native formatting. Useful for round-trips.
- include_datatypes: Return datatypes of the results DataFrame (returns SolutionMappings instead of DataFrame).
- streaming: Use the Polars streaming functionality.
Returns
Details of the SHACL validation report, as a DataFrame
Creates a new mapping object where the base graph is the validation report with results. Includes the details of the validation report in the new graph if they exist.
Parameters
- indexing: Should the constructed graph be indexed? If not specified it is inherited from the mapping where validate was called.
A template instance.
Parameters
- iri: The IRI of the template to be instantiated.
- arguments: The arguments for template instantiation.
- list_expander: (How) should we do list expansion?
Parameters
- arguments: The arguments to the template.
- list_expander: (How) should we list-expand?
Returns
An OTTR Template. Note that accessing the parameters or instances fields returns copies. To change these fields, you must assign new lists of parameters or instances.
Create a new parameter for a Template.
Parameters
- variable: The variable.
- optional: Can the variable be unbound?
- allow_blank: Can the variable be bound to a blank node?
- rdf_type: The type of the variable. Can be nested.
- default_value: Default value when no value provided.
A variable in a template.
The type of a column containing an RDF variable. For instance, xsd:string is RDFType.Literal("http://www.w3.org/2001/XMLSchema#string")
The xsd namespace, for convenience.
An RDF literal.
Create a new RDF Literal
Parameters
- value: The lexical representation of the value.
- data_type: The data type of the value (an IRI).
- language: The language tag of the value.
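The three Literal fields correspond directly to the standard RDF lexical forms. This stdlib-only sketch (not maplib's API) shows how they map onto the N-Triples serialization of a literal:

```python
# Illustrative sketch (not maplib's API): how the value, data_type and
# language fields of a Literal map onto its N-Triples serialization.
def serialize_literal(value, data_type=None, language=None):
    if language is not None:
        # Language-tagged string, e.g. "hei"@no
        return f'"{value}"@{language}'
    if data_type is not None:
        # Typed literal, e.g. "5"^^<...#integer>
        return f'"{value}"^^<{data_type}>'
    # Plain literal (implicitly xsd:string in RDF 1.1)
    return f'"{value}"'

assert serialize_literal("hei", language="no") == '"hei"@no'
```

Per RDF 1.1, a literal has either a language tag or a datatype, never both, which is why the branches above are mutually exclusive.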
A prefix that can be used to ergonomically build IRIs.
Create a new prefix.
Parameters
- prefix: The name of the prefix
- iri: The prefix IRI.
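The ergonomics of Prefix can be pictured with a small stand-in class. This is an illustration only: maplib's Prefix.suf returns an IRI object, while this sketch just concatenates strings:

```python
# Stand-in sketch of the Prefix ergonomics (maplib's Prefix.suf returns
# an IRI object; this illustration just builds the IRI string).
class PrefixSketch:
    def __init__(self, prefix: str, iri: str):
        self.prefix = prefix
        self.iri = iri

    def suf(self, suffix: str) -> str:
        # Append a local name to the prefix IRI, e.g. ex:MyThing.
        return self.iri + suffix

ex = PrefixSketch("ex", "http://example.net/ns#")
assert ex.suf("MyThing") == "http://example.net/ns#MyThing"
```

This is the same pattern used in the Triple documentation above, where ottr.suf("Triple") produces the IRI of the OTTR Triple base template.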
A Blank Node.
Starts a graph explorer session. To run from Jupyter Notebook use:
>>> from maplib import explore
>>>
>>> await explore(m)
This will block further execution of the notebook until you stop the cell.
Parameters
- m: The Mapping to explore
- host: The hostname that we will point the browser to.
- port: The port where the graph explorer webserver listens on.
- bind: Bind to the following host / ip.
- popup: Pop up the browser window.
- fts: Enable full text search indexing
def add_triples(
    source: Mapping, target: Mapping, source_graph: str = None, target_graph: str = None
):
    """(Zero) copy the triples from one Mapping into another.

    :param source: The source mapping
    :param target: The target mapping
    :param source_graph: The named graph in the source mapping to copy from. None means default graph.
    :param target_graph: The named graph in the target mapping to copy into. None means default graph.
    """
    for p in source.get_predicate_iris(source_graph):
        subject = Variable("subject")
        object = Variable("object")
        template = Template(
            iri=IRI("urn:maplib:tmp"),
            parameters=[subject, object],
            instances=[Triple(subject, p, object)],
        )
        sms = source.get_predicate(p, source_graph)
        for sm in sms:
            target.expand(
                template,
                sm.mappings,
                types=sm.rdf_types,
                graph=target_graph,
            )
(Zero) copy the triples from one Mapping into another.
Parameters
- source: The source mapping
- target: The target mapping
- source_graph: The named graph in the source mapping to copy from. None means default graph.
- target_graph: The named graph in the target mapping to copy into. None means default graph.
The common base class for exceptions raised by maplib.