# RDF SemSpect 19.1.2

RDF SemSpect is a scalable and data-driven knowledge graph exploration and no-code querying tool for RDF instance data.
It uses a client-server architecture with an HTML5/JavaScript UI and a Java REST Spring Boot backend.
RDF SemSpect applies RDFS, `owl:equivalentProperty`, `owl:inverseOf`, and `owl:SymmetricProperty` entailment.

**Please see the [SemSpect Web documentation](https://doc.semspect.de/docs/rdf-server/) for more details about RDF
SemSpect and configuration options.**

## System Requirements

- RDF SemSpect was tested with the following operating systems: macOS, Linux, and Windows.
- It requires Java 17 or later (parsing is approx. 50% slower with Java 19 or later).

### Start RDF SemSpect

For this quickstart, we will use the single-database "smart" mode of SemSpect (for the multi-database "server" mode,
see the [configuration page](https://doc.semspect.de/docs/rdf-server/configuration/)). Execute the `semspect-smart.sh` script (or `semspect-smart.bat` on Windows) with the `run`
command and supply your RDF input as a list of arguments.

Example:

```shell
% ./semspect-smart.sh run turtle-file.ttl ./data/n-triple-files.zip
```

In "smart" mode, if the given input data was previously supplied to SemSpect, the
server loads the existing index structures. If no index is available for the given data, SemSpect generates
it first. Once the SemSpect server is initialized and ready, it displays the following terminal output:

```text
... INFO StartupInfoLogger SemSpect:  Started SemSpectServer in x.yz seconds ...
```

Now switch to your preferred web browser and open `http://localhost:8080/` or `http://127.0.0.1:8080/` to load the
SemSpect UI.

*If necessary, you can modify the port to your liking in `smart-config/semstore_config.yaml` or set the output paths
for the indexes via an environment variable: see [SemSpect configuration](https://doc.semspect.de/docs/rdf-server/configuration/) for more information.*

### Stop RDF SemSpect

To stop SemSpect, abort the SemSpect server in the terminal with `Ctrl-C`.

*Because processing is multithreaded, depending on your OS and on when you stop SemSpect, you might have to
use `Ctrl-Z` to suspend the application and then terminate it with the appropriate command.*

### Other Commands

To list all available commands of the terminal application, enter:

```shell
% ./semspect-smart.sh help
```

The following commands might be useful after aborting a run:

- `% ./semspect-smart.sh run --clean <SOURCE-1> <SOURCE-2> ...`: Deletes the indices directory of the given RDF data sources
  if it already exists (before generating the indices anew).
  Example: `% ./semspect-smart.sh run --clean data-1.ttl data-2.xml`
- `% ./semspect-smart.sh purge`: Recursively deletes the directory that contains all indexed datasets.

### Memory Setting

The heap size required for generating or loading the indexes varies depending on your data. Based on our experience,
the one-pass generation may require 1.5 to 4 times the size of the uncompressed input data in N-Triples format,
while 0.5 to 2 times may be sufficient for the two-pass variant.

The maximum heap size can be set using the `-Xmx` JVM parameter (example: `-Xmx16G` for 16 GB). To set the JVM
parameters specifically for the SemSpect script, use the `SEMSPECT_JDK_OPTIONS` environment variable
(examples: `export SEMSPECT_JDK_OPTIONS=-Xmx16G` under Linux/macOS or `set SEMSPECT_JDK_OPTIONS=-Xmx16G` under Windows).

If the maximum heap size is not set in `SEMSPECT_JDK_OPTIONS`, the standard Java settings are used: the environment
variables `JDK_JAVA_OPTIONS` and `JAVA_TOOL_OPTIONS` or, if these are not set, the JDK defaults. For OpenJDK 17, the
default maximum heap size is 25% of the available physical memory.

We recommend changing the memory setting to the highest acceptable value: The more memory, the fewer intermediate
reorganization and compression steps will be necessary. Moreover, the memory released after the generation will be used
for caching, resulting in a smoother user experience.
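
As a rough illustration of the rule of thumb in this section, the following sketch computes the suggested heap range from the size of the uncompressed N-Triples input. The helper `estimate_heap_mb` is hypothetical and not part of SemSpect; only `-Xmx` and `SEMSPECT_JDK_OPTIONS` come from the text above, and the factors are the experience values quoted there, not guarantees.

```shell
# Hypothetical helper (not shipped with SemSpect): prints the heap range in MB
# suggested by the rule of thumb for a given uncompressed N-Triples size in MB.
# Usage: estimate_heap_mb <input-size-in-MB> [one-pass|two-pass]
estimate_heap_mb() {
  size_mb=$1
  mode=${2:-one-pass}
  # Factors are stored in tenths so that plain integer arithmetic is enough.
  case "$mode" in
    one-pass) low_tenths=15; high_tenths=40 ;;  # 1.5x to 4x
    two-pass) low_tenths=5;  high_tenths=20 ;;  # 0.5x to 2x
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo "$(( size_mb * low_tenths / 10 ))-$(( size_mb * high_tenths / 10 ))"
}

estimate_heap_mb 4096 one-pass   # prints 6144-16384
estimate_heap_mb 4096 two-pass   # prints 2048-8192

# Pick a value inside the range and hand it to the SemSpect script.
export SEMSPECT_JDK_OPTIONS="-Xmx16G"
```

For a 4 GB input, the one-pass estimate therefore spans roughly 6 GB to 16 GB, so `-Xmx16G` covers the upper end of the range.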

## Supported RDF Formats

RDF SemSpect can load one or more files or all files from a given directory or archive (zip, gzip, bz2) in the
following formats:

- Turtle (`*.ttl`)
- Turtle with RDF-Star extension (`*.ttls`)
- OWL (`*.owl`)
- N-Triple (`*.nt`)
- N-Quad (`*.nq`)*
- Notation3 (`*.n3`)
- RDF/XML (`*.rdf`)
- HDT (`*.hdt`)
- JSON-LD (`*.jsonld`)
- BinaryRDF (`*.brf`)
- RDF/JSON (`*.rj`)
- TriG (`*.trig`)*
- TriG with RDF-Star extension (`*.trigs`)*

IRIs of the parsed resources are not validated by default. The validation can however be activated in the SemSpect
[configuration](https://doc.semspect.de/docs/rdf-server/configuration/).

_*_ *Note that in case of the N-Quad and TriG formats, named graphs are ignored and everything is loaded into one
default graph.*
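
To see in advance which files in a directory carry one of the extensions listed above, a small filter such as the following may help. This is only a sketch: `has_supported_rdf_extension` is a hypothetical helper, `./data` is a placeholder path, and the check is a case-sensitive match on the file name. It makes no claim about how SemSpect itself detects formats and does not look inside archives.

```shell
# Hypothetical helper: succeeds if the file name ends in one of the extensions
# from the list above. Archives (zip, gzip, bz2) are deliberately not handled.
has_supported_rdf_extension() {
  case "$1" in
    *.ttl|*.ttls|*.owl|*.nt|*.nq|*.n3|*.rdf|*.hdt|*.jsonld|*.brf|*.rj|*.trig|*.trigs)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Print every regular file in ./data (placeholder path) with a listed extension.
for f in ./data/*; do
  if [ -f "$f" ] && has_supported_rdf_extension "$f"; then
    echo "$f"
  fi
done
```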

## Uninstallation

SemSpect stores data on disk as well as in the web browser you have used for the SemSpect UI
(see [Data Privacy](https://doc.semspect.de/docs/rdf-server/further-information/data-privacy/) for details). To remove all
user-provided data, you have to:

1. Start SemSpect and open the UI,
2. In the top menu select `SemSpect / Settings / Reset local data` (repeat this with all browsers in which
   you used SemSpect),
3. Stop SemSpect,
4. Remove all files from the installation directory (if you have set other data directories via the configuration, also
   delete these directories).

## Current Limitations

- In theory, RDF SemSpect supports datasets with up to 2.14 billion triples (due to the size limitation of Java
  collections; largest dataset tested: ~500M triples).
- Invisible `null` characters (`\u0000`) inside text entries are automatically removed to simplify processing in our compressed dictionary format.
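
To check an N-Triples file against the theoretical limit before loading it, the triples can be counted line by line, since N-Triples stores one triple per line. The sketch below assumes that the 2.14 billion figure corresponds to Java's `Integer.MAX_VALUE` (2,147,483,647); the function names are hypothetical, and the count ignores blank lines and comment lines only.

```shell
# Assumption: the stated limit equals Java's Integer.MAX_VALUE.
MAX_TRIPLES=2147483647

# Count lines that are neither blank nor comment-only (one triple per line).
count_ntriples() {
  awk '!/^[ \t]*(#.*)?$/ { n++ } END { print n + 0 }' "$1"
}

# Report whether the file stays within the assumed theoretical limit.
check_triple_limit() {
  n=$(count_ntriples "$1")
  if [ "$n" -le "$MAX_TRIPLES" ]; then
    echo "$n triples: within the theoretical limit"
  else
    echo "$n triples: exceeds the theoretical limit"
    return 1
  fi
}
```

Calling `check_triple_limit <file>.nt` prints the count and returns a non-zero status if the limit is exceeded. Other formats would first need to be converted to N-Triples for this count to be meaningful.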