Overview of SIETS:

Search Engine and Crawler Software Platform

SIETS software in corporate networks

SIETS is a software technology platform for building high performance information storage and retrieval applications such as search engines, data warehouses, CRM applications, call-center solutions, business directories, location search applications, knowledge bases, document archives, digital content management systems for libraries etc.

SIETS software platform combines several fundamental data management software technologies, all integrated into one software code: Siets Server.

SIETS Server provides the following key functionalities in one software stack:

  • client-server XML-data store
  • full text search engine
  • distributed data clustering

The two most typical network configurations in which Siets Server software can operate as an enterprise search engine are shown below:

Siets Server Cluster in Corporate Network

Figure 1. Siets Server can operate in the corporate network as an enterprise search engine running in a cluster configuration.

Siets Search Appliance in Network

Figure 2. Siets Server can be installed in the corporate network to run as a dedicated enterprise search appliance or as a cluster using a fleet of many servers.

SIETS is a complete software platform: it comes with developer documentation, sample source code for the most common programming languages, and a Web-based management system.

The entire SIETS platform consists of the following core software components:

  • Siets Server implemented as a native XML database with built in search engine
  • Siets API as an open application programming interface using XML-messaging over http/https protocol
  • Siets Enterprise Manager as web administration tool for Siets Server
  • Siets Crawler as an integrated file system and web domain crawling (spidering) software for easy data import from web-sites, ftp and file servers

Subjects for full text search can be any unstructured data, for example, text collections, separate phrases or words in text documents, Web pages, XML data, office document collections etc.

Customer applications are servicing all online end-user requests using web and email servers in their front-end tier.

Customer application servers and file servers normally operate in the middleware tier.

Siets Server is typically used in the back-end tier within the customer multi-tier IT infrastructure facility, similarly to SQL servers.

Due to security policy Siets Server is never directly connected to the public Internet.

Customer applications use Siets API to access Siets Server functionality over a secure internal network.

More about Siets API here: Documentation

Built-in linguistic full text search engine

Modern data retrieval applications need a fast, high-quality search engine to compete on the Internet.

Most Internet users type their online queries in plain text, usually in their own language.

The growing requirement for natural language processing in online information services is not the only problem that application developers must solve.

Unstructured data volumes in companies are also increasing very rapidly.

The only effective way to retrieve such text-rich data from collections, and so make it usable to people whose only queries are free text, is a full text search (FTS) engine platform.

Full text search is the main system development methodology implemented in the Siets Server software for information indexing and searching.

Although relational databases have been in mainstream commercial use since the 1980s, there are far fewer tools for FTS data retrieval from large, text-rich data collections such as HTML pages on the Web, office productivity document files (DOCs, PDFs) in corporate file systems, or email messages, which are mostly plain text files with complex attachments.

Siets Server aims to fill that gap with a high-performance FTS solution that is easy to integrate with legacy systems using text-based machine-readable data.

What is so special about FTS? When you want to search for units of data matching some criteria in your collection, iterating through the whole data collection and matching every unit against your criteria would obviously be inefficient.

Therefore a supporting data structure, commonly called an index, should be built for searching.

In a relational database, indexes are usually small in comparison to the whole database. In data retrieval from text collections, by contrast, the FTS index is comparable in size to the text collection itself.

Siets Server software has overcome the slowness of hard disk drives by using specially designed inverted index structures and RAM caching algorithms that require only sequential disk storage access, which is significantly faster than random access disk seeks.

Siets Server Automatically Creates Inverted Index

Figure 3. Siets Server automatically builds an inverted index for all data objects loaded in SIETS Storage.

An inverted index essentially contains pointers to all words in any text content, arranged for efficient search access. Constructing such an index on hard disk storage can be very time consuming, because disk seeks are slow, mostly mechanical operations.
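
The idea of an inverted index can be illustrated with a minimal in-memory sketch. This is not Siets Server code: the function names `tokenize`, `build_inverted_index` and `search`, and the sample documents, are assumptions made only for illustration, and a real engine would keep the index on disk with caching.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map every word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return ids of documents containing every query word (AND semantics)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


# Hypothetical sample collection.
documents = {
    1: "Siets Server is a full text search engine",
    2: "An inverted index makes text search fast",
    3: "Relational databases use small indexes",
}
index = build_inverted_index(documents)
print(search(index, "text search"))  # [1, 2]
```

The lookup touches only the entries for the query words instead of scanning every document, which is the point made above about avoiding iteration over the whole collection.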

To shorten index construction time and make search queries fast enough for users, a distributed data storage and computing architecture is required for both the indexing and the search software.

Finally, with millions or even billions of documents in a very large data collection, as in online web search, a search engine needs to analyze language patterns and context in users' free text queries, finding not all matches (which can easily run to many millions of results for very common words in natural languages), but only the best matches to the query context.

The search engine recommends the most relevant 10, 50 or 100 results upfront, to protect the user from information overload with irrelevant hits, and to avoid transferring the whole result set, possibly millions of hits, from the back-end server to the client web browser over a slow Internet connection.
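
Returning only the best N hits instead of the whole result set can be sketched as follows. The function `top_n` and the sample scores are hypothetical; how Siets Server computes scores is not shown here.

```python
import heapq


def top_n(scored_hits, n=10):
    """Return the n highest-scoring (doc_id, score) pairs, best first."""
    return heapq.nlargest(n, scored_hits, key=lambda hit: hit[1])


# Hypothetical scored hits for one query.
hits = [("doc-a", 0.42), ("doc-b", 0.91), ("doc-c", 0.17), ("doc-d", 0.66)]
print(top_n(hits, n=2))  # [('doc-b', 0.91), ('doc-d', 0.66)]
```

Only the selected pairs would need to travel from the back-end server to the client, regardless of how many documents matched.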

Implementing efficient FTS search engine platform software code for fast and instantly relevant search in massive data volumes of text-rich (unstructured) data is not a trivial task. Siets Server software was written in C to deliver the best performance.

Siets Server operates as a high-performance distributed XML data store and a very fast, linguistic-search-driven recommendation engine for the web application model.

Using a unique relevance ranking method, Siets Server retrieves the most relevant XML documents by analyzing the language terms in users' free text queries, performing fast linguistic pattern matching (with word stemming and synonym support) against XML database content, and then returning the most relevant top N results over Siets API to customer applications.
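
Word stemming and synonym support can be pictured with a deliberately naive sketch. The suffix list, the synonym table and the function names below are assumptions for illustration only; they are not the linguistic rules used by Siets Server.

```python
# Hypothetical synonym table and a deliberately naive suffix list.
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}
SUFFIXES = ("ing", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query):
    """Return the set of stemmed query terms plus their stemmed synonyms."""
    terms = set()
    for word in query.lower().split():
        base = stem(word)
        terms.add(base)
        for synonym in SYNONYMS.get(base, ()):
            terms.add(stem(synonym))
    return terms


print(sorted(expand_query("Cars parking")))  # ['automobile', 'car', 'park']
```

Matching the expanded term set against an index built from stemmed words lets a query for "cars" also find documents that mention "car" or "automobile".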

Distributed XML data store that scales out nearly linearly

Siets Server is designed to integrate easily into existing environments, adding FTS capability to legacy systems through its XML data store and XML-based API.

XML is a machine-readable document standard created to facilitate cross-platform information exchange.

XML is also widely used for data and document storage and exchange across different vendor products.

An XML-based application programming interface (API) was the best choice for Siets Server, as it can be integrated with nearly any other computing platform, operating system environment or programming language.

Siets Server Uses Simple XML messaging for API

Figure 4. Siets Server uses Web services-friendly XML messaging (Siets API) and network-friendly http protocol to communicate with clients.
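
A rough sketch of XML messaging from the client side is given below. The element names (`request`, `storage`, `command`, `query`, `max_results`, `response`, `document`, `title`) are invented for this example and are not the actual Siets API schema; consult the Siets API documentation for the real message format.

```python
import xml.etree.ElementTree as ET


def build_search_request(storage, query, max_results=10):
    """Serialize a search request as an XML message (hypothetical schema)."""
    root = ET.Element("request")
    ET.SubElement(root, "storage").text = storage
    ET.SubElement(root, "command").text = "search"
    ET.SubElement(root, "query").text = query
    ET.SubElement(root, "max_results").text = str(max_results)
    return ET.tostring(root, encoding="unicode")


def parse_search_response(xml_text):
    """Extract (id, title) pairs from a response in the same invented schema."""
    root = ET.fromstring(xml_text)
    return [(doc.get("id"), doc.findtext("title")) for doc in root.iter("document")]


request_xml = build_search_request("articles", "enterprise search", max_results=5)

# A made-up response, standing in for what a server might send back.
sample_response = (
    "<response><document id='17'><title>Enterprise search basics</title></document>"
    "<document id='42'><title>Search appliances</title></document></response>"
)
print(request_xml)
print(parse_search_response(sample_response))
```

In a real client the request string would be sent as the body of an http or https POST to the server address, and the reply body would be parsed in the same way. Because both sides exchange plain XML text, the same pattern can be reproduced in nearly any programming language.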

To go even further and avoid unnecessary conversions Siets uses XML not only for API but for data storage and retrieval as well.

The real power of XML-based data storage and retrieval is that XML documents of any structure can be stored on Siets Server without any prior schema or structure definitions.

Later, this arbitrary structure or data model can be flexibly adjusted for fine-grained search needs, even changing the XML structure of individual documents on the fly, without rewriting application software or reloading data in a different format.

This can save a lot of application development time and money compared to SQL-based data stores, because it avoids much of the unnecessary modeling and painful migration required for structural changes to tabular SQL data and related software.

Siets Server Stores and Retrieves all documents in XML

Figure 5. All data objects are stored / retrieved as simple XML documents on Siets Server.
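
Schema-free storage can be illustrated with a toy in-memory store that holds two XML documents of different structure side by side. The store layout and the helper names `find_by_text` and `find_by_tag` are assumptions for this sketch, not Siets Server behavior.

```python
import xml.etree.ElementTree as ET

# Two documents with different structures, stored without any schema definition.
store = {
    "p1": "<person><name>Anna Ozola</name><city>Riga</city></person>",
    "b1": "<book><title>Search Engines</title><author><name>J. Smith</name></author></book>",
}


def find_by_text(store, needle):
    """Return ids of documents where any element text contains the needle."""
    needle = needle.lower()
    matches = []
    for doc_id, xml_text in store.items():
        root = ET.fromstring(xml_text)
        if any(needle in (el.text or "").lower() for el in root.iter()):
            matches.append(doc_id)
    return matches


def find_by_tag(store, tag, needle):
    """Return ids of documents where an element named tag contains the needle."""
    needle = needle.lower()
    return [
        doc_id
        for doc_id, xml_text in store.items()
        if any(needle in (el.text or "").lower() for el in ET.fromstring(xml_text).iter(tag))
    ]


print(find_by_text(store, "riga"))          # ['p1']
print(find_by_tag(store, "name", "smith"))  # ['b1']
```

Both queries work although `<name>` sits at different depths in the two documents, which is the kind of flexibility a schema-free XML store offers over fixed table columns.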

Enterprise grade centralized web GUI management

There are 4 key qualities for enterprise data processing software solutions used to store and retrieve information:

  • Search speed
  • Indexing speed
  • Real-time index modification
  • Reliability

It is possible to find FTS tools on the market that address 3 of those 4 qualities, but achieving all 4 in one product is very difficult.

In modern search applications, construction of the full text index is a very complex task. It has performance and reliability constraints imposed by hardware and user needs.

Enterprise grade applications require a data retrieval solution that fulfills all of these requirements.

Siets Server is designed to achieve all four of those qualities through smart optimizations and complex caching algorithms.

On top of that, Siets technology customers are provided with the management tool needed to administer Siets Server in corporate environments. This tool allows centralized management of all Siets search servers across the corporate network. Authorized system administrators and developers can manage all their search storages through a user-friendly Web interface.

Siets Enterprise Manager For Cluster-wide Centralized Management And Administration

Figure 6. Siets Enterprise Manager is a Web server application for centralized management of all Siets Server instances (searchable document collections) across the corporate network.

Clustering and multi-copy replication included

Today, industry customers expect high availability, scalable capacity and high performance from enterprise grade software products.

In many use cases, particularly in online web search applications, a fleet of low-end servers with identical hardware and fast local storage is often a much more economical solution than maintaining a high-performance cutting-edge server with powerful disk arrays.

To deliver equivalent computing power at a fraction of the cost of solutions based on high-end server hardware, Siets Server software was designed from day one to work in a cluster environment built from inexpensive commodity PC hardware.

Siets Enterprise Manager

Figure 7. Siets Server can run as a distributed database providing fast search in a very large data set split among many computers.

For users of Siets Server, hardware upgrades become very simple: if the total size of a data collection grows, adding another basic-capacity hardware server to an initial one-server setup doubles data storage and computing capacity without any downtime of the operational production system.
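
Splitting a data set among many computers can be sketched with a toy in-memory model. How Siets Server actually assigns documents to servers is not described here; hash partitioning, the `ShardedStore` class and the `shard_for` function are assumptions chosen only to show the scatter-and-gather pattern.

```python
import zlib


def shard_for(doc_id, shard_count):
    """Pick a shard with a stable hash so the same id always maps to the same shard."""
    return zlib.crc32(doc_id.encode("utf-8")) % shard_count


class ShardedStore:
    """Toy in-memory stand-in for a collection split among several servers."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def insert(self, doc_id, text):
        self.shards[shard_for(doc_id, len(self.shards))][doc_id] = text

    def search(self, word):
        """Scatter the query to every shard and gather the matching ids."""
        word = word.lower()
        hits = []
        for shard in self.shards:
            hits.extend(doc_id for doc_id, text in shard.items() if word in text.lower().split())
        return sorted(hits)


store = ShardedStore(shard_count=3)
store.insert("doc-1", "enterprise search engine")
store.insert("doc-2", "web crawler software")
store.insert("doc-3", "search appliance for the enterprise")
print(store.search("search"))  # ['doc-1', 'doc-3']
```

Each shard holds only part of the collection, so every server searches a smaller index and the partial results are merged. Note that with simple modulo hashing, as used in this sketch, changing the number of shards would require moving documents between shards.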

High availability, workload sharing, fault-tolerance

Siets Server supports multiple active full database and search index replicas (called 'mirrors') for any data collection in a cluster, for applications that need a high level of information access redundancy.

Replicas can serve as high-availability redundant online copies for your XML data store. Siets Server was designed for zero-downtime operations when running multiple replica copies.

Multiple additional mirror replicas can be created for each data store shard to scale search performance in high workload environments.

Available XML data store replicas are automatically used for search query performance load balancing and for fault-tolerance by Siets Server.

If any replica hardware fails and needs to be replaced, Siets Server will automatically use the remaining replicas still available in the cluster.

Replicas can be taken offline for maintenance and then re-synchronized by administrators either automatically or manually with online production system data.
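
Load balancing across replicas with automatic failover can be sketched as a round-robin selector that skips offline mirrors. The `ReplicaPool` class and the mirror names are hypothetical; the selection policy actually used by Siets Server is not specified here.

```python
import itertools


class ReplicaPool:
    """Round-robin over replicas, skipping any that are marked offline."""

    def __init__(self, names):
        self.names = list(names)
        self.online = set(self.names)
        self._cycle = itertools.cycle(self.names)

    def mark_offline(self, name):
        self.online.discard(name)

    def mark_online(self, name):
        if name in self.names:
            self.online.add(name)

    def next_replica(self):
        """Return the next online replica, or raise if none is available."""
        for _ in range(len(self.names)):
            candidate = next(self._cycle)
            if candidate in self.online:
                return candidate
        raise RuntimeError("no replica available")


pool = ReplicaPool(["mirror-1", "mirror-2", "mirror-3"])
print([pool.next_replica() for _ in range(3)])  # all three mirrors in turn
pool.mark_offline("mirror-2")                   # simulate a failure or maintenance
print([pool.next_replica() for _ in range(4)])  # only mirror-1 and mirror-3
```

Calling `mark_online` after a replica has been re-synchronized puts it back into rotation, mirroring the maintenance workflow described above.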

One software platform: better performance, safety and TCO

By choosing Siets Server for your search needs you can deploy three technologies at once, all in one fast C/C++ software platform.

Siets software delivers high-performance full-text search (FTS) with immediate search index updates alongside data store updates.

The platform is very easy to integrate into legacy environments because of its XML data store server architecture.

Siets Server provides scalability for data volumes and for performance due to its native clustering capability.

Operating only one data processing platform is also inherently more secure than running multiple integrated platforms.

Having fewer IT software platforms proportionally reduces your exposure to potential malware or software bugs in each system.

The total cost of ownership (TCO) savings can be substantial, around 75% compared to SQL-driven legacy solutions.

For more customer benefits please visit section: Solutions

Unique method of scalable relevance ranking

Siets Server uses a unique relevance ranking method that is based on the training of a set of weighting rules on the particular XML data store model indexing scheme. Each Siets Server data storage can be trained for the most optimal (from a human point of view) full text index weighting rules for the maximum desired contextual relevancy of search results within the particular XML data model.

The application owner can adjust the weightings and, by performing free text search queries, decide when the XML data store ranking model is considered well-trained.

The last set of adjusted index weightings is automatically applied by Siets Server as the default indexing policy (index sorting and grouping rules) over the entire full text index when performing subsequent XML data store updates, deletes or reindexing.
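
The effect of adjusting weighting rules can be pictured with a simple per-element weight table. The `WEIGHTS` table, the `score` function and the sample documents are assumptions for illustration; the actual Siets ranking method is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical weighting rules: a match in <title> counts more than one in <body>.
WEIGHTS = {"title": 5.0, "body": 1.0}


def score(xml_text, query, weights=WEIGHTS):
    """Add each element's weight once per query-word occurrence in its text."""
    words = query.lower().split()
    total = 0.0
    for element in ET.fromstring(xml_text).iter():
        weight = weights.get(element.tag, 0.0)
        tokens = (element.text or "").lower().split()
        total += weight * sum(tokens.count(word) for word in words)
    return total


docs = {
    "a": "<doc><title>Search engine</title><body>About crawling the web</body></doc>",
    "b": "<doc><title>Web crawler</title><body>A search tool for search teams</body></doc>",
}
ranked = sorted(docs, key=lambda doc_id: score(docs[doc_id], "search"), reverse=True)
print(ranked)  # ['a', 'b'] with the weights above
```

Swapping the weights so that `body` outweighs `title` would reverse the order, which is the kind of trial-and-adjust loop an application owner might run until the results look right.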

A well-trained and well-ranked Siets Server data store (called 'storage') is scalable to petabytes of searchable data in a massively clustered IT infrastructure without degradation of search query response time in web applications.

Ranked full content index on all data

Siets Server automatically builds the full text index for any data in the customer's own XML format during XML document inserts, updates and deletes.

Unlike other search engines such as Lucene, which use a single-vector type of search result scoring, relevance ranking in Siets Server is organized as a multi-dimensional vector matrix over a pre-sorted inverted index on all data.

This unique ranking method enables Siets customers to combine multiple independent or overlapping relevance dimensions at once, to flexibly satisfy a very wide range of search application needs for result ranking, sorting, grouping and ordering in web and mobile applications.

In classic data processing on database tables, when text search is performed in SQL, the results of a SELECT statement are typically organized by a custom combination of ORDER BY, GROUP BY, LIKE and nested JOIN clauses. The syntax quickly gets very complicated. Such a query, when executed by an SQL server, often suffers dramatic performance degradation due to excessive multi-level sorting, joining and filtering of data from multiple tables.

Multi-dimensional ranking for the XML data model supported by Siets Server makes it possible to avoid most, if not all, of the SQL complexity for similar types of queries.

There is one more great benefit: Siets customers can store their previous SQL table data on Siets Server in fully or partially denormalized format (just a main table without relations to external codifier tables). Replacing all relational primary key and secondary key codes with plain text values in the new XML "single-table data model" makes the XML database fully searchable at very high speed. Every data object is a fully self-contained XML document. When a Siets XML database storage is indexed, everything becomes searchable by combined free text and XML markup structure, using only plain language in query terms. All sorting and grouping for search relevance is done by Siets Server according to the pre-sorted ranked inverted index matrix, without the need to program and execute complex SQL statements.
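
Denormalizing relational rows into self-contained XML documents can be sketched as follows. The table names, column names and the `to_xml_document` helper are hypothetical sample data, not part of any Siets tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical relational data: a main table plus two codifier (lookup) tables.
COUNTRIES = {1: "Latvia", 2: "Estonia"}
CATEGORIES = {10: "Books", 20: "Software"}
PRODUCTS = [
    {"id": 501, "name": "Search handbook", "country_id": 1, "category_id": 10},
    {"id": 502, "name": "Crawler toolkit", "country_id": 2, "category_id": 20},
]


def to_xml_document(row):
    """Replace key codes with their plain-text values and emit one XML document."""
    root = ET.Element("product", id=str(row["id"]))
    ET.SubElement(root, "name").text = row["name"]
    ET.SubElement(root, "country").text = COUNTRIES[row["country_id"]]
    ET.SubElement(root, "category").text = CATEGORIES[row["category_id"]]
    return ET.tostring(root, encoding="unicode")


for row in PRODUCTS:
    print(to_xml_document(row))
```

After the conversion, a free text query such as "Latvia books" can match the document directly, with no JOIN against lookup tables at query time.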

Ultimately, end-users can query very large XML databases in the Siets platform with just a few plain text words or phrases in natural language, while Siets Server responds to their search queries with instantly relevant top 10, 25, 50 or 100 contextual results in descending ranking order.

This querying and result sorting becomes the default behavior for properly ranked Siets Server XML databases, and all web applications accessing the database from different clients are automatically served by Siets Server with the same uniform result sorting and ordering rules.

This characteristic is critically important for processing big data volumes, where SQL VIEWs cannot be used efficiently at scale because performance suffers when tabular SQL data must be heavily filtered for each query.

In contrast, Siets Server performance with multi-faceted data grouping and sorting in the query result set does not degrade as data volume increases, since the sorting and grouping for the desired relevance model has already been written into the inverted index matrix of rankings.
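
The benefit of writing the ranking order into the index ahead of time can be shown with a simplified sketch: postings lists are sorted by rank when the index is built, so a top-N query reads only the front of a list and never sorts at query time. The single numeric rank per document and the function names are assumptions; the multi-dimensional ranking matrix of Siets Server is not modeled here.

```python
from collections import defaultdict
from itertools import islice


def build_ranked_index(documents):
    """Build postings lists that are sorted by document rank at indexing time.

    documents maps doc_id -> (rank, text); a higher rank means more relevant.
    """
    postings = defaultdict(list)
    for doc_id, (rank, text) in documents.items():
        for word in set(text.lower().split()):
            postings[word].append((rank, doc_id))
    return {
        word: [doc_id for rank, doc_id in sorted(entries, key=lambda e: (-e[0], e[1]))]
        for word, entries in postings.items()
    }


def top_hits(index, query, n):
    """Walk the first word's pre-sorted postings and stop after n matches."""
    words = query.lower().split()
    if not words:
        return []
    others = [set(index.get(word, [])) for word in words[1:]]
    matches = (d for d in index.get(words[0], []) if all(d in s for s in others))
    return list(islice(matches, n))


# Hypothetical documents with precomputed ranks.
documents = {
    "d1": (0.2, "fast search engine"),
    "d2": (0.9, "search engine platform"),
    "d3": (0.5, "search appliance"),
}
index = build_ranked_index(documents)
print(top_hits(index, "search", 2))         # ['d2', 'd3']
print(top_hits(index, "search engine", 5))  # ['d2', 'd1']
```

Because the order is fixed at indexing time, every client gets the same result ordering, and the query cost depends mainly on N rather than on the total number of matching documents.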

To learn about all Siets software features, visit section: Features

Siets software licensing overview

Siets software is distributed and can be used under commercial or free use licence terms.

Customer benefits of licensing Siets software

  • Fully functional software in all download packages
  • No maximum data volume or disk storage limitations
  • No maximum RAM size restrictions per server
  • Any number of CPU processors or cores per server
  • Perpetual commercial licence does not expire

By using Siets software you agree to licence terms described in Siets End User Licence Agreement (Siets EULA).

Read Siets software licence terms here: Siets EULA

Siets Free Public Licence

You can also license Siets Software under Siets Free Public Licence terms for non-commercial use in exchange for a Siets Reference Link:

Siets Server: Scalable Enterprise Search Engine And Crawler Software

Powered by Siets search engine

on your organization public Web site.

Siets Free Public Licence allows indexing and searching of up to 20 000 documents.

30-days free use for evaluation

Siets Server Commercial Licence software can be downloaded, installed and used free of charge for evaluation purposes under Siets End User Licence Agreement section "Evaluation Versions" licensing terms: limited to 30-days free use.

After the 30-day evaluation period, a Siets Server Commercial Licence must be purchased, or another type of licence obtained from Siets.net, to continue using Siets software.

Siets software without a valid licence, or with expired licence terms, must be promptly deleted from computers.

Siets Software Licence Types

Siets software licensing is flexible, allowing for perpetual, subscription, time-limited evaluation and free licences.

The following licences are offered:

  • Siets Server Commercial Licence (per server)
  • Siets Free Public Licence (up to 20 000 documents per site)
  • Siets Server OEM Licence (for partners bundling into own products)
  • Siets Global Internet Crawler Licence (per site)
  • Siets evaluation version usage terms (time limited to 30-days)

Siets Server licence includes free Siets Crawler (SC) and Siets Enterprise Manager (SEM) usage rights.

Siets Server Commercial Licence also includes usage rights for 2 Microsoft Windows utilities that work with Siets Server:

  • Siets Loader - loading of MS Outlook email files (*.EML) into Siets database over REST API
  • Siets Slideline - sticky news headlines utility, showing RSS and other content changes online

Siets software commercial customers can either obtain a perpetual licence or opt to use Siets software commercially on a monthly subscription (software rental).

Licence Type                    | Period    | Support      | SEM | Crawler | Loader | Slideline
Siets Server Commercial Licence | perpetual | email, phone | Yes | Yes     | Yes    | Yes
Siets Free Public Licence       | perpetual | No           | Yes | Yes     | No     | No
Siets Server OEM Licence        | perpetual | email, phone | Yes | No      | No     | No
Siets Evaluation Version        | 30-days   | No           | Yes | Yes     | No     | No

Siets Server commercial software generally is licensed per deployment on a hardware or virtual server.

As a general rule of thumb, you need the same number of licences as there are hardware or virtual servers running their own copy of Siets Server software in RAM.

No expiration time or other restrictive features were built into the software.

Siets Global Internet Crawler Licence

Siets Global Internet Crawler is an advanced Web spider software solution, requiring non-trivial distributed computing setup on many servers.

The configuration of a distributed Siets software deployment depends on customer requirements and on the size of the underlying IT infrastructure, which may have a changing number of servers.

That's why site-wide licensing terms are usually offered to Siets Global Internet Crawler customers.

A site is usually a fleet of customer servers in one location, on which the distributed version of Siets Internet Crawler software is installed for high-performance spidering of data from thousands of Internet domains by multiple hardware servers in parallel.

To store all web pages collected by Siets Global Internet Crawler, which could amount to a massive TB or even PB data volume, Siets Server software must also be deployed in a similar way, creating a distributed search index database architecture for Web search.

Finally, a custom search application portal (front-end for end-users) should be built, with load-balancing using reverse proxy tools and probably also replication tools in several locations.

A web search service normally requires 24/7/365 availability and could grow to millions of daily users very fast. This scalability must be provided for in both the hardware and the software setup to deliver the reliable web search service that users expect.

Siets.net could provide professional IT engineering services to set up all of the above on behalf of the customer, along with properly configuring all Siets software and its site-wide licensing for the customer.

Siets licensing to schools and universities

For universities and colleges free or substantially discounted licenses of Siets Server software can be issued for training and educational purposes.

Siets.net will evaluate each application and will grant the license for particular training and education programs.

Siets Server OEM Licence

Special conditions are available for OEM manufacturers who want to include Siets Server software with their own hardware products or packaged software solutions.

Typically, licensing is based on volume discounts according to the number of products sold, in the range of 5% of the end-user product price or less.

Different volume discounts can be negotiated and agreed for each particular product.

Siets Server OEM software under Siets licence Evaluation Version terms can be used in free trial OEM products for free product testing and evaluation for the same time-limited 30-days period.

Easy Siets software licensing upgrades

To upgrade from evaluation use of Siets software to licensed use, no reinstall or update of the downloaded evaluation software is required.

The licence purchase documents from Siets.net and Siets customer billing records can serve as proof of a legally obtained Siets licence.

Technical support service

Technical support service is provided to registered Siets.net customers with a working email address.

Registered Siets software customers are all commercial customers who made licensing purchases or subscriptions.

Siets free and evaluation software licence customers are welcome to ask about pro bono support options.

Read more about Siets support team: About

Siets Software Download Resources

Siets software can be downloaded from Siets.net or from other Web sites that legally distribute Siets software on the Internet.

Each Siets Server download package is bundled with Siets Crawler (SC) and Siets Enterprise Manager (SEM); there is no need to license this software separately.

Download Siets Server software here: Download