SIETS Products:

Search Engine, Crawler, Global Internet Spider etc

Siets is licensed as commercial software:

Siets Server

The main product: a scalable and powerful search and indexing engine that operates as a client-server database with an easy-to-use, REST-based Siets API.

Siets Enterprise Manager

Web GUI based configuration and management tool to centrally run and administer Siets software as a cluster-wide search engine platform, when Siets Server software is installed on a fleet of hardware servers.

Siets Crawler

Website and file system crawler with an automatic task scheduler to spider and crawl corporate networks and intranets.

Siets Global Internet Crawler (Web Spider)

Advanced cluster spidering software that automatically discovers and follows Web links, and collects and indexes web content globally, for a specific top-level domain, or for a specified IP address or zone range. It supports incremental spidering (updates only) and bandwidth management, and can crawl thousands of domains in parallel.

Siets Search Appliance

Siets Server, Siets Enterprise Manager and Siets Crawler on a pre-installed RedHat Linux OS, all delivered as an ISO image of a "virtual out-of-the-box search appliance": one click to download, install and use.

Siets Loader

Windows OS utility that scans a PC and bulk-loads email messages in Outlook *.EML format for indexing and search on Siets Server. It operates as a Windows agent utility (EXE file) that can be started manually or scheduled to run on a user PC at specific times.

Siets Slideline

Windows OS utility that shows always-on 'sticky' news headlines along the bottom of a PC user's screen. In the background it taps Siets Server content changes through alerting triggers and web RSS sources, and slides clickable web titles across the news bar.

Partnership program

Siets software can be distributed through partners, increasing partner company margins from hardware and IT services powered by Siets software. Siets software is licensed to:

  • distributors (per country or per region)
  • system integrators
  • OEM

Read more about Siets software licensing: Overview / Licensing

Technical support

In addition to licensing the seven products above, the Siets engineering team provides professional technical support, training, consulting, onboarding, custom software development and other services related to Siets software.

Read more about Siets team and history: About

Siets Server

The main platform product, developed in C/C++ and available for computers with a 32-bit processor architecture.

Siets Server

Originally designed software

Unique and fast C source code

Client-server XML database

Built-in full text search engine

Open API using HTTP/HTTPS

Distributed architecture

Multiprocessing, multi-threading

In-memory RAM data caching

The Siets engineering team combined these design principles into a single software stack, resulting in fast and efficient code. The 32-bit version of the software was in active development and production from 2001 to 2006. For the next, 64-bit version of the platform, please visit the www.scalingsearch.com product website.

Siets Server automatically builds a full text index for all loaded XML data, and can scale the database out across a cluster of servers to accommodate data volumes beyond the processing capacity of a single hardware server.

Use Siets Server software to develop state-of-the-art search engine based application solutions that can be operated in heavy-duty workload environments providing web and mobile Internet services on large amounts of data at high speed.

Learn more about Siets platform here: Overview


Main features

Access powerful search functionality from your favorite programming language.

Enjoy more than 50 indexing and search options, multi-language support (over 160 languages) and the cost-efficient TCO of running everything on one platform.

The platform provides advanced full text search functions, including word and phrase search, Boolean search, wildcards in queries, proximity search, 'Did you mean?'-style spell checking, generation of text snippets for hits, and many other basic search engine functions.

Learn about more features here: Features


Use cases

Siets Server is probably one of the fastest search engines on the market, delivering very high search speeds with full text search functionality and flexibly tunable relevance ranking over large amounts of data.

You can also use Siets Server as a platform independent XML database to store your non-relational information as XML documents or imported data from SQL databases in XML format.

Please see other use cases here: Solutions


System requirements

Siets Server source code was written in the C programming language. The product was developed and tested on the 32-bit x86 processor architecture and the Linux operating system.

Siets Server software was designed to be portable and may be ported to other processors and operating systems.


Download and installation packages

Installation packages with binaries have been made available for download for several of the most popular Linux OS distributions: RedHat, Debian, Slackware etc.

The software can be downloaded from the Siets.net website, then installed and administered by IT specialists with basic knowledge of Linux OS.

Siets Server download packages include Siets Enterprise Manager and Siets Crawler applications bundled within the same installation package.

Download Siets Server software here: Download

Siets Enterprise Manager

SIETS Enterprise Manager (SEM) is a web GUI based configuration, administration and management application for SIETS software. SEM is written in PHP.

Siets Enterprise Manager

Cluster-wide one-click administration

Centrally manage all SIETS Server instances

Centrally schedule SIETS Crawler tasks

Create, configure, manage multiple storages

Manage search indexing policy rankings

Monitor cluster-wide resource usage

Debug, test and evaluate API calls, CLI-style

Role-based password-protected user accounts

View access, error and audit logs

SEM was designed to centrally run and administer Siets Server and Siets Crawler software in a corporate network.


Main features

SEM can be used to administer either a single server running Siets Server or an entire cluster-wide setup, where Siets Server and Siets Crawler are installed and operated on a fleet of multiple hardware servers with data sharding and replication.

Customers can execute Siets API commands from the command-line user interface built into SEM, without writing any program code for Siets Server. This mode is useful for software performance evaluation, debugging etc.

Securely manage SEM administrators, establish and change access rights for users of specific Siets storages.


System requirements

SEM needs to be installed as a PHP application on a web server such as Apache.

SEM can be downloaded from Siets.net and is included in Linux OS installation packages accompanying Siets Server downloads.

In general, the SEM software modules written in PHP could also be installed and run under Windows OS or other operating systems, and could be made to work with web servers other than Apache. We have not yet tested SEM under configurations other than Linux OS and Apache.

Learn about more SEM features here: Features

Siets Crawler

Siets Crawler (SC) is a website and file system crawler application written in C that is compatible with the Siets Server search engine platform and Siets Enterprise Manager. It collects data from a corporate network for fast search.

SC comes with an automatic task scheduler for spidering (crawling) corporate networks and intranets and collecting all files and web pages found.

Siets Crawler

Crawling of specified web domains and file systems

Centrally managed cluster-wide crawler configuration

HTML, DOC, RTF, PDF, XML, XSL, TXT files

Continuous crawling after link interruptions

Regular-period or time-activated crawling

Indexing full-text, numeric and date values

Spidering web-password protected and SSL sites

Use of regular expressions as crawler filters

Restriction of number of pages per domain

Limiting the sublevel (link depth) of spidered URLs

Language detection etc. and meta tagging

SC can be managed through Siets Enterprise Manager, an easy-to-use tool that enables system administrators or webmasters to crawl and index multiple corporate Web sites or file servers.
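The filtering options in the feature list above (regular expressions as crawler filters, a page limit per domain, and a sublevel limit) can be sketched as follows. This is a minimal illustration only, not the Siets Crawler implementation or its configuration format: the class name `CrawlFilter`, its parameters, and the example host name are all hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse


class CrawlFilter:
    """Decide whether a discovered URL should be fetched (illustrative only)."""

    def __init__(self, include_patterns, exclude_patterns,
                 max_pages_per_domain, max_depth):
        self.include = [re.compile(p) for p in include_patterns]
        self.exclude = [re.compile(p) for p in exclude_patterns]
        self.max_pages_per_domain = max_pages_per_domain
        self.max_depth = max_depth
        self.pages_seen = Counter()  # accepted pages per domain

    def allow(self, url, depth):
        domain = urlparse(url).netloc.lower()
        if depth > self.max_depth:
            return False  # too many link levels below the start page
        if self.pages_seen[domain] >= self.max_pages_per_domain:
            return False  # per-domain page limit reached
        if self.include and not any(p.search(url) for p in self.include):
            return False  # matches no include filter
        if any(p.search(url) for p in self.exclude):
            return False  # matches an exclude filter
        self.pages_seen[domain] += 1
        return True


# Hypothetical intranet host, at most 2 pages per domain, 3 link levels deep.
crawl_filter = CrawlFilter(
    include_patterns=[r"^https?://intranet\.example\.com/"],
    exclude_patterns=[r"\.(zip|exe)$"],
    max_pages_per_domain=2,
    max_depth=3,
)
print(crawl_filter.allow("http://intranet.example.com/a.html", depth=1))
print(crawl_filter.allow("http://intranet.example.com/b.zip", depth=1))
```

The checks run from cheapest to most expensive, and a URL counts against the per-domain limit only once it has passed every filter.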


Main features

Use SC to collect data in the most popular file formats from a corporate public web site, intranet servers or file systems, and index the data into one or several Siets Server storages for instant full text search. Security and access rights to storages can be partitioned on a need-to-know basis.

SC and Siets Server support indexing of data in more than 160 languages in a single Siets database. Queries can be executed in any content language of interest, provided data was collected and is present in that language.

Any files with character strings separated by customized 'separator' characters can be indexed for fast search using SC.
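As a rough illustration of what indexing separator-delimited strings involves, the sketch below splits text on a configurable set of separator characters and builds a small in-memory inverted index. It shows the general technique only, not how SC or Siets Server index data internally; the function names and the default separators `;` and `|` are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text, separators=";|"):
    """Split text on any run of the given separator characters."""
    pattern = "[" + re.escape(separators) + "]+"
    return [t.strip().lower() for t in re.split(pattern, text) if t.strip()]


def build_index(documents, separators=";|"):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text, separators):
            index[token].add(doc_id)
    return index


docs = {
    "doc1": "alpha;beta|Gamma",
    "doc2": "beta;; delta",
}
index = build_index(docs)
print(sorted(index["beta"]))  # ['doc1', 'doc2']
```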


System requirements

The SC management GUI software is written in PHP. It is included in the Siets Enterprise Manager installation package and is part of the Siets software download package.

The SC modules written in C also come bundled with the Siets Server software installation package.

The SEM application is used to invoke the SC management GUI modules, which then schedule and manage the binary modules of the crawler software.

SC comes with a sample template plug-in search module, written in PHP for a web server application, that makes it possible to customize the presentation of search results collected by SC using sample code and XSLT.

SIETS Crawler software installation service for a distributed spider system is provided upon request and requires custom installation in a particular network and pre-planned hardware setup in a cluster.

Learn about more Siets Crawler features here: Features

Siets Global Internet Crawler (Web Spider)

Siets Global Internet Crawler (SGIC) is advanced cluster spidering software written in C that automatically discovers and follows Web links, and collects and indexes web content globally.

Siets Global Internet Crawler

Includes all features of Siets Crawler (see above)

Crawling per top-domain and per sub-domains

Crawling by country-wide IP address zones

Distributed crawling of 1000s of domains in parallel

Full crawl or incremental updates for changes only

Centrally managed cluster-wide crawler configuration

Multiple parallel SC processes on many servers

Multi-threading to crawl many domains per server

Partitions data and index on many cluster nodes

Crawls into separate indexes operated as replicas

Adapts to bandwidth and network topologies

Calculates page ranking for web page relevance

Eliminates domains with duplicate data content

Domain grouping, duplicate results removal

Adds page size, language, crawl date as meta data

Gathers and indexes href content for web pages

Option to follow robots.txt rules or skip rules

Limit bandwidth use, set network timeouts

Limit number of parallel crawler processes

SGIC supports incremental spidering (updates only) and bandwidth management, and can crawl thousands of domains in parallel.

Customers can build country-wide Internet search engines or set up global search engines in specific business sectors.
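The option to follow or skip robots.txt rules, listed among the features above, can be illustrated with Python's standard `urllib.robotparser` module. This is a sketch of the general technique, not SGIC code; the function `may_fetch`, its `obey_robots` switch, and the user agent name are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt, user_agent, url, obey_robots=True):
    """Return True if the crawler may fetch `url`.

    With obey_robots=False the robots.txt rules are skipped entirely.
    """
    if not obey_robots:
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

print(may_fetch(ROBOTS_TXT, "example-spider", "http://example.com/public/a.html"))
print(may_fetch(ROBOTS_TXT, "example-spider", "http://example.com/private/a.html"))
```

Here the robots.txt text is passed in as a string so that the example needs no network access; a real crawler would download it once per host and cache it.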


Main features

SGIC is an advanced, state-of-the-art application for automatic Web link discovery and spidering across the whole global or national Internet, and for building massive clustered indexes for large scale Internet search engine configurations.

This Web spidering software automatically indexes all collected content for search, using Siets Server as the backend engine.

SGIC can work with separate replica sets: one in production use by search portals while another is being crawled and updated.

SGIC also stores original documents for later retrieval from the SGIC cache. Applications can use the saved versions of the original files to show a cached version of Web documents when the source server is not available.

SGIC can also be used with third-party file conversion filtering software to index content and meta data for more than 370 different file formats, covering almost all existing specialty and legacy formats.

The key advantage of using SGIC to run a country-wide or global Internet search engine is its lower demand for hardware resources: the customer needs fewer servers to operate the search engine.


System requirements

Building and operating large scale country-size or global Internet search engines is still as much an art as a technical engineering job.

To use SGIC, the customer needs to set up the SGIC software in the customer's data center environment, on a custom hardware configuration, in a specific cluster network topology determined by the size of the challenge, by the performance and scalability requirements of the application, and by other customer business needs.

To operate the SGIC software, the customer needs to configure all domain names and IP address zones for the SGIC crawler, and establish the index updating policy and desired frequency, the indexing data volume, content download limits etc.

SIETS Crawler software installation service for a distributed spider system is provided upon request and requires custom installation in a particular network and pre-planned hardware setup in a cluster.

SGIC software cannot be downloaded because it depends on custom IT infrastructure.

Siets Search Appliance

Siets Search Appliance (SSA), powered by Siets Server, can be packaged as downloadable search appliance software containing a pre-installed operating system, Siets Server, Siets Crawler and Siets Enterprise Manager.

The SSA installation software is simply a CD ISO image file with Linux and all necessary Siets Server software pre-installed, plus a user-friendly installation script.

Siets Search Appliance

Downloadable ISO image with Linux OS

Siets Server included

Siets Crawler included

Siets Enterprise Manager included

RedHat Linux OS pre-installed

Install on any custom hardware

Rack and stack to scale out

Unlimited number of documents

Unlimited number of collections

Customers can download this ISO image file with SSA software, burn it on a CD, boot from it and then install complete 'search appliance' setup on any hardware of choice, creating their own "out-of-the-box" search appliance.

All software components in SSA can be managed with just a standard web browser, through Siets Enterprise Manager.

Customer benefits

The main customer benefit is a free choice of hardware for the search appliance solution. If suitable hardware is already available, the customer does not need to wait for hardware procurement, order fulfillment, shipping and delivery.

The hardware platform for SSA can easily be upgraded to a more powerful one to accommodate future data and traffic growth, without an expensive dependency on a particular search appliance hardware supplier that overcharges for hardware just for its brand name.


Competitive search appliance products on the market

The table below compares SSA with other vendors' products, using figures published on vendor websites or in other sources.

Most of them limit the number of documents you can index and search per appliance. SSA sets no limit: index and search as many documents as your hardware capacity allows.

Vendor product | Siets Search Appliance + customer hardware | Google Mini | Thunderstone Small Business Edition | Thunderstone APP 250 | Google Search Appliance GB-1001 | Google Search Appliance GB-5005 | Google Search Appliance GB-8008
Sample pricing of vendors | $2 500 | $2 995 | $4 995 | $10 000 | $30 000 | $50 000 | $80 000
Maximum number of documents indexed | Unlimited | 100 000 | 50 000 | 6 million | 500 000 | 2 million | 15 million
Number of collections | Unlimited | 1 | Unlimited | Unlimited | Unlimited | Unlimited | Unlimited
Number of queries per minute | 1500 (HDD), 15000 (in-memory) | 50 | 1000 | 2500 | 300 | 300 | 1000
Target customer | SMB, SME, enterprise | SMB | SMB | SMB, SME | SME | enterprise | enterprise
Ability to index XML meta data | Yes | No | No | No | No | No | No
Continuous crawling | Yes | No | Yes | Yes | Yes | Yes | Yes
Support options | Email, Online, Forum, Phone, Skype | Email, Online, Forum | Email, Online, Forum, Phone | Email, Online Forum, Phone | Email, Online Forum | Email, Online Forum | Email, Online Forum
Hardware support, years | customer own h/w | 1 | 2 | 2 | 2 | 2 | 2
License type | Perpetual | Perpetual | Perpetual | Perpetual | 2 year | 2 year | 2 year
Hardware form factor | User selected | 1 U Rack Mount | 1 U Rack Mount | 1 U Rack Mount | 2 U Rack Mount | Multi Rack Included | Multi Rack Included

In a typical scenario of indexing a corporate Web site and intranet with, for example, up to 500,000 documents, using SSA saves $27,500 compared to Google Search Appliance model GB-1001, or $7,500 compared to the less expensive Thunderstone APP 250.

Even after subtracting about $3,000 for a hardware server, the net saving from using the software-only Siets Search Appliance is still $24,500 versus the Google appliance and $4,500 versus the Thunderstone appliance.
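The savings quoted above follow directly from the sample prices in the comparison table. A short sketch that reproduces the arithmetic, using the table prices and the approximate $3,000 server cost from the text (the function and variable names are chosen for this example only):

```python
# Sample prices in USD, taken from the comparison table.
SSA_LICENSE = 2_500
COMPETITORS = {
    "Google Search Appliance GB-1001": 30_000,
    "Thunderstone APP 250": 10_000,
}
HARDWARE_COST = 3_000  # approximate cost of a customer-supplied server


def savings(competitor_price, include_hardware=False):
    """Price difference between a competitor appliance and SSA."""
    ssa_total = SSA_LICENSE + (HARDWARE_COST if include_hardware else 0)
    return competitor_price - ssa_total


for name, price in COMPETITORS.items():
    print(f"{name}: ${savings(price):,} software only, "
          f"${savings(price, include_hardware=True):,} after hardware")
```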


Comparison of search appliance performance benchmarks

Vendors of search appliance products usually disclose maximum performance figures for their solutions, measured when the software is tuned for maximum performance, e.g. by caching data mostly in RAM.

In contrast, SSA, being a software-only search appliance, can be configured to use only RAM for maximum performance, increasing performance by an order of magnitude compared to pre-configured hardware products.

Actual search performance is determined by many factors: the total size of the database, the number of terms in a search query, the average size of a single document etc. Sometimes vendors quote just an abstract 'number of queries per minute' figure.

Performance benchmarks can differ by a few orders of magnitude depending on whether search queries require access to mechanical hard disk storage or the data can simply be cached in RAM.

For large databases, much slower mechanical hard disk reads slow search response times down to 0.1-0.2 seconds per query.

Actual performance is determined by how well the software can cache and re-use portions of the index in computer memory.

The amount of free RAM is crucial, and more RAM is always better for performance. Total throughput is also an important benchmark when many users issue search queries simultaneously and the search engine executes many of them in parallel.

The Siets engine is well optimized for this type of workload. Its source code has been developed in C and optimized for memory caching and for parallel execution in multiple processes and threads.

Siets Server can sustain about 1500 full text transactions per minute on a single hardware server with mechanical hard disk storage in a typical data center environment, with an average response time of 0.2 seconds per query.

Siets Server installed in a cluster replication configuration with mirrored parallel databases can increase the total throughput of search queries almost linearly with the number of mirrored database copies.

By splitting the data corpus into even smaller database parts (shards) stored on many low-cost servers, a Siets customer can benefit from complete in-memory caching of each Siets index part in local RAM, with average response times below 0.005 seconds.
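For rough capacity planning, the figures quoted above can be combined in a back-of-the-envelope calculation. The sketch below assumes the near-linear replica scaling described in the text and applies Little's law (average queries in flight = arrival rate x response time). The `efficiency` parameter is a hypothetical knob for discounting ideal scaling; it is not a Siets setting, and the results are estimates, not benchmarks.

```python
import math

HDD_QUERIES_PER_MINUTE = 1500  # per server with hard disk storage, from the text
HDD_AVG_RESPONSE_S = 0.2       # average response time per query, from the text


def concurrent_queries(queries_per_minute, avg_response_s):
    """Little's law: average number of queries being processed at once."""
    return queries_per_minute / 60.0 * avg_response_s


def replicas_needed(target_queries_per_minute,
                    per_replica_qpm=HDD_QUERIES_PER_MINUTE,
                    efficiency=1.0):
    """Mirrored database copies needed for a target load.

    efficiency=1.0 means perfectly linear scaling; 0.9 means each
    replica delivers 90% of the single-server figure.
    """
    return math.ceil(target_queries_per_minute / (per_replica_qpm * efficiency))


print(concurrent_queries(HDD_QUERIES_PER_MINUTE, HDD_AVG_RESPONSE_S))
print(replicas_needed(6000))
print(replicas_needed(6000, efficiency=0.9))
```

At 1500 queries per minute and 0.2 seconds per query, a single disk-based server is handling about five queries at any moment, which is why parallel execution matters for total throughput.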

Performance benchmarks for Siets Server, the core software engine of SSA, are public.


System requirements

SSA software can be downloaded for evaluation and testing on your own hardware as a Siets Server ISO image file with RedHat Linux OS pre-installed.

There is no need to know Linux to run SSA as an enterprise search appliance. The Linux operating system is installed together with Siets Server, Apache web server, Siets Enterprise Manager and Siets Crawler software to manage the entire SSA solution.

By downloading and using SSA software in ISO image format, customer organizations can select their own hardware, which in many cases gives them much more flexibility than solutions that involve ordering and shipping search appliance products based on pre-installed hardware.

Download ISO image with SIETS software here: Download

Siets Loader: utility for email import for search

Some useful tools and utilities for SIETS Server applications.

Load your PC mail folder content into centralized Siets storage for indexing and keeping email records.

In this section we provide additional end-user software tools for those users who would like to use Siets Server for storage and retrieval of corporate information accessing Siets Server databases from their PCs.

Siets Loader is an application for importing e-mail messages to SIETS server. Currently only the MS Outlook Express and Mozilla Thunderbird mail clients are supported.

This Windows application loads the mail folders you specify from your PC into a centralized Siets engine database. Filtering of messages by a specified date interval is also supported.

Note that a Siets Server with a 'mail' database storage has to be set up by a system administrator before you run this utility from your desktop PC. A web search form (provided as a CGI script for a Web server in the Siets installation package) should also be installed so that you can access and search the mail stored by this utility. To access your mail messages for search and retrieval you have to use Siets Server's built-in user authorization and password, so other users will not be able to see your email data. Ask your system administrator to configure your access rights and password on the Siets Server storage 'mail' and to install and configure the CGI utility for search.

Siets Slideline

SIETS Slideline is a Win32 application for MS Windows OS that provides an always-on news bar with sliding news headlines on your computer screen.

It can be used to monitor the latest Siets database updates along with optional user-customized RSS feeds from third parties.

The software taps into the latest news headlines by analyzing content changes from SIETS server and from other RSS sources, using the HTTP transport protocol. An Internet connection is required for the Slideline application to work properly.

You can personalize all data sources and display options as you need.


Sticky Headline News For MS Windows Users

Runs on Windows 98, 2000 and XP. Unlike many Java or ActiveX based products, the Siets Slideline utility has one unique feature: it does not close when you close your browser.

SIETS Slideline shows the titles of sliding news and events on one line, using a "sticky" window on the MS Windows desktop.

Users can change the size and location of the Slideline window bar and configure their own selection of news sources.

Clicking a news title of interest opens the Internet browser in a new window and takes the user to the original data source, the web page where the full content of the news can be read.


Pre-configured Slideline data sources

Users are offered the following news and events resources from SIETS Slideline as default information services:

  • Lursoft newspaper library NEWS.LV
  • Company Register news from UR.GOV.LV
  • Apollo.lv portal news
  • eSports.lv portal news
  • More than 25 latest articles from daily press and magazines
  • Latest tenders from State procurement office IUB.GOV.LV homepage
  • Latest donations to political parties from KNAB.GOV.LV homepage
  • What others are searching in Siets web search engine?
  • Weather news
  • Currency rates from Latvian Bank home page BANK.LV
  • Name days from Siets.lv calendar

SIETS Slideline information resources are constantly updated by Lursoft and become available to everyone on Slideline. Latvian Court information news is coming soon.

In Slideline user interface all news resources are ordered according to their popularity by other Slideline users.


Slideline news in RSS-format

Slideline additionally lets users activate RSS news channels from the Internet.

RSS feeds are syndicated news channels published in XML format by many popular news portals and services such as BBC, Wired, SlashDot etc.

Most RSS news sources are in English, with only a few sources available on the Latvian Internet.

All Slideline users can add their own Slideline RSS data sources with interesting and useful information. When a user adds a news source, Slideline files it under a specific news category.


Community of user-shared ranked news

Data resources added by other users immediately become available to all SIETS Slideline users and show up in the Slideline application configuration window as new sources. Users can configure them as new data feeds if interested.

In SIETS Slideline database all news sources are being ranked by their popularity in Slideline user community.


Slideline privacy guarantees

SIETS Slideline application does not access user information on their PCs.

It does not do any other actions on user PCs not related to its direct functionality: to provide news feed from SIETS server powered content change alerting services or from other RSS data sources configured for Slideline as news source.


How to use Slideline

While the application is running, a single line is displayed above all other windows, showing the latest headlines from the sources you have selected.

The line is located in the lower part of the desktop, but it can be moved anywhere by pressing and holding the left mouse button and dragging it to a new location. The next time the application is started, it appears in exactly the same place where it was closed.

A left click on a sliding headline opens the full article in the default web browser. If no link is available, nothing is opened. To prevent problems with repeated double-clicks, the minimum time between double clicks is half a second.

A right-click on Slideline opens the main menu.

You can perform the following actions from the main menu:

Show/Hide - Shows Slideline if it is not visible, or hides it if it is visible. This is the default option, so this action is performed on a double-click on the application icon in the icon tray area (near the clock).

Speed - You can change scrolling speed by using this sub-menu.

Size - You can change base size by using this menu. Each text item can be displayed a bit smaller or larger than base size.

Sources - convenient way to select sources.

The dialog has several tabs, each containing a group of sources: "RSS sources" for RSS sources and "SIETS sources" for your SIETS server sources.

The largest part of each tab is occupied by the sources, grouped in categories. Expand a category to view and select the sources in it, or to select the color, time limit and count for each source.

Some parameters may not be settable for some sources. This is normal when a parameter does not make sense for a source or the source does not allow that parameter to be changed.

The "RSS sources" tab contains two additional options: "Add new source..." to add a new RSS source to the list in any category, and "Sources reload time..." to choose the sources reload time.

In the dialog for adding a new RSS source, you only have to choose the category and enter the link. If the link contains a valid* RSS source, it is added to the list.

* - The application is built according to the RSS 2.0 specification. There is no guarantee that lower or higher versions will be supported; use them at your own risk. Only items with titles are displayed.
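To check in advance whether a feed is likely to work, you can test it against the two rules in the note above: the document should declare RSS 2.0, and only items with titles are shown. The sketch below is not Slideline code. It uses Python's standard XML parser, and the strict rejection of other RSS versions is an assumption made for the example (Slideline itself only states that other versions are not guaranteed to work).

```python
import xml.etree.ElementTree as ET


def titled_items(rss_text):
    """Return (title, link) pairs for RSS 2.0 items that have a title."""
    root = ET.fromstring(rss_text)
    if root.tag != "rss" or root.get("version") != "2.0":
        raise ValueError("not an RSS 2.0 document")
    results = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        if not title:
            continue  # items without a title would not be displayed
        link = (item.findtext("link") or "").strip() or None
        results.append((title, link))
    return results


SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First headline</title><link>http://example.com/1</link></item>
<item><description>No title here</description></item>
<item><title>Second headline</title></item>
</channel></rss>"""

for title, link in titled_items(SAMPLE):
    print(title, link)
```

An item with a title but no link is kept with `None` as its link, which matches the behavior described earlier: the headline is shown, but clicking it opens nothing.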

Choose reload frequency and press button OK.

If you think there should be another category for RSS sources, please send us an e-mail with examples for the category and we will consider adding it.

The "SIETS sources" tab contains an "Edit servers and storages..." link. Clicking it opens the "SIETS Servers and Storages" dialog, which lets you edit the servers and storages you would like to monitor. This dialog only edits storages; selecting which storages display their latest items, and setting their parameters, is done in the Sources dialog. The dialog contains two Verify buttons: one verifies server parameters, the other verifies storage parameters. Press OK to save changes or Cancel to discard them. Do not be afraid to enter storage passwords. They (and all other SIETS data) are stored on disk using strong encryption, so no data can be read without the master password.

Two other links in the Sources dialog on the "SIETS sources" tab let you select the data refresh time and set or change the master password.

Note. Once set, the master password cannot be removed from within the application. To remove it, for example after removing all SIETS servers and storages, simply delete the siets.ssd file in the application directory.

Buttons in lower left corner of Sources dialog will expand all, collapse all or expand categories with at least one selected source.

Auto hide - When this option is selected, Slideline will automatically hide, when there are no new titles available, and will appear, when new titles become available.

Highlight new titles - When this option is selected, new titles will be highlighted in bold.

Popup on new titles - When this option is selected, Slideline will display notification when new titles become available.

Proxy - Opens the proxy server settings dialog. By default the application uses the MS Internet Explorer parameters, which are detected automatically. The other options are "Do not use" and "Manually specify settings". The last option lets you manually specify the proxy server address and port. The address can be an IP address or a DNS name. The application must be restarted for proxy settings to take effect.

Note. If you use the MS Internet Explorer browser and can see this page, then the second option, "Use MS Internet Explorer settings", guarantees that you will receive titles.

Help - Opens this page.

About - Opens a dialog with the full application version.

Close - Closes application.

Menu items may appear and disappear between the Size and Sources options. They are passed dynamically from the server and should not be considered bugs.

While the application is running, its icon is displayed in the icon tray area.

A double click on this icon triggers the default main menu action, Show/Hide.
A left or right single click displays the main menu.


System Requirements

SIETS Slideline software is implemented as a 32-bit Windows OS executable (exe) file that runs as a RAM-resident process in Windows OS and can be downloaded and started without installation.

Download SIETS Slideline utility as ZIP file here: Download (ZIP)

An installation setup file is also available. It places the application icon on the Windows OS desktop and activates automatic Slideline startup when Windows OS restarts.

Download SIETS Slideline Windows SETUP file here: Download (EXE)