Compare commits
13 Commits
Author | SHA1 | Date |
---|---|---|
Kasianov Nikolai Alekseevich | 722f3fb536 | 2 years ago |
Kasianov Nikolai Alekseevich | c91986d42d | 2 years ago |
Kasianov Nikolai Alekseevich | c2ec2073dc | 2 years ago |
Kasianov Nikolai Alekseevich | 812fd2adf7 | 2 years ago |
Kasianov Nikolai Alekseevich | b256d8a83e | 2 years ago |
Kasianov Nikolai Alekseevich | e5af2939cc | 2 years ago |
Kasianov Nikolai Alekseevich | 6fab9031b1 | 2 years ago |
Kasianov Nikolai Alekseevich | fd484c665e | 2 years ago |
Kasianov Nikolai Alekseevich | 00bc33d5de | 2 years ago |
Kasianov Nikolai Alekseevich | f96bad448a | 2 years ago |
Kasianov Nikolai Alekseevich | d877a483a2 | 2 years ago |
Kasianov Nikolai Alekseevich | 023c2e5a19 | 2 years ago |
Kasianov Nikolai Alekseevich | 1771d19b82 | 2 years ago |
19 changed files with 12927 additions and 472 deletions
```diff
@@ -1,38 +1,91 @@
-# Wecr - simple web crawler
+# Wecr - versatile WEb CRawler
 
 ## Overview
 
-Just a simple HTML web spider with minimal dependencies. It is possible to search for pages with a text on them or for the text itself, extract images and save pages that satisfy the criteria along the way.
+A simple HTML web spider with no dependencies. It is possible to search for pages with certain text on them or for the text itself, extract images, video and audio, and save pages that satisfy the criteria along the way.
 
-## Configuration
+## Configuration Overview
 
-The flow of work fully depends on the configuration file. By default `conf.json` is used as a configuration file, but the name can be changed via `-conf` flag. The default configuration is embedded in the program so on the first launch or by simply deleting the file, a new `conf.json` will be created in the same directory as the executable itself unless the `-wDir` (working directory) flag is set to some other value. To see al available flags run `wecr -h`.
+The flow of work fully depends on the configuration file. By default `conf.json` is used as the configuration file, but the name can be changed via the `-conf` flag. The default configuration is embedded in the program, so on the first launch, or after simply deleting the file, a new `conf.json` will be created in the working directory unless the `-wdir` (working directory) flag points somewhere else, in which case that directory takes precedence. To see all available flags run `wecr -h`.
```
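For illustration, a minimal Go sketch of the first-launch behavior described above (not wecr's actual code; it assumes the embedded default is a single `conf.json` file present at build time):

```go
package main

import (
	_ "embed" // required for //go:embed on a plain []byte
	"fmt"
	"os"
	"path/filepath"
)

// the default configuration shipped inside the binary;
// conf.json must exist next to this source at build time
//go:embed conf.json
var defaultConf []byte

// writeDefaultConf mirrors the described behavior: create conf.json
// in the working directory only if it does not exist yet
func writeDefaultConf(workingDir string) error {
	confPath := filepath.Join(workingDir, "conf.json")
	if _, err := os.Stat(confPath); err == nil {
		return nil // a configuration file is already present
	}
	return os.WriteFile(confPath, defaultConf, 0644)
}

func main() {
	if err := writeDefaultConf("."); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```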
```diff
 
 The configuration is split into different branches like `requests` (how requests are made, ie: request timeout, wait time, user agent), `logging` (use logs, output to a file), `save` (output file|directory, save pages or not) or `search` (use regexp, query string), each of which contains tweakable parameters. There are global ones as well, such as `workers` (working threads that make requests in parallel) and `depth` (literally, how deep the recursive search should go). The names are simple and self-explanatory, so no attribute-by-attribute explanation is needed for most of them.
 
 The parsing starts from `initial_pages` and goes deeper while ignoring the pages on domains that are in `blacklisted_domains` or are NOT in `allowed_domains`. If all initial pages happen to be on blacklisted domains or are not in the allowed list - the program will get stuck. It is important to note that `*_domains` should be specified with an existing scheme (ie: https://en.wikipedia.org). Subdomains and ports **matter**: `https://unbewohnte.su:3000/` and `https://unbewohnte.su/` are **different**.
 
+Previous versions stored the entire visit queue in memory, resulting in gigabytes of memory usage, but as of `v0.2.4` it is possible to offload the queue to persistent storage via the `in_memory_visit_queue` option (`false` by default).
+
+You can change the search `query` at **runtime** via the web dashboard if `launch_dashboard` is set to `true`.
 
 ### Search query
 
-There are some special `query` values:
+There are some special `query` values to control the flow of work:
 
 - `email` - tells wecr to scrape email addresses and output to `output_file`
 - `images` - find all images on pages and output to the corresponding directory in `output_dir` (**IMPORTANT**: set `content_fetch_timeout_ms` to `0` so the images (and other content below) load fully)
 - `videos` - find and fetch files that look like videos
 - `audio` - find and fetch files that look like audio
-- `everything` - find and fetch images, audio and video
+- `documents` - find and fetch files that look like a document
+- `everything` - find and fetch images, audio, video, documents and email addresses
+- `archive` - no text to be searched, save every visited page
 
+When `is_regexp` is enabled, the `query` is treated as a regexp string (in Go "flavor") and pages will be scanned for matches that satisfy it.
```
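Since the regexp flavor is Go's, a pattern you test with Go's `regexp` package behaves the same way as a `query`. A small self-contained sketch (the page text here is made up):

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// the same kind of pattern you would put into "query"
	// when "is_regexp" is set to true
	query := regexp.MustCompile(`(sequence to search)|(other sequence)`)

	pageText := "a page containing other sequence somewhere in its text"

	// wecr scans fetched pages for all matches that satisfy the expression
	for _, match := range query.FindAllString(pageText, -1) {
		fmt.Println(match)
	}
}
```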
```diff
 
-When `is_regexp` is enabled, the `query` is treated as a regexp string and pages will be scanned for matches that satisfy it.
+### Data Output
 
-### Output
+If the query is not something of special value, all text matches will be written to the `found_text.json` file as separate continuous JSON objects in `output_dir`; if `save_pages` is set to `true` and|or `query` is set to `images`, `videos`, `audio`, etc. - the additional contents will also be put in the corresponding directories inside `output_dir`, which is neatly created in the working directory or, if the `-wdir` flag is set - there. If `output_dir` happens to be empty - contents will be written directly to the working directory.
 
-By default, if the query is not something of special values all the matches and other data will be outputted to `output.json` file as separate continuous JSON objects, but if `save_pages` is set to `true` and|or `query` is set to `images`, `videos`, `audio`, etc. - the additional contents will be put in the corresponding directories inside `output_dir`, which is neatly created by the executable's side.
+The output almost certainly contains some duplicates and is not easy to work with programmatically, so you can use `-extractData` with the output JSON file argument (like `found_text.json`, which is the default output file name for simple text searches) to extract the actual data, filter out the duplicates and put each entry on its own line in a new text file.
```
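A minimal sketch of the kind of de-duplication `-extractData` performs; this is an illustration, not wecr's implementation, and the `data` field name is a hypothetical stand-in for whatever the output objects actually contain:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// entry mirrors the shape of one output object; the "data"
// field name is a hypothetical stand-in for illustration
type entry struct {
	Data string `json:"data"`
}

func main() {
	file, err := os.Open("found_text.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer file.Close()

	// the output file holds separate continuous JSON objects,
	// so decode them one by one from the same stream
	decoder := json.NewDecoder(file)
	seen := make(map[string]bool)
	for decoder.More() {
		var e entry
		if err := decoder.Decode(&e); err != nil {
			break
		}
		if !seen[e.Data] {
			seen[e.Data] = true
			fmt.Println(e.Data) // one unique entry per line
		}
	}
}
```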
```diff
 
 ## Build
 
 If you're on *nix - it's as easy as `make`.
 
-Otherwise - `go build` in the `src` directory to build `wecr`.
+Otherwise - `go build` in the `src` directory to build `wecr`. No dependencies.
 
+## Examples
+
+See [a page on my website](https://unbewohnte.su/wecr) for some basic examples.
+
+Dump of a basic configuration:
+
+```json
+{
+    "search": {
+        "is_regexp": true,
+        "query": "(sequence to search)|(other sequence)"
+    },
+    "requests": {
+        "request_wait_timeout_ms": 2500,
+        "request_pause_ms": 100,
+        "content_fetch_timeout_ms": 0,
+        "user_agent": ""
+    },
+    "depth": 90,
+    "workers": 30,
+    "initial_pages": [
+        "https://en.wikipedia.org/wiki/Main_Page"
+    ],
+    "allowed_domains": [
+        "https://en.wikipedia.org/"
+    ],
+    "blacklisted_domains": [
+        ""
+    ],
+    "in_memory_visit_queue": false,
+    "web_dashboard": {
+        "launch_dashboard": true,
+        "port": 13370
+    },
+    "save": {
+        "output_dir": "scraped",
+        "save_pages": false
+    },
+    "logging": {
+        "output_logs": true,
+        "logs_file": "logs.log"
+    }
+}
+```
 
 ## License
-AGPLv3
+wecr is distributed under the AGPLv3 license.
```
@@ -0,0 +1,161 @@

```go
/*
	Wecr - crawl the web for data
	Copyright (C) 2023 Kasyanov Nikolay Alexeyevich (Unbewohnte)

	This program is free software: you can redistribute it and/or modify
	it under the terms of the GNU Affero General Public License as published by
	the Free Software Foundation, either version 3 of the License, or
	(at your option) any later version.

	This program is distributed in the hope that it will be useful,
	but WITHOUT ANY WARRANTY; without even the implied warranty of
	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	GNU Affero General Public License for more details.

	You should have received a copy of the GNU Affero General Public License
	along with this program. If not, see <https://www.gnu.org/licenses/>.
*/

package dashboard

import (
	"embed"
	"encoding/json"
	"fmt"
	"html/template"
	"io"
	"io/fs"
	"net/http"
	"unbewohnte/wecr/config"
	"unbewohnte/wecr/logger"
	"unbewohnte/wecr/worker"
)

type Dashboard struct {
	Server *http.Server
}

//go:embed res
var resFS embed.FS

type PageData struct {
	Conf  config.Conf
	Stats worker.Statistics
}

type PoolStop struct {
	Stop bool `json:"stop"`
}

func NewDashboard(port uint16, webConf *config.Conf, pool *worker.Pool) *Dashboard {
	mux := http.NewServeMux()
	res, err := fs.Sub(resFS, "res")
	if err != nil {
		logger.Error("Failed to Sub embedded dashboard FS: %s", err)
		return nil
	}

	mux.Handle("/static/", http.FileServer(http.FS(res)))

	mux.HandleFunc("/", func(w http.ResponseWriter, req *http.Request) {
		template, err := template.ParseFS(res, "*.html")
		if err != nil {
			logger.Error("Failed to parse embedded dashboard FS: %s", err)
			return
		}

		template.ExecuteTemplate(w, "index.html", nil)
	})

	mux.HandleFunc("/stop", func(w http.ResponseWriter, req *http.Request) {
		var stop PoolStop

		requestBody, err := io.ReadAll(req.Body)
		if err != nil {
			http.Error(w, "Failed to read request body", http.StatusInternalServerError)
			logger.Error("Failed to read stop|resume signal from dashboard request: %s", err)
			return
		}
		defer req.Body.Close()

		err = json.Unmarshal(requestBody, &stop)
		if err != nil {
			http.Error(w, "Failed to unmarshal stop|resume signal", http.StatusInternalServerError)
			logger.Error("Failed to unmarshal stop|resume signal from dashboard UI: %s", err)
			return
		}

		if stop.Stop {
			// stop worker pool
			pool.Stop()
			logger.Info("Stopped worker pool via request from dashboard")
		} else {
			// resume work
			pool.Work()
			logger.Info("Resumed work via request from dashboard")
		}
	})

	mux.HandleFunc("/stats", func(w http.ResponseWriter, req *http.Request) {
		jsonStats, err := json.MarshalIndent(pool.Stats, "", " ")
		if err != nil {
			http.Error(w, "Failed to marshal statistics", http.StatusInternalServerError)
			logger.Error("Failed to marshal stats to send to the dashboard: %s", err)
			return
		}
		w.Header().Add("Content-type", "application/json")
		w.Write(jsonStats)
	})

	mux.HandleFunc("/conf", func(w http.ResponseWriter, req *http.Request) {
		switch req.Method {
		case http.MethodPost:
			var newConfig config.Conf

			defer req.Body.Close()
			newConfigData, err := io.ReadAll(req.Body)
			if err != nil {
				http.Error(w, "Failed to read request body", http.StatusInternalServerError)
				logger.Error("Failed to read new configuration from dashboard request: %s", err)
				return
			}
			err = json.Unmarshal(newConfigData, &newConfig)
			if err != nil {
				http.Error(w, "Failed to unmarshal new configuration", http.StatusInternalServerError)
				logger.Error("Failed to unmarshal new configuration from dashboard UI: %s", err)
				return
			}

			// DO NOT blindly replace global configuration. Manually check and replace values
			webConf.Search.IsRegexp = newConfig.Search.IsRegexp
			if len(newConfig.Search.Query) != 0 {
				webConf.Search.Query = newConfig.Search.Query
			}

			webConf.Logging.OutputLogs = newConfig.Logging.OutputLogs

		default:
			jsonConf, err := json.MarshalIndent(webConf, "", " ")
			if err != nil {
				http.Error(w, "Failed to marshal configuration", http.StatusInternalServerError)
				logger.Error("Failed to marshal current configuration to send to the dashboard UI: %s", err)
				return
			}
			w.Header().Add("Content-type", "application/json")
			w.Write(jsonConf)
		}
	})

	server := &http.Server{
		Addr:    fmt.Sprintf(":%d", port),
		Handler: mux,
	}

	return &Dashboard{
		Server: server,
	}
}

func (board *Dashboard) Launch() error {
	return board.Server.ListenAndServe()
}
```
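Any HTTP client can drive these endpoints. A self-contained sketch that pauses the crawl via `/stop`, changes the query via `/conf` (the same JSON shapes the dashboard UI sends), then reads the configuration back; the port `13370` is taken from the example configuration above:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// pause the crawl; sending {"stop": false} resumes it
	resp, err := http.Post(
		"http://localhost:13370/stop",
		"application/json",
		strings.NewReader(`{"stop": true}`),
	)
	if err != nil {
		fmt.Println(err)
		return
	}
	resp.Body.Close()

	// change the search query at runtime, just like the dashboard UI does
	resp, err = http.Post(
		"http://localhost:13370/conf",
		"application/json",
		strings.NewReader(`{"search": {"is_regexp": false, "query": "new query"}}`),
	)
	if err != nil {
		fmt.Println(err)
		return
	}
	resp.Body.Close()

	// read the current configuration back
	resp, err = http.Get("http://localhost:13370/conf")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	conf, _ := io.ReadAll(resp.Body)
	fmt.Println(string(conf))
}
```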
@@ -0,0 +1,223 @@

```html
<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="utf-8">
    <title>Wecr dashboard</title>
    <!-- <link rel="icon" href="/static/icon.png"> -->
    <link rel="stylesheet" href="/static/bootstrap.css">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>

<body class="d-flex flex-column h-100">
    <div class="container">
        <header class="d-flex flex-wrap justify-content-center py-3 mb-4 border-bottom">
            <a href="/" class="d-flex align-items-center mb-3 mb-md-0 me-md-auto text-dark text-decoration-none">
                <svg class="bi me-2" width="40" height="32">
                    <use xlink:href="#bootstrap"></use>
                </svg>
                <strong class="fs-4">Wecr</strong>
            </a>

            <ul class="nav nav-pills">
                <li class="nav-item"><a href="/stats" class="nav-link">Stats</a></li>
                <li class="nav-item"><a href="/conf" class="nav-link">Config</a></li>
            </ul>
        </header>
    </div>

    <div class="container">
        <h1>Dashboard</h1>

        <div style="height: 3rem;"></div>

        <div class="container">
            <h2>Statistics</h2>
            <div id="statistics">
                <ol class="list-group list-group-numbered">
                    <li class="list-group-item d-flex justify-content-between align-items-start">
                        <div class="ms-2 me-auto">
                            <div class="fw-bold">Pages visited</div>
                        </div>
                        <span class="badge bg-primary rounded-pill" id="pages_visited">0</span>
                    </li>
                    <li class="list-group-item d-flex justify-content-between align-items-start">
                        <div class="ms-2 me-auto">
                            <div class="fw-bold">Matches found</div>
                        </div>
                        <span class="badge bg-primary rounded-pill" id="matches_found">0</span>
                    </li>
                    <li class="list-group-item d-flex justify-content-between align-items-start">
                        <div class="ms-2 me-auto">
                            <div class="fw-bold">Pages saved</div>
                        </div>
                        <span class="badge bg-primary rounded-pill" id="pages_saved">0</span>
                    </li>
                    <li class="list-group-item d-flex justify-content-between align-items-start">
                        <div class="ms-2 me-auto">
                            <div class="fw-bold">Start time</div>
                        </div>
                        <span class="badge bg-primary rounded-pill" id="start_time_unix">0</span>
                    </li>
                    <li class="list-group-item d-flex justify-content-between align-items-start">
                        <div class="ms-2 me-auto">
                            <div class="fw-bold">Stopped</div>
                        </div>
                        <span class="badge bg-primary rounded-pill" id="stopped">false</span>
                    </li>
                </ol>
            </div>

            <button class="btn btn-primary" id="btn_stop">Stop</button>
            <button class="btn btn-primary" id="btn_resume" disabled>Resume</button>
        </div>

        <div style="height: 3rem;"></div>

        <div class="container">
            <h2>Configuration</h2>
            <div>
                <b>Make runtime changes to configuration</b>
                <table class="table table-borderless">
                    <tr>
                        <th>Key</th>
                        <th>Value</th>
                    </tr>
                    <tr>
                        <th>Query</th>
                        <th>
                            <input type="text" id="conf_query">
                        </th>
                    </tr>
                    <tr>
                        <th>Is regexp</th>
                        <th>
                            <input type="text" id="conf_is_regexp">
                        </th>
                    </tr>
                </table>
                <button class="btn btn-primary" id="config_apply_button">
                    Apply
                </button>
            </div>

            <div style="height: 3rem;"></div>

            <pre id="conf_output"></pre>
        </div>
    </div>
</body>

<script>
    window.onload = function () {
        let confOutput = document.getElementById("conf_output");
        let pagesVisitedOut = document.getElementById("pages_visited");
        let matchesFoundOut = document.getElementById("matches_found");
        let pagesSavedOut = document.getElementById("pages_saved");
        let startTimeOut = document.getElementById("start_time_unix");
        let stoppedOut = document.getElementById("stopped");
        let applyConfButton = document.getElementById("config_apply_button");
        let confQuery = document.getElementById("conf_query");
        let confIsRegexp = document.getElementById("conf_is_regexp");
        let buttonStop = document.getElementById("btn_stop");
        let buttonResume = document.getElementById("btn_resume");

        buttonStop.addEventListener("click", (event) => {
            buttonStop.disabled = true;
            buttonResume.disabled = false;

            // stop worker pool
            let signal = {
                "stop": true,
            };

            fetch("/stop", {
                method: "POST",
                headers: {
                    "Content-type": "application/json",
                },
                body: JSON.stringify(signal),
            });
        });

        buttonResume.addEventListener("click", (event) => {
            buttonResume.disabled = true;
            buttonStop.disabled = false;

            // resume worker pool's work
            let signal = {
                "stop": false,
            };

            fetch("/stop", {
                method: "POST",
                headers: {
                    "Content-type": "application/json",
                },
                body: JSON.stringify(signal),
            });
        });

        applyConfButton.addEventListener("click", (event) => {
            let query = String(confQuery.value);

            // accept both "0"|"1" and "false"|"true" as boolean input
            let isRegexp = false;
            if (confIsRegexp.value === "1" || confIsRegexp.value === "true") {
                isRegexp = true;
            }

            let newConf = {
                "search": {
                    "is_regexp": isRegexp,
                    "query": query,
                },
            };

            fetch("/conf", {
                method: "POST",
                headers: {
                    "Content-type": "application/json",
                },
                body: JSON.stringify(newConf),
            });
        });

        const interval = setInterval(function () {
            // update statistics
            fetch("/stats")
                .then((response) => response.json())
                .then((statistics) => {
                    pagesVisitedOut.innerText = statistics.pages_visited;
                    matchesFoundOut.innerText = statistics.matches_found;
                    pagesSavedOut.innerText = statistics.pages_saved;
                    startTimeOut.innerText = new Date(1000 * statistics.start_time_unix);
                    stoppedOut.innerText = statistics.stopped;
                });

            // update config
            fetch("/conf")
                .then((response) => response.text())
                .then((config) => {
                    // "print" whole configuration
                    confOutput.innerText = config;

                    // update values in the change table if they're empty
                    let confJSON = JSON.parse(config);
                    if (confQuery.value == "") {
                        confQuery.value = confJSON.search.query;
                    }
                    if (confIsRegexp.value == "") {
                        confIsRegexp.value = confJSON.search.is_regexp;
                    }
                });
        }, 650);
    };
</script>

</html>
```
File diff suppressed because it is too large

File diff suppressed because one or more lines are too long
@@ -0,0 +1,54 @@

```go
package queue

import (
	"encoding/json"
	"io"
	"os"
	"unbewohnte/wecr/web"
)

// PopLastJob scans the queue file backwards from the end, decodes the
// last complete job entry, truncates it off the file and returns it.
func PopLastJob(queue *os.File) (*web.Job, error) {
	stats, err := queue.Stat()
	if err != nil {
		return nil, err
	}

	if stats.Size() == 0 {
		return nil, nil
	}

	// find the last job in the queue
	var job web.Job
	var offset int64 = -1
	for {
		currentOffset, err := queue.Seek(offset, io.SeekEnd)
		if err != nil {
			return nil, err
		}

		decoder := json.NewDecoder(queue)
		err = decoder.Decode(&job)
		if err != nil || job.URL == "" || job.Search.Query == "" {
			// not at the start of a complete JSON object yet;
			// step one byte further back and try again
			offset -= 1
			continue
		}

		queue.Truncate(currentOffset)
		return &job, nil
	}
}

// InsertNewJob appends a new job entry to the end of the queue file.
func InsertNewJob(queue *os.File, newJob web.Job) error {
	_, err := queue.Seek(0, io.SeekEnd)
	if err != nil {
		return err
	}

	encoder := json.NewEncoder(queue)
	err = encoder.Encode(&newJob)
	if err != nil {
		return err
	}

	return nil
}
```
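A minimal usage sketch of the file-backed queue (hypothetical surrounding code; the `queue` import path and the `URL`/`Search.Query` fields of `web.Job` are inferred from the checks in `PopLastJob`):

```go
package main

import (
	"fmt"
	"os"

	"unbewohnte/wecr/queue"
	"unbewohnte/wecr/web"
)

func main() {
	// the queue lives in a plain file, so it survives restarts
	queueFile, err := os.OpenFile("visit_queue", os.O_CREATE|os.O_RDWR, 0644)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer queueFile.Close()

	// push a job; PopLastJob skips entries with an empty URL
	// or query, so both fields must be set
	var job web.Job
	job.URL = "https://en.wikipedia.org/wiki/Main_Page"
	job.Search.Query = "some query"
	if err := queue.InsertNewJob(queueFile, job); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	// ...and pop it back off the end of the file
	popped, err := queue.PopLastJob(queueFile)
	if err != nil || popped == nil {
		return
	}
	fmt.Println("popped:", popped.URL)
}
```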
@@ -0,0 +1,45 @@

```go
/*
	Wecr - crawl the web for data
	Copyright (C) 2023 Kasyanov Nikolay Alexeyevich (Unbewohnte)

	This program is free software: you can redistribute it and/or modify
	it under the terms of the GNU Affero General Public License as published by
	the Free Software Foundation, either version 3 of the License, or
	(at your option) any later version.

	This program is distributed in the hope that it will be useful,
	but WITHOUT ANY WARRANTY; without even the implied warranty of
	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	GNU Affero General Public License for more details.

	You should have received a copy of the GNU Affero General Public License
	along with this program. If not, see <https://www.gnu.org/licenses/>.
*/

package web

import (
	"net/url"
)

// Tries to find docs' URLs on the page
func FindPageDocuments(pageBody []byte, from url.URL) []url.URL {
	var urls []url.URL

	// for every element that has "src" attribute
	for _, link := range FindPageSrcLinks(pageBody, from) {
		if HasDocumentExtention(link.EscapedPath()) {
			urls = append(urls, link)
		}
	}

	// for every "a" element as well
	for _, link := range FindPageLinks(pageBody, from) {
		if HasDocumentExtention(link.EscapedPath()) {
			urls = append(urls, link)
		}
	}

	// return discovered doc urls
	return urls
}
```
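`HasDocumentExtention` (and the `FindPageSrcLinks`/`FindPageLinks` helpers) are referenced here but not part of this diff. A purely hypothetical sketch of what the extension check might look like; the actual list of document extensions in wecr may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// hasDocumentExtention is a hypothetical stand-in for the
// HasDocumentExtention helper referenced above; the extension
// list is an assumption, not wecr's actual one
func hasDocumentExtention(path string) bool {
	documentExtentions := []string{
		".pdf", ".doc", ".docx", ".odt", ".rtf", ".txt",
		".xls", ".xlsx", ".ppt", ".pptx",
	}

	lowered := strings.ToLower(path)
	for _, extention := range documentExtentions {
		if strings.HasSuffix(lowered, extention) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasDocumentExtention("/files/report.pdf")) // true
	fmt.Println(hasDocumentExtention("/images/photo.png")) // false
}
```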