Compare commits

...

4 Commits

Files changed:

  1. README.md (17 changes)
  2. src/config/config.go (1 change)
  3. src/dashboard/dashboard.go (56 changes)
  4. src/dashboard/res/index.html (43 changes)
  5. src/main.go (130 changes)
  6. src/web/audio.go (90 changes)
  7. src/web/documents.go (107 changes)
  8. src/web/extentions.go (38 changes)
  9. src/web/images.go (90 changes)
  10. src/web/text.go (86 changes)
  11. src/web/videos.go (90 changes)
  12. src/worker/pool.go (4 changes)
  13. src/worker/worker.go (196 changes)

README.md (17 changes)

@@ -4,9 +4,9 @@
A simple HTML web spider with no dependencies. It is possible to search for pages with a text on them or for the text itself, extract images, video, audio and save pages that satisfy the criteria along the way.
## Configuration
## Configuration Overview
The flow of work fully depends on the configuration file. By default `conf.json` is used as a configuration file, but the name can be changed via `-conf` flag. The default configuration is embedded in the program so on the first launch or by simply deleting the file, a new `conf.json` will be created in the same directory as the executable itself unless the `-wdir` (working directory) flag is set to some other value. To see al available flags run `wecr -h`.
The flow of work fully depends on the configuration file. By default `conf.json` is used as the configuration file, but the name can be changed via the `-conf` flag. The default configuration is embedded in the program, so on the first launch, or after simply deleting the file, a new `conf.json` will be created in the working directory unless the `-wdir` (working directory) flag is set to some other value, in which case that directory takes precedence. To see all available flags run `wecr -h`.
The configuration is split into different branches like `requests` (how requests are made, i.e. request timeout, wait time, user agent), `logging` (use logs, output to a file), `save` (output file|directory, save pages or not) or `search` (use regexp, query string), each of which contains tweakable parameters. There are global ones as well, such as `workers` (working threads that make requests in parallel) and `depth` (literally, how deep the recursive search should go). The names are simple and self-explanatory, so no attribute-by-attribute explanation is needed for most of them.
@@ -18,7 +18,7 @@ You can change search `query` at **runtime** via web dashboard if `launch_dashbo
### Search query
There are some special `query` values:
There are some special `query` values to control the flow of work:
- `email` - tells wecr to scrape email addresses and output to `output_file`
- `images` - find all images on pages and output to the corresponding directory in `output_dir` (**IMPORTANT**: set `content_fetch_timeout_ms` to `0` so the images (and other content below) load fully)
@@ -26,12 +26,13 @@ There are some special `query` values:
- `audio` - find and fetch files that look like audio
- `documents` - find and fetch files that look like a document
- `everything` - find and fetch images, audio, video, documents and email addresses
- `archive` - no text to be searched, save every visited page
When `is_regexp` is enabled, the `query` is treated as a regexp string and pages will be scanned for matches that satisfy it.
When `is_regexp` is enabled, the `query` is treated as a regexp string (in Go "flavor") and pages will be scanned for matches that satisfy it.
### Output
### Data Output
By default, if the query is not something of special values all the matches and other data will be outputted to `output.json` file as separate continuous JSON objects, but if `save_pages` is set to `true` and|or `query` is set to `images`, `videos`, `audio`, etc. - the additional contents will be put in the corresponding directories inside `output_dir`, which is neatly created by the executable's side.
If the query is not one of the special values, all text matches will be outputted to the `found_text.json` file in `output_dir` as separate continuous JSON objects; if `save_pages` is set to `true` and|or `query` is set to `images`, `videos`, `audio`, etc. - the additional contents will also be put in the corresponding directories inside `output_dir`, which is neatly created in the working directory (or in the directory set via the `-wdir` flag). If `output_dir` happens to be empty - contents will be outputted directly to the working directory.
The output almost certainly contains some duplicates and is not easy to work with programmatically, so you can use `-extractData` with the output JSON file argument (like `found_text.json`, which is the default output file name for simple text searches) to extract the actual data, filter out the duplicates and put each entry on its own line in a new text file.
@@ -43,7 +44,7 @@ Otherwise - `go build` in the `src` directory to build `wecr`. No dependencies.
## Examples
See [page on my website](https://unbewohnte.su/wecr) for some basic examples.
See [a page on my website](https://unbewohnte.su/wecr) for some basic examples.
Dump of a basic configuration:
@@ -87,4 +88,4 @@ Dump of a basic configuration:
```
## License
AGPLv3
wecr is distributed under the AGPLv3 license

src/config/config.go (1 change)

@@ -31,6 +31,7 @@ const (
QueryEmail string = "email"
QueryDocuments string = "documents"
QueryEverything string = "everything"
QueryArchive string = "archive"
)
const (

src/dashboard/dashboard.go (56 changes)

@@ -1,3 +1,21 @@
/*
Wecr - crawl the web for data
Copyright (C) 2023 Kasyanov Nikolay Alexeyevich (Unbewohnte)
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
*/
package dashboard
import (
@@ -25,7 +43,11 @@ type PageData struct {
Stats worker.Statistics
}
func NewDashboard(port uint16, webConf *config.Conf, statistics *worker.Statistics) *Dashboard {
type PoolStop struct {
Stop bool `json:"stop"`
}
func NewDashboard(port uint16, webConf *config.Conf, pool *worker.Pool) *Dashboard {
mux := http.NewServeMux()
res, err := fs.Sub(resFS, "res")
if err != nil {
@@ -34,6 +56,7 @@ func NewDashboard(port uint16, webConf *config.Conf, statistics *worker.Statisti
}
mux.Handle("/static/", http.FileServer(http.FS(res)))
mux.HandleFunc("/", func(w http.ResponseWriter, req *http.Request) {
template, err := template.ParseFS(res, "*.html")
if err != nil {
@@ -44,8 +67,37 @@ func NewDashboard(port uint16, webConf *config.Conf, statistics *worker.Statisti
template.ExecuteTemplate(w, "index.html", nil)
})
mux.HandleFunc("/stop", func(w http.ResponseWriter, req *http.Request) {
var stop PoolStop
requestBody, err := io.ReadAll(req.Body)
if err != nil {
http.Error(w, "Failed to read request body", http.StatusInternalServerError)
logger.Error("Failed to read stop|resume signal from dashboard request: %s", err)
return
}
defer req.Body.Close()
err = json.Unmarshal(requestBody, &stop)
if err != nil {
http.Error(w, "Failed to unmarshal stop|resume signal", http.StatusInternalServerError)
logger.Error("Failed to unmarshal stop|resume signal from dashboard UI: %s", err)
return
}
if stop.Stop {
// stop worker pool
pool.Stop()
logger.Info("Stopped worker pool via request from dashboard")
} else {
// resume work
pool.Work()
logger.Info("Resumed work via request from dashboard")
}
})
mux.HandleFunc("/stats", func(w http.ResponseWriter, req *http.Request) {
jsonStats, err := json.MarshalIndent(statistics, "", " ")
jsonStats, err := json.MarshalIndent(pool.Stats, "", " ")
if err != nil {
http.Error(w, "Failed to marshal statistics", http.StatusInternalServerError)
logger.Error("Failed to marshal stats to send to the dashboard: %s", err)

src/dashboard/res/index.html (43 changes)

@@ -68,6 +68,9 @@
</li>
</ol>
</div>
<button class="btn btn-primary" id="btn_stop">Stop</button>
<button class="btn btn-primary" id="btn_resume" disabled>Resume</button>
</div>
<div style="height: 3rem;"></div>
@@ -117,6 +120,44 @@
let applyConfButton = document.getElementById("config_apply_button");
let confQuery = document.getElementById("conf_query");
let confIsRegexp = document.getElementById("conf_is_regexp");
let buttonStop = document.getElementById("btn_stop");
let buttonResume = document.getElementById("btn_resume");
buttonStop.addEventListener("click", (event) => {
buttonStop.disabled = true;
buttonResume.disabled = false;
// stop worker pool
let signal = {
"stop": true,
};
fetch("/stop", {
method: "POST",
headers: {
"Content-type": "application/json",
},
body: JSON.stringify(signal),
});
});
buttonResume.addEventListener("click", (event) => {
buttonResume.disabled = true;
buttonStop.disabled = false;
// resume worker pool's work
let signal = {
"stop": false,
};
fetch("/stop", {
method: "POST",
headers: {
"Content-type": "application/json",
},
body: JSON.stringify(signal),
});
});
applyConfButton.addEventListener("click", (event) => {
let query = String(confQuery.value);
@@ -139,8 +180,6 @@
},
};
console.log(newConf);
fetch("/conf", {
method: "POST",
headers: {

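The stop/resume buttons above post a small JSON signal to the dashboard's `/stop` endpoint; the same signal can be sent from any client. A minimal Go sketch (the port `13370` is a placeholder assumption - use the dashboard port from your configuration):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// PoolStop mirrors the struct the dashboard unmarshals on /stop.
type PoolStop struct {
	Stop bool `json:"stop"`
}

// stopSignal builds the JSON body for a stop (true) or resume (false) request.
func stopSignal(stop bool) []byte {
	body, _ := json.Marshal(PoolStop{Stop: stop})
	return body
}

func main() {
	// Port 13370 is a placeholder; set it to your configured dashboard port.
	resp, err := http.Post("http://localhost:13370/stop", "application/json",
		bytes.NewReader(stopSignal(true)))
	if err != nil {
		fmt.Println("dashboard not reachable:", err)
		return
	}
	resp.Body.Close()
}
```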
src/main.go (130 changes)

@@ -19,7 +19,6 @@
package main
import (
"encoding/json"
"flag"
"fmt"
"io"
@@ -40,7 +39,7 @@ import (
"unbewohnte/wecr/worker"
)
const version = "v0.3.2"
const version = "v0.3.5"
const (
configFilename string = "conf.json"
@@ -68,7 +67,7 @@ var (
extractDataFilename = flag.String(
"extractData", "",
"Set filename for output JSON file and extract data from it, put each entry nicely on a new line in a new file, then exit",
"Specify previously outputted JSON file and extract data from it, put each entry nicely on a new line in a new file, exit afterwards",
)
workingDirectory string
@@ -108,12 +107,12 @@ func init() {
if *wDir != "" {
workingDirectory = *wDir
} else {
exePath, err := os.Executable()
wdir, err := os.Getwd()
if err != nil {
logger.Error("Failed to determine executable's path: %s", err)
logger.Error("Failed to determine working directory path: %s", err)
return
}
workingDirectory = filepath.Dir(exePath)
workingDirectory = wdir
}
logger.Info("Working in \"%s\"", workingDirectory)
@@ -157,17 +156,6 @@ func main() {
}
logger.Info("Successfully opened configuration file")
// Prepare global statistics variable
statistics := worker.Statistics{}
// open dashboard if needed
var board *dashboard.Dashboard = nil
if conf.Dashboard.UseDashboard {
board = dashboard.NewDashboard(conf.Dashboard.Port, conf, &statistics)
go board.Launch()
logger.Info("Launched dashboard at http://localhost:%d", conf.Dashboard.Port)
}
// sanitize and correct inputs
if len(conf.InitialPages) == 0 {
logger.Error("No initial page URLs have been set")
@@ -306,6 +294,8 @@ func main() {
logger.Info("Looking for audio (%+s)", web.AudioExtentions)
case config.QueryDocuments:
logger.Info("Looking for documents (%+s)", web.DocumentExtentions)
case config.QueryArchive:
logger.Info("Archiving every visited page")
case config.QueryEverything:
logger.Info("Looking for email addresses, images, videos, audio and various documents (%+s - %+s - %+s - %+s)",
web.ImageExtentions,
@@ -321,33 +311,6 @@ func main() {
}
}
// create logs if needed
if conf.Logging.OutputLogs {
if conf.Logging.LogsFile != "" {
// output logs to a file
logFile, err := os.Create(filepath.Join(workingDirectory, conf.Logging.LogsFile))
if err != nil {
logger.Error("Failed to create logs file: %s", err)
return
}
defer logFile.Close()
logger.Info("Outputting logs to %s", conf.Logging.LogsFile)
logger.SetOutput(logFile)
} else {
// output logs to stdout
logger.Info("Outputting logs to stdout")
logger.SetOutput(os.Stdout)
}
} else {
// no logging needed
logger.Info("No further logs will be outputted")
logger.SetOutput(nil)
}
jobs := make(chan web.Job, conf.Workers*5)
results := make(chan web.Result, conf.Workers*5)
// create visit queue file if not turned off
var visitQueueFile *os.File = nil
if !conf.InMemoryVisitQueue {
@@ -364,6 +327,7 @@ func main() {
}
// create initial jobs
initialJobs := make(chan web.Job, conf.Workers*5)
if !conf.InMemoryVisitQueue {
for _, initialPage := range conf.InitialPages {
var newJob web.Job = web.Job{
@@ -380,7 +344,7 @@ func main() {
visitQueueFile.Seek(0, io.SeekStart)
} else {
for _, initialPage := range conf.InitialPages {
jobs <- web.Job{
initialJobs <- web.Job{
URL: initialPage,
Search: conf.Search,
Depth: conf.Depth,
@@ -388,8 +352,11 @@ func main() {
}
}
// Prepare global statistics variable
statistics := worker.Statistics{}
// form a worker pool
workerPool := worker.NewWorkerPool(jobs, results, conf.Workers, &worker.WorkerConf{
workerPool := worker.NewWorkerPool(initialJobs, conf.Workers, &worker.WorkerConf{
Search: &conf.Search,
Requests: &conf.Requests,
Save: &conf.Save,
@@ -399,22 +366,42 @@ func main() {
VisitQueue: visitQueueFile,
Lock: &sync.Mutex{},
},
EmailsOutput: emailsOutputFile,
TextOutput: textOutputFile,
}, &statistics)
logger.Info("Created a worker pool with %d workers", conf.Workers)
// set up graceful shutdown
sig := make(chan os.Signal, 1)
signal.Notify(sig, os.Interrupt)
go func() {
<-sig
logger.Info("Received interrupt signal. Exiting...")
// open dashboard if needed
var board *dashboard.Dashboard = nil
if conf.Dashboard.UseDashboard {
board = dashboard.NewDashboard(conf.Dashboard.Port, conf, workerPool)
go board.Launch()
logger.Info("Launched dashboard at http://localhost:%d", conf.Dashboard.Port)
}
// stop workers
workerPool.Stop()
// create and redirect logs if needed
if conf.Logging.OutputLogs {
if conf.Logging.LogsFile != "" {
// output logs to a file
logFile, err := os.Create(filepath.Join(workingDirectory, conf.Logging.LogsFile))
if err != nil {
logger.Error("Failed to create logs file: %s", err)
return
}
defer logFile.Close()
// close results channel
close(results)
}()
logger.Info("Outputting logs to %s", conf.Logging.LogsFile)
logger.SetOutput(logFile)
} else {
// output logs to stdout
logger.Info("Outputting logs to stdout")
logger.SetOutput(os.Stdout)
}
} else {
// no logging needed
logger.Info("No further logs will be outputted")
logger.SetOutput(nil)
}
// launch concurrent scraping !
workerPool.Work()
@@ -441,27 +428,12 @@ func main() {
}()
}
// get text text results and write it to the output file (found files are handled by each worker separately)
var outputFile *os.File
for {
result, ok := <-results
if !ok {
break
}
// as it is possible to change configuration "on the fly" - it's better to not mess up different outputs
if result.Search.Query == config.QueryEmail {
outputFile = emailsOutputFile
} else {
outputFile = textOutputFile
}
// set up graceful shutdown
sig := make(chan os.Signal, 1)
signal.Notify(sig, os.Interrupt)
<-sig
logger.Info("Received interrupt signal. Exiting...")
// each entry in output file is a self-standing JSON object
entryBytes, err := json.MarshalIndent(result, " ", "\t")
if err != nil {
continue
}
outputFile.Write(entryBytes)
outputFile.Write([]byte("\n"))
}
// stop workers
workerPool.Stop()
}

src/web/audio.go (90 changes)

@@ -20,99 +20,25 @@ package web
import (
"net/url"
"strings"
)
func HasAudioExtention(url string) bool {
for _, extention := range AudioExtentions {
if strings.HasSuffix(url, extention) {
return true
}
}
return false
}
// Tries to find audio URLs on the page
func FindPageAudio(pageBody []byte, from *url.URL) []string {
var urls []string
func FindPageAudio(pageBody []byte, from url.URL) []url.URL {
var urls []url.URL
// for every element that has "src" attribute
for _, match := range tagSrcRegexp.FindAllString(string(pageBody), -1) {
var linkStartIndex int
var linkEndIndex int
linkStartIndex = strings.Index(match, "\"")
if linkStartIndex == -1 {
linkStartIndex = strings.Index(match, "'")
if linkStartIndex == -1 {
continue
}
linkEndIndex = strings.LastIndex(match, "'")
if linkEndIndex == -1 {
continue
}
} else {
linkEndIndex = strings.LastIndex(match, "\"")
if linkEndIndex == -1 {
continue
}
}
if linkEndIndex <= linkStartIndex+1 {
continue
}
link, err := url.Parse(match[linkStartIndex+1 : linkEndIndex])
if err != nil {
continue
}
linkResolved := ResolveLink(link, from.Host)
if HasAudioExtention(linkResolved) {
urls = append(urls, linkResolved)
for _, link := range FindPageSrcLinks(pageBody, from) {
if HasAudioExtention(link.EscapedPath()) {
urls = append(urls, link)
}
}
// for every "a" element as well
for _, match := range tagHrefRegexp.FindAllString(string(pageBody), -1) {
var linkStartIndex int
var linkEndIndex int
linkStartIndex = strings.Index(match, "\"")
if linkStartIndex == -1 {
linkStartIndex = strings.Index(match, "'")
if linkStartIndex == -1 {
continue
}
linkEndIndex = strings.LastIndex(match, "'")
if linkEndIndex == -1 {
continue
}
} else {
linkEndIndex = strings.LastIndex(match, "\"")
if linkEndIndex == -1 {
continue
}
}
if linkEndIndex <= linkStartIndex+1 {
continue
}
link, err := url.Parse(match[linkStartIndex+1 : linkEndIndex])
if err != nil {
continue
}
linkResolved := ResolveLink(link, from.Host)
if HasAudioExtention(linkResolved) {
urls = append(urls, linkResolved)
for _, link := range FindPageLinks(pageBody, from) {
if HasAudioExtention(link.EscapedPath()) {
urls = append(urls, link)
}
}
// return discovered mutual video urls
return urls
}

src/web/documents.go (107 changes)

@@ -1,97 +1,42 @@
/*
Wecr - crawl the web for data
Copyright (C) 2023 Kasyanov Nikolay Alexeyevich (Unbewohnte)
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
*/
package web
import (
"net/url"
"strings"
)
func HasDocumentExtention(url string) bool {
for _, extention := range DocumentExtentions {
if strings.HasSuffix(url, extention) {
return true
}
}
return false
}
// Tries to find docs' URLs on the page
func FindPageDocuments(pageBody []byte, from *url.URL) []string {
var urls []string
func FindPageDocuments(pageBody []byte, from url.URL) []url.URL {
var urls []url.URL
// for every element that has "src" attribute
for _, match := range tagSrcRegexp.FindAllString(string(pageBody), -1) {
var linkStartIndex int
var linkEndIndex int
linkStartIndex = strings.Index(match, "\"")
if linkStartIndex == -1 {
linkStartIndex = strings.Index(match, "'")
if linkStartIndex == -1 {
continue
}
linkEndIndex = strings.LastIndex(match, "'")
if linkEndIndex == -1 {
continue
}
} else {
linkEndIndex = strings.LastIndex(match, "\"")
if linkEndIndex == -1 {
continue
}
}
if linkEndIndex <= linkStartIndex+1 {
continue
}
link, err := url.Parse(match[linkStartIndex+1 : linkEndIndex])
if err != nil {
continue
}
linkResolved := ResolveLink(link, from.Host)
if HasDocumentExtention(linkResolved) {
urls = append(urls, linkResolved)
for _, link := range FindPageSrcLinks(pageBody, from) {
if HasDocumentExtention(link.EscapedPath()) {
urls = append(urls, link)
}
}
// for every "a" element as well
for _, match := range tagHrefRegexp.FindAllString(string(pageBody), -1) {
var linkStartIndex int
var linkEndIndex int
linkStartIndex = strings.Index(match, "\"")
if linkStartIndex == -1 {
linkStartIndex = strings.Index(match, "'")
if linkStartIndex == -1 {
continue
}
linkEndIndex = strings.LastIndex(match, "'")
if linkEndIndex == -1 {
continue
}
} else {
linkEndIndex = strings.LastIndex(match, "\"")
if linkEndIndex == -1 {
continue
}
}
if linkEndIndex <= linkStartIndex+1 {
continue
}
link, err := url.Parse(match[linkStartIndex+1 : linkEndIndex])
if err != nil {
continue
}
linkResolved := ResolveLink(link, from.Host)
if HasDocumentExtention(linkResolved) {
urls = append(urls, linkResolved)
for _, link := range FindPageLinks(pageBody, from) {
if HasDocumentExtention(link.EscapedPath()) {
urls = append(urls, link)
}
}

src/web/extentions.go (38 changes)

@@ -18,6 +18,8 @@
package web
import "strings"
var AudioExtentions = []string{
".3gp",
".aa",
@@ -134,3 +136,39 @@ var DocumentExtentions = []string{
".otf",
".exif",
}
func HasImageExtention(urlPath string) bool {
for _, extention := range ImageExtentions {
if strings.HasSuffix(urlPath, extention) {
return true
}
}
return false
}
func HasDocumentExtention(urlPath string) bool {
for _, extention := range DocumentExtentions {
if strings.HasSuffix(urlPath, extention) {
return true
}
}
return false
}
func HasVideoExtention(urlPath string) bool {
for _, extention := range VideoExtentions {
if strings.HasSuffix(urlPath, extention) {
return true
}
}
return false
}
func HasAudioExtention(urlPath string) bool {
for _, extention := range AudioExtentions {
if strings.HasSuffix(urlPath, extention) {
return true
}
}
return false
}
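The four new `Has*Extention` helpers above all follow one pattern: a suffix check of the URL path against an extension list. A standalone sketch of that pattern (the `extention` spelling matches the codebase's identifiers):

```go
package main

import (
	"fmt"
	"strings"
)

// hasExtention reports whether a URL path ends in one of the given
// extensions, the same suffix check the Has*Extention helpers perform.
func hasExtention(urlPath string, extentions []string) bool {
	for _, ext := range extentions {
		if strings.HasSuffix(urlPath, ext) {
			return true
		}
	}
	return false
}

func main() {
	imageExtentions := []string{".png", ".jpg", ".gif"}
	fmt.Println(hasExtention("/pics/cat.png", imageExtentions))
	fmt.Println(hasExtention("/index.html", imageExtentions))
}
```

Note that checking the path (rather than the full URL) is what lets the callers pass `link.EscapedPath()` and ignore query strings.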

src/web/images.go (90 changes)

@@ -20,99 +20,25 @@ package web
import (
"net/url"
"strings"
)
func HasImageExtention(url string) bool {
for _, extention := range ImageExtentions {
if strings.HasSuffix(url, extention) {
return true
}
}
return false
}
// Tries to find images' URLs on the page
func FindPageImages(pageBody []byte, from *url.URL) []string {
var urls []string
func FindPageImages(pageBody []byte, from url.URL) []url.URL {
var urls []url.URL
// for every element that has "src" attribute
for _, match := range tagSrcRegexp.FindAllString(string(pageBody), -1) {
var linkStartIndex int
var linkEndIndex int
linkStartIndex = strings.Index(match, "\"")
if linkStartIndex == -1 {
linkStartIndex = strings.Index(match, "'")
if linkStartIndex == -1 {
continue
}
linkEndIndex = strings.LastIndex(match, "'")
if linkEndIndex == -1 {
continue
}
} else {
linkEndIndex = strings.LastIndex(match, "\"")
if linkEndIndex == -1 {
continue
}
}
if linkEndIndex <= linkStartIndex+1 {
continue
}
link, err := url.Parse(match[linkStartIndex+1 : linkEndIndex])
if err != nil {
continue
}
linkResolved := ResolveLink(link, from.Host)
if HasImageExtention(linkResolved) {
urls = append(urls, linkResolved)
for _, link := range FindPageSrcLinks(pageBody, from) {
if HasImageExtention(link.EscapedPath()) {
urls = append(urls, link)
}
}
// for every "a" element as well
for _, match := range tagHrefRegexp.FindAllString(string(pageBody), -1) {
var linkStartIndex int
var linkEndIndex int
linkStartIndex = strings.Index(match, "\"")
if linkStartIndex == -1 {
linkStartIndex = strings.Index(match, "'")
if linkStartIndex == -1 {
continue
}
linkEndIndex = strings.LastIndex(match, "'")
if linkEndIndex == -1 {
continue
}
} else {
linkEndIndex = strings.LastIndex(match, "\"")
if linkEndIndex == -1 {
continue
}
}
if linkEndIndex <= linkStartIndex+1 {
continue
}
link, err := url.Parse(match[linkStartIndex+1 : linkEndIndex])
if err != nil {
continue
}
linkResolved := ResolveLink(link, from.Host)
if HasImageExtention(linkResolved) {
urls = append(urls, linkResolved)
for _, link := range FindPageLinks(pageBody, from) {
if HasImageExtention(link.EscapedPath()) {
urls = append(urls, link)
}
}
// return discovered mutual image urls from <img> and <a> tags
return urls
}

src/web/text.go (86 changes)

@@ -36,28 +36,28 @@ var tagSrcRegexp *regexp.Regexp = regexp.MustCompile(`(?i)(src)[\s]*=[\s]*("|')(
var emailRegexp *regexp.Regexp = regexp.MustCompile(`[A-Za-z0-9._%+\-!%&?~^#$]+@[A-Za-z0-9.\-]+\.[a-zA-Z]{2,4}`)
// var emailRegexp *regexp.Regexp = regexp.MustCompile("[a-zA-Z0-9.!#$%&'*+\\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*")
// Fix relative link and construct an absolute one. Does nothing if the URL already looks alright
func ResolveLink(url *url.URL, fromHost string) string {
if !url.IsAbs() {
if url.Scheme == "" {
func ResolveLink(link url.URL, fromHost string) url.URL {
var resolvedURL url.URL = link
if !resolvedURL.IsAbs() {
if resolvedURL.Scheme == "" {
// add scheme
url.Scheme = "http"
resolvedURL.Scheme = "https"
}
if url.Host == "" {
if resolvedURL.Host == "" {
// add host
url.Host = fromHost
resolvedURL.Host = fromHost
}
}
return url.String()
return resolvedURL
}
// Find all links on page that are specified in <a> tag
func FindPageLinks(pageBody []byte, from *url.URL) []string {
var urls []string
// Find all links on page that are specified in href attribute. Do not resolve links. Return URLs as they are on the page
func FindPageLinksDontResolve(pageBody []byte) []url.URL {
var urls []url.URL
for _, match := range tagHrefRegexp.FindAllString(string(pageBody), -1) {
var linkStartIndex int
@@ -90,9 +90,69 @@ func FindPageLinks(pageBody []byte, from *url.URL) []string {
continue
}
urls = append(urls, ResolveLink(link, from.Host))
urls = append(urls, *link)
}
return urls
}
// Find all links on page that are specified in href attribute
func FindPageLinks(pageBody []byte, from url.URL) []url.URL {
urls := FindPageLinksDontResolve(pageBody)
for index := 0; index < len(urls); index++ {
urls[index] = ResolveLink(urls[index], from.Host)
}
return urls
}
// Find all links on page that are specified in "src" attribute. Do not resolve URLs, return them as they are on the page
func FindPageSrcLinksDontResolve(pageBody []byte) []url.URL {
var urls []url.URL
for _, match := range tagSrcRegexp.FindAllString(string(pageBody), -1) {
var linkStartIndex int
var linkEndIndex int
linkStartIndex = strings.Index(match, "\"")
if linkStartIndex == -1 {
linkStartIndex = strings.Index(match, "'")
if linkStartIndex == -1 {
continue
}
linkEndIndex = strings.LastIndex(match, "'")
if linkEndIndex == -1 {
continue
}
} else {
linkEndIndex = strings.LastIndex(match, "\"")
if linkEndIndex == -1 {
continue
}
}
if linkEndIndex <= linkStartIndex+1 {
continue
}
link, err := url.Parse(match[linkStartIndex+1 : linkEndIndex])
if err != nil {
continue
}
urls = append(urls, *link)
}
return urls
}
// Find all links on page that are specified in "src" attribute
func FindPageSrcLinks(pageBody []byte, from url.URL) []url.URL {
urls := FindPageSrcLinksDontResolve(pageBody)
for index := 0; index < len(urls); index++ {
urls[index] = ResolveLink(urls[index], from.Host)
}
return urls
}

src/web/videos.go (90 changes)

@@ -20,99 +20,25 @@ package web
import (
"net/url"
"strings"
)
func HasVideoExtention(url string) bool {
for _, extention := range VideoExtentions {
if strings.HasSuffix(url, extention) {
return true
}
}
return false
}
// Tries to find videos' URLs on the page
func FindPageVideos(pageBody []byte, from *url.URL) []string {
var urls []string
func FindPageVideos(pageBody []byte, from url.URL) []url.URL {
var urls []url.URL
// for every element that has "src" attribute
for _, match := range tagSrcRegexp.FindAllString(string(pageBody), -1) {
var linkStartIndex int
var linkEndIndex int
linkStartIndex = strings.Index(match, "\"")
if linkStartIndex == -1 {
linkStartIndex = strings.Index(match, "'")
if linkStartIndex == -1 {
continue
}
linkEndIndex = strings.LastIndex(match, "'")
if linkEndIndex == -1 {
continue
}
} else {
linkEndIndex = strings.LastIndex(match, "\"")
if linkEndIndex == -1 {
continue
}
}
if linkEndIndex <= linkStartIndex+1 {
continue
}
link, err := url.Parse(match[linkStartIndex+1 : linkEndIndex])
if err != nil {
continue
}
linkResolved := ResolveLink(link, from.Host)
if HasVideoExtention(linkResolved) {
urls = append(urls, linkResolved)
for _, link := range FindPageSrcLinks(pageBody, from) {
if HasVideoExtention(link.EscapedPath()) {
urls = append(urls, link)
}
}
// for every "a" element as well
for _, match := range tagHrefRegexp.FindAllString(string(pageBody), -1) {
var linkStartIndex int
var linkEndIndex int
linkStartIndex = strings.Index(match, "\"")
if linkStartIndex == -1 {
linkStartIndex = strings.Index(match, "'")
if linkStartIndex == -1 {
continue
}
linkEndIndex = strings.LastIndex(match, "'")
if linkEndIndex == -1 {
continue
}
} else {
linkEndIndex = strings.LastIndex(match, "\"")
if linkEndIndex == -1 {
continue
}
}
if linkEndIndex <= linkStartIndex+1 {
continue
}
link, err := url.Parse(match[linkStartIndex+1 : linkEndIndex])
if err != nil {
continue
}
linkResolved := ResolveLink(link, from.Host)
if HasVideoExtention(linkResolved) {
urls = append(urls, linkResolved)
for _, link := range FindPageLinks(pageBody, from) {
if HasVideoExtention(link.EscapedPath()) {
urls = append(urls, link)
}
}
// return discovered mutual video urls
return urls
}

src/worker/pool.go (4 changes)

@@ -48,7 +48,7 @@ type Pool struct {
}
// Create a new worker pool
func NewWorkerPool(jobs chan web.Job, results chan web.Result, workerCount uint, workerConf *WorkerConf, stats *Statistics) *Pool {
func NewWorkerPool(initialJobs chan web.Job, workerCount uint, workerConf *WorkerConf, stats *Statistics) *Pool {
var newPool Pool = Pool{
workersCount: workerCount,
workers: nil,
@@ -61,7 +61,7 @@ func NewWorkerPool(jobs chan web.Job, results chan web.Result, workerCount uint,
var i uint
for i = 0; i < workerCount; i++ {
newWorker := NewWorker(jobs, results, workerConf, &newPool.visited, newPool.Stats)
newWorker := NewWorker(initialJobs, workerConf, &newPool.visited, newPool.Stats)
newPool.workers = append(newPool.workers, &newWorker)
}

src/worker/worker.go (196 changes)

@@ -19,12 +19,16 @@
package worker
import (
"bytes"
"encoding/json"
"fmt"
"io"
"net/url"
"os"
"path"
"path/filepath"
"regexp"
"strings"
"sync"
"time"
"unbewohnte/wecr/config"
@@ -46,12 +50,13 @@ type WorkerConf struct {
BlacklistedDomains []string
AllowedDomains []string
VisitQueue VisitQueue
TextOutput io.Writer
EmailsOutput io.Writer
}
// Web worker
type Worker struct {
Jobs chan web.Job
Results chan web.Result
Conf *WorkerConf
visited *visited
stats *Statistics
@@ -59,10 +64,9 @@ type Worker struct {
}
// Create a new worker
func NewWorker(jobs chan web.Job, results chan web.Result, conf *WorkerConf, visited *visited, stats *Statistics) Worker {
func NewWorker(jobs chan web.Job, conf *WorkerConf, visited *visited, stats *Statistics) Worker {
return Worker{
Jobs: jobs,
Results: results,
Conf: conf,
visited: visited,
stats: stats,
@@ -70,8 +74,8 @@ func NewWorker(jobs chan web.Job, results chan web.Result, conf *WorkerConf, vis
}
}
func (w *Worker) saveContent(links []string, pageURL *url.URL) {
var alreadyProcessedUrls []string
func (w *Worker) saveContent(links []url.URL, pageURL *url.URL) {
var alreadyProcessedUrls []url.URL
for count, link := range links {
// check if this URL has been processed already
var skip bool = false
@@ -89,29 +93,29 @@ func (w *Worker) saveContent(links []string, pageURL *url.URL) {
}
alreadyProcessedUrls = append(alreadyProcessedUrls, link)
var fileName string = fmt.Sprintf("%s_%d_%s", pageURL.Host, count, path.Base(link))
var fileName string = fmt.Sprintf("%s_%d_%s", pageURL.Host, count, path.Base(link.Path))
var filePath string
if web.HasImageExtention(link) {
if web.HasImageExtention(link.Path) {
filePath = filepath.Join(w.Conf.Save.OutputDir, config.SaveImagesDir, fileName)
} else if web.HasVideoExtention(link) {
} else if web.HasVideoExtention(link.Path) {
filePath = filepath.Join(w.Conf.Save.OutputDir, config.SaveVideosDir, fileName)
} else if web.HasAudioExtention(link) {
} else if web.HasAudioExtention(link.Path) {
filePath = filepath.Join(w.Conf.Save.OutputDir, config.SaveAudioDir, fileName)
} else if web.HasDocumentExtention(link) {
} else if web.HasDocumentExtention(link.Path) {
filePath = filepath.Join(w.Conf.Save.OutputDir, config.SaveDocumentsDir, fileName)
} else {
filePath = filepath.Join(w.Conf.Save.OutputDir, fileName)
}
err := web.FetchFile(
link,
link.String(),
w.Conf.Requests.UserAgent,
w.Conf.Requests.ContentFetchTimeoutMs,
filePath,
)
if err != nil {
logger.Error("Failed to fetch file at %s: %s", link, err)
logger.Error("Failed to fetch file located at %s: %s", link.String(), err)
return
}
@@ -120,22 +124,115 @@ func (w *Worker) saveContent(links []string, pageURL *url.URL) {
}
}
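saveContent above fans fetched files out into per-type subdirectories by testing the URL path's extension. A standalone sketch of that routing idea — the extension sets and directory names here are illustrative, not the `HasImageExtention`-family helpers or the `config.Save*Dir` constants:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// dirFor maps a URL path to an output subdirectory by file extension,
// mirroring the image/video/audio/document chain in saveContent.
// The extension lists and directory names are assumptions for the sketch.
func dirFor(urlPath string) string {
	switch strings.ToLower(filepath.Ext(urlPath)) {
	case ".jpg", ".jpeg", ".png", ".gif", ".webp":
		return "images"
	case ".mp4", ".webm", ".mkv", ".avi":
		return "videos"
	case ".mp3", ".ogg", ".wav", ".flac":
		return "audio"
	case ".pdf", ".doc", ".docx", ".txt":
		return "documents"
	default:
		return "." // unknown types land in the output root
	}
}

func main() {
	fmt.Println(dirFor("/pics/cat.PNG"))      // images
	fmt.Println(dirFor("/files/report.pdf")) // documents
}
```

Keying the switch on `filepath.Ext` of the URL *path* (not the full URL) matters, since a query string would otherwise pollute the extension.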
// Save page to the disk with a corresponding name
func (w *Worker) savePage(baseURL *url.URL, pageData []byte) {
if w.Conf.Save.SavePages && w.Conf.Save.OutputDir != "" {
var pageName string = fmt.Sprintf("%s_%s.html", baseURL.Host, path.Base(baseURL.String()))
pageFile, err := os.Create(filepath.Join(w.Conf.Save.OutputDir, config.SavePagesDir, pageName))
// Save page to the disk with a corresponding name; Download any src files, stylesheets and JS along the way
func (w *Worker) savePage(baseURL url.URL, pageData []byte) {
var findPageFileContentURLs func([]byte) []url.URL = func(pageBody []byte) []url.URL {
var urls []url.URL
for _, link := range web.FindPageLinksDontResolve(pageBody) {
if strings.Contains(link.Path, ".css") ||
strings.Contains(link.Path, ".scss") ||
strings.Contains(link.Path, ".js") ||
strings.Contains(link.Path, ".mjs") {
urls = append(urls, link)
}
}
urls = append(urls, web.FindPageSrcLinksDontResolve(pageBody)...)
return urls
}
var cleanLink func(url.URL, url.URL) url.URL = func(link url.URL, from url.URL) url.URL {
resolvedLink := web.ResolveLink(link, from.Host)
cleanLink, err := url.Parse(resolvedLink.Scheme + "://" + resolvedLink.Host + resolvedLink.Path)
if err != nil {
logger.Error("Failed to create page of \"%s\": %s", baseURL.String(), err)
return resolvedLink
}
return *cleanLink
}
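The cleanLink closure above normalizes a resolved link by rebuilding it from scheme, host, and path only, discarding any query string and fragment. The same idea in isolation, with a plain `url.Parse` standing in for `web.ResolveLink` (whose signature isn't shown in this diff):

```go
package main

import (
	"fmt"
	"net/url"
)

// stripToPath rebuilds a URL from scheme, host, and path only,
// dropping the query and fragment -- the normalization cleanLink
// applies before using a link as a local file name.
func stripToPath(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	clean := url.URL{Scheme: u.Scheme, Host: u.Host, Path: u.Path}
	return clean.String(), nil
}

func main() {
	out, err := stripToPath("https://example.com/assets/app.js?v=42#top")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // https://example.com/assets/app.js
}
```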
// Create directory with all file content on the page
var pageFilesDirectoryName string = fmt.Sprintf(
"%s_%s_files",
baseURL.Host,
strings.ReplaceAll(baseURL.Path, "/", "_"),
)
err := os.MkdirAll(filepath.Join(w.Conf.Save.OutputDir, config.SavePagesDir, pageFilesDirectoryName), os.ModePerm)
if err != nil {
logger.Error("Failed to create directory to store file contents of %s: %s", baseURL.String(), err)
return
}
defer pageFile.Close()
pageFile.Write(pageData)
// Save files on page
srcLinks := findPageFileContentURLs(pageData)
for _, srcLink := range srcLinks {
web.FetchFile(srcLink.String(),
w.Conf.Requests.UserAgent,
w.Conf.Requests.ContentFetchTimeoutMs,
filepath.Join(
w.Conf.Save.OutputDir,
config.SavePagesDir,
pageFilesDirectoryName,
path.Base(srcLink.String()),
),
)
}
// Redirect old content URLs to local files
for _, srcLink := range srcLinks {
cleanLink := cleanLink(srcLink, baseURL)
pageData = bytes.ReplaceAll(
pageData,
[]byte(srcLink.String()),
[]byte("./"+filepath.Join(pageFilesDirectoryName, path.Base(cleanLink.String()))),
)
}
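The loop above redirects each asset reference in the saved page to its local copy with a byte-level substitution. A minimal sketch of that rewrite (the directory name is illustrative):

```go
package main

import (
	"bytes"
	"fmt"
)

// localizeURL replaces every occurrence of an absolute asset URL in an
// HTML page with a relative path into the saved-files directory --
// the same bytes.ReplaceAll substitution savePage performs.
func localizeURL(page []byte, absURL, localPath string) []byte {
	return bytes.ReplaceAll(page, []byte(absURL), []byte("./"+localPath))
}

func main() {
	page := []byte(`<script src="https://example.com/app.js"></script>`)
	out := localizeURL(page, "https://example.com/app.js", "example.com__files/app.js")
	fmt.Println(string(out)) // <script src="./example.com__files/app.js"></script>
}
```

Plain byte replacement is simple but positional-blind: it rewrites the URL wherever it appears, including inside visible text, which is usually acceptable for archival snapshots.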
// Create page output file
pageName := fmt.Sprintf(
"%s_%s.html",
baseURL.Host,
strings.ReplaceAll(baseURL.Path, "/", "_"),
)
outfile, err := os.Create(filepath.Join(
filepath.Join(w.Conf.Save.OutputDir, config.SavePagesDir),
pageName,
))
if err != nil {
logger.Error("Failed to create output file: %s", err)
return
}
defer outfile.Close()
outfile.Write(pageData)
logger.Info("Saved \"%s\"", pageName)
w.stats.PagesSaved++
}
const (
textTypeMatch = iota
textTypeEmail = iota
)
// Save text result to an appropriate file
func (w *Worker) saveResult(result web.Result, textType int) {
// write result to the output file
var output io.Writer
switch textType {
case textTypeEmail:
output = w.Conf.EmailsOutput
default:
output = w.Conf.TextOutput
}
// each entry in output file is a self-standing JSON object
entryBytes, err := json.MarshalIndent(result, " ", "\t")
if err != nil {
return
}
output.Write(entryBytes)
output.Write([]byte("\n"))
}
// Launch scraping process on this worker
@@ -236,7 +333,7 @@ func (w *Worker) Work() {
}
// find links
pageLinks := web.FindPageLinks(pageData, pageURL)
pageLinks := web.FindPageLinks(pageData, *pageURL)
go func() {
if job.Depth > 1 {
// decrement depth and add new jobs
@@ -246,9 +343,9 @@ func (w *Worker) Work() {
// add to the visit queue
w.Conf.VisitQueue.Lock.Lock()
for _, link := range pageLinks {
if link != job.URL {
if link.String() != job.URL {
err = queue.InsertNewJob(w.Conf.VisitQueue.VisitQueue, web.Job{
URL: link,
URL: link.String(),
Search: *w.Conf.Search,
Depth: job.Depth,
})
@@ -262,9 +359,9 @@ func (w *Worker) Work() {
} else {
// add to the in-memory channel
for _, link := range pageLinks {
if link != job.URL {
if link.String() != job.URL {
w.Jobs <- web.Job{
URL: link,
URL: link.String(),
Search: *w.Conf.Search,
Depth: job.Depth,
}
@@ -280,9 +377,12 @@ func (w *Worker) Work() {
var savePage bool = false
switch job.Search.Query {
case config.QueryArchive:
savePage = true
case config.QueryImages:
// find image URLs, output images to the file while not saving already outputted ones
imageLinks := web.FindPageImages(pageData, pageURL)
imageLinks := web.FindPageImages(pageData, *pageURL)
if len(imageLinks) > 0 {
w.saveContent(imageLinks, pageURL)
savePage = true
@@ -291,7 +391,7 @@ func (w *Worker) Work() {
case config.QueryVideos:
// search for videos
// find video URLs, output videos to the files while not saving already outputted ones
videoLinks := web.FindPageVideos(pageData, pageURL)
videoLinks := web.FindPageVideos(pageData, *pageURL)
if len(videoLinks) > 0 {
w.saveContent(videoLinks, pageURL)
savePage = true
@@ -300,7 +400,7 @@ func (w *Worker) Work() {
case config.QueryAudio:
// search for audio
// find audio URLs, output audio to the file while not saving already outputted ones
audioLinks := web.FindPageAudio(pageData, pageURL)
audioLinks := web.FindPageAudio(pageData, *pageURL)
if len(audioLinks) > 0 {
w.saveContent(audioLinks, pageURL)
savePage = true
@@ -309,7 +409,7 @@ func (w *Worker) Work() {
case config.QueryDocuments:
// search for various documents
// find documents URLs, output docs to the file while not saving already outputted ones
docsLinks := web.FindPageDocuments(pageData, pageURL)
docsLinks := web.FindPageDocuments(pageData, *pageURL)
if len(docsLinks) > 0 {
w.saveContent(docsLinks, pageURL)
savePage = true
@@ -319,11 +419,11 @@ func (w *Worker) Work() {
// search for email
emailAddresses := web.FindPageEmailsWithCheck(pageData)
if len(emailAddresses) > 0 {
w.Results <- web.Result{
w.saveResult(web.Result{
PageURL: job.URL,
Search: job.Search,
Data: emailAddresses,
}
}, textTypeEmail)
w.stats.MatchesFound += uint64(len(emailAddresses))
savePage = true
}
@@ -332,29 +432,29 @@ func (w *Worker) Work() {
// search for everything
// files
var contentLinks []string
contentLinks = append(contentLinks, web.FindPageImages(pageData, pageURL)...)
contentLinks = append(contentLinks, web.FindPageAudio(pageData, pageURL)...)
contentLinks = append(contentLinks, web.FindPageVideos(pageData, pageURL)...)
contentLinks = append(contentLinks, web.FindPageDocuments(pageData, pageURL)...)
var contentLinks []url.URL
contentLinks = append(contentLinks, web.FindPageImages(pageData, *pageURL)...)
contentLinks = append(contentLinks, web.FindPageAudio(pageData, *pageURL)...)
contentLinks = append(contentLinks, web.FindPageVideos(pageData, *pageURL)...)
contentLinks = append(contentLinks, web.FindPageDocuments(pageData, *pageURL)...)
w.saveContent(contentLinks, pageURL)
if len(contentLinks) > 0 {
savePage = true
}
// email
emailAddresses := web.FindPageEmailsWithCheck(pageData)
if len(emailAddresses) > 0 {
w.Results <- web.Result{
w.saveResult(web.Result{
PageURL: job.URL,
Search: job.Search,
Data: emailAddresses,
}
}, textTypeEmail)
w.stats.MatchesFound += uint64(len(emailAddresses))
savePage = true
}
if len(contentLinks) > 0 || len(emailAddresses) > 0 {
savePage = true
}
default:
// text search
switch job.Search.IsRegexp {
@@ -368,11 +468,11 @@ func (w *Worker) Work() {
matches := web.FindPageRegexp(re, pageData)
if len(matches) > 0 {
w.Results <- web.Result{
w.saveResult(web.Result{
PageURL: job.URL,
Search: job.Search,
Data: matches,
}
}, textTypeMatch)
logger.Info("Found matches: %+v", matches)
w.stats.MatchesFound += uint64(len(matches))
savePage = true
@@ -380,11 +480,11 @@ func (w *Worker) Work() {
case false:
// just text
if web.IsTextOnPage(job.Search.Query, true, pageData) {
w.Results <- web.Result{
w.saveResult(web.Result{
PageURL: job.URL,
Search: job.Search,
Data: []string{job.Search.Query},
}
}, textTypeMatch)
logger.Info("Found \"%s\" on page", job.Search.Query)
w.stats.MatchesFound++
savePage = true
@@ -393,8 +493,8 @@ func (w *Worker) Work() {
}
// save page
if savePage {
w.savePage(pageURL, pageData)
if savePage && w.Conf.Save.SavePages {
w.savePage(*pageURL, pageData)
}
pageData = nil
pageURL = nil
