yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	da86f150ab	- added a new Crawler Balancer: HostBalancer and HostQueues: This organizes all urls to be loaded in separate queues for each host. Each host separates the crawl depth into its own queue. The primary rule for urls taken from any queue is that the crawl depth is minimal. This produces a crawl depth which is identical to the clickdepth. Furthermore, the crawl is able to create a much better balancing over all hosts which is fair to all hosts that are in the queue. This process will create a very large number of files for wide crawls in the QUEUES folder: for each host a directory, for each crawl depth a file inside the directory. A crawl with maxdepth = 4 will be able to create tens of thousands of files. To be able to use that many file readers, it was necessary to implement a new index data structure which opens the file only if an access is wanted (OnDemandOpenFileIndex). The usage of such an on-demand file reader shall prevent the number of file pointers from exceeding the system limit, which is usually about 10,000 open files. Some parts of YaCy had to be adapted to handle the crawl depth number correctly. The logging and the IndexCreateQueues servlet had to be adapted to show the crawl queues differently, because the port is attached to the host name to differentiate between http, https, and ftp services.	11 years ago
orbiter	3c8d6e1eee	added adminAccount switch to ConfigAccounts_p servlet to switch on protection of all pages; some refactoring as well	11 years ago
reger	365f77ea8c	make internal page links relative to ease any future development for context aware servlets note also http://bugs.yacy.net/view.php?id=106	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary that a large portion of the parser and link processing classes be adapted to carry a different type of link collection, which carries property attributes that are attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI, have been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
Michael Peter Christen	76afcccaaf	fix for default boolean post values: the default value MUST NOT be TRUE, because it's normal that a boolean value is missing in the post argument if a checkbox is not selected. Added also some style enhancements to IndexFederated, removed the Solr attachment manual and replaced it with a link to the wiki which explains this in more detail.	11 years ago
Michael Peter Christen	4c242f9af9	always use a default value for boolean options to have transparency for the outcome if the attribute is missing in servlets	11 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: JDK-based loggers tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to JDK logging that puts log entries on a concurrent message queue and logs the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
Michael Peter Christen	0c5bed7e2c	added configuration option for greedy learning function to ConfigPortal servlet	12 years ago
Michael Peter Christen	8ea6ddf636	removed attributes from ConfigPortal.html which are redundant to ConfigSearchPage_p.html	12 years ago
Michael Peter Christen	fd1776a3b0	added a new 'Citations' function: each search result item can now be explored for citations within other documents. A click on the 'Citations' link shows an analysis with all text lines in the document each with a complete list of documents which contain the same line. A second section shows the linking documents in ascending order of number of citations from the original document. Because documents from different hosts are most interesting here, they are listed at the top of the page as possible 'copypasta' source.	12 years ago
reger	1fb452174a	read defaults from yacy.init for "Set to Defaults" button	12 years ago
reger	e9e0d63897	Add config option to show HostBrowser link in search result - ConfigPortal: added checkbox Host Browser - yacy.init: added search.result.show.hostbrowser as default = on (true) - fix HostBrowser: broken link to protected WebStructurePicture for public user	12 years ago
Michael Peter Christen	00c1c777fa	refactoring	12 years ago
Michael Peter Christen	9116013c64	- allow lazy initialization of solr value (if using 'lazy', then no 0-values and no empty strings are written). This may save a lot of memory (in ram and on disc) if excessive 0-values or empty strings appear. - do not allow default boolean values for checkboxes because that does not make sense: browsers may omit the checkbox attribute name if the box is not checked. A default value 'true' would not comply with the semantics of the browser's response. - add a checkbox in IndexFederated_p for the lazy initialization of solr fields.	13 years ago
cominch	c63c3a4495	Show additional interaction elements in footer section on each page, if activated in ConfigPortal.html. This footer is also visible in augmented browsing proxy mode.	13 years ago
cominch	84a11ec48c	Corrected loading of default page settings on ConfigPortal.html	13 years ago
cominch	3c255c025b	Show tags in search results (if activated in ConfigPortal_p.html)	13 years ago
Michael Peter Christen	a5cdfb91de	- fixed Cache link (below snippet) - added 'Augmented Proxy' link below snippet - added configuration options for augmented proxy	13 years ago
Michael Peter Christen	5aee19daa4	added show from cache in search results (not yet finished)	13 years ago
Michael Peter Christen	8b974905ee	changed log-in text for all servlets with authentication: - added hint how to set the password using a shell script - added a shell script to change the password	13 years ago
Michael Peter Christen	1473e2258e	fix for http://bugs.yacy.net/view.php?id=154	13 years ago
Michael Peter Christen	8aba045ba1	if a new pop-up page is set in config portal, then this page applies also to the default page configuration for the httpd if no path is given.	13 years ago
Michael Peter Christen	4c5edab1ec	added option to have exception search result windows	13 years ago
Michael Peter Christen	0bcef2d156	added feature as requested in http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461 The search can now be configured with a non-display host list. The search will always exclude the given list of hosts unless they are requested directly using the host navigation	13 years ago
Michael Christen	d6e6f7715b	added "about" box configuration	13 years ago
orbiter	5a55397f99	some last-minute performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	ac5bda205f	- removed lower page navigation (it never looks nice) - added visibility of metadata and parser in search results since that shows what YaCy can do in a nice way git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8091 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	c659310e89	- removed option to search for audio, video and applications. These things are still experimental and should not be shown to new users since this would cause them to argue that YaCy does not work. The functions are still available, because: - added a configuration option in ConfigPortal to switch the search media types on or off git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8090 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	6cd27473f5	- better default values for caching and cache usage - set new caching and verification behavior according to use case automatically git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8087 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	804e48888b	smaller bug fixes for search behavior; should produce fewer unnecessary removals and an exact number of results as shown in the counter; should also be a little bit faster git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8057 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	ba41a869a7	set default number of search results in ConfigPortal.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8008 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	d260b25457	fix for npe git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8006 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	d2ea250d99	refactoring: - moved many classes from de.anomic to net.yacy - made more sub-packages for search classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	ba03ca8620	added more configuration options for search: - removed configuration button for 'search only for admin' from index.html and added this to ConfigPortal - added configuration of link verification options (iffresh, cacheonly, nocache, ifexist) to ConfigPortal - added configuration of navigation options to ConfigPortal - added an option to switch off automatic index cleaning in case that a link verification method fails git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7613 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	2861d0888a	*) simplified code *) fixed potential NumberFormatExceptions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7600 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	70ca7cec8c	fix for http://forum.yacy-websuche.de/viewtopic.php?p=21763#p21763 and another fix for non-working global search when search options are switched off git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7467 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	fe93caac5a	added flags and administration options to show advanced search and to show search result attributes (for each search result) Administration can be done at ConfigPortal.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7466 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	88773e4daa	changed the default port from 8080 to 8090 see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4565b2f2c0	removed the display option from index.html, yacysearch.html and yacyinteractive.html; instead, a setting at ConfigPortal.html can be made to define if the topmenu shall be shown at these pages or if there is no navigation at all. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7366 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	fc2e41e691	added a forwarder for the default page. The forwarder forwards a browser to a different page if the root file index.html is accessed. This can be done by setting the name of the forwarder page to the field "Default index.html Page (by forwarder)" in /ConfigPortal.html The purpose is to forward to /yacyinteractive.html for the 27C3 FTP search platform git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7365 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	e7552bd719	*) cleaning up the code a little bit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7343 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	facfd204e9	added a parent configuration option. see /ConfigPortal.html requested here: http://forum.yacy-websuche.de/viewtopic.php?p=21099#p21099 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7271 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3197ca42ed	preparations to move the HTCache into cora: - move the header framework classes to cora - move the ARC caching classes to cora - refactoring of code to call these classes from cora git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	308a973503	refactoring of tables data organisation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6644 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	24060885b6	- added Tables abstraction in data.Tables.java fix for http://forum.yacy-websuche.de/viewtopic.php?p=18910#p18910 http://forum.yacy-websuche.de/viewtopic.php?p=18894#p18894 http://forum.yacy-websuche.de/viewtopic.php?p=18814#p18814 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6631 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	8ce936bcdd	added an api recording function: it shall be possible to record all operations on YaCy in a database that should make it possible 1) to re-create a setting on fresh peers 2) to transmit a setting from one peer to another 3) to re-create crawl starts after a complete deletion of the index This functionality will also support 4) scheduled re-crawls (new implementation) To implement this, a new database structure has been created that stores maps into blob heaps. To encode maps, the b-encoding technique was used (this is the same encoding that torrent files use) - added a b-encoder - enhanced the b-decoder - added a b-encoded map heap data structure - added a table organisation based on b-encoded heaps - added a servlet to maintain such tables (see Tables_p.html) - integrated the servlet into the Advanced Settings menu - added an api recording based on the new tables git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6606 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	5841ee83d3	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6400 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
low012	5e4f267a36	*) added subversion properties and edited a few comments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6348 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	1d8d51075c	refactoring: - removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here: http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html We still have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages. - cleaned up the http package: better structure of that class and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http. - because the plasmaSwitchboard is now part of the search package all servlets had to be touched to declare a different package source. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6232 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	5bb8074150	removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency. - The indexing queue was a historic data structure that was introduced at the very beginning of the project as a part of the switchboard organisation object structure. Without the indexing queue the switchboard queue becomes also superfluous. It has been removed as well. - Removing the switchboard queue requires that all servlets are called without an opaque generic ('<?>'). This meant that all servlets had to be modified. - Many servlets displayed the indexing queue or the size of that queue. In the past months the indexer was so fast that the indexing queue mostly appeared empty, so there was no use for it any more. Because the queue has been removed, the display in the servlets had also to be removed. - The surrogate work task had been a part of the indexing queue control structure. Without the indexing queue the surrogates needed their own task management. That has been integrated here. - Because the indexing queue had a special queue entry object and properties attached to this object, the properties had to be moved to the queue entry object which is part of the new indexing queue within the blocking queue, the Response Object. That object has now also the new properties of the removed indexing queue entry object. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6225 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
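
The HostBalancer commit (da86f150ab) describes one queue per host and crawl depth, with urls always taken at the minimal crawl depth and all hosts served fairly. The selection rule might look roughly like the Java sketch below. It is not YaCy's code: the class name HostDepthBalancerSketch, its push/pop methods, the in-memory maps (the commit stores the queues as files in the QUEUES folder) and the round-robin tie-break between hosts are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch (not YaCy's HostBalancer): one queue per host and crawl depth. */
class HostDepthBalancerSketch {

    // host key (assumed form "host:port") -> crawl depth -> urls waiting at that depth.
    // Invariant: no host maps to an empty TreeMap and no depth maps to an empty queue.
    private final Map<String, TreeMap<Integer, Deque<String>>> hosts = new LinkedHashMap<>();

    /** Enqueue a url for the given host at the given crawl depth. */
    void push(String hostKey, int depth, String url) {
        hosts.computeIfAbsent(hostKey, h -> new TreeMap<>())
             .computeIfAbsent(depth, d -> new ArrayDeque<>())
             .addLast(url);
    }

    /** Returns a url with the minimal crawl depth over all hosts, or null if nothing is queued. */
    String pop() {
        String bestHost = null;
        int bestDepth = Integer.MAX_VALUE;
        // Scan hosts in their current order; the first host holding the minimal depth wins.
        for (Map.Entry<String, TreeMap<Integer, Deque<String>>> e : hosts.entrySet()) {
            int depth = e.getValue().firstKey();
            if (depth < bestDepth) {
                bestDepth = depth;
                bestHost = e.getKey();
            }
        }
        if (bestHost == null) return null;

        // Remove the host and re-insert it at the end, so that other hosts
        // with the same minimal depth are served first on the next call.
        TreeMap<Integer, Deque<String>> depths = hosts.remove(bestHost);
        Deque<String> queue = depths.get(bestDepth);
        String url = queue.pollFirst();
        if (queue.isEmpty()) depths.remove(bestDepth);
        if (!depths.isEmpty()) hosts.put(bestHost, depths);
        return url;
    }
}
```

In this sketch, a host with two depth-0 urls and one depth-1 url alternates with a second host that holds a single depth-0 url, and the depth-1 url is returned only after every depth-0 url has been taken.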

51 Commits (0f425e01ca0c9bcdc0477b1126bc12b5f67e783d)