yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	8a29551c54	Upgraded the OpenGeoDB dump URL The status of the library in the DictionaryLoader_p.html page now also advertises the user that an upgrade can be applied when an older dump is already loaded. Upgrade applied as suggested by Niklas Andrus @fapth_gitlab on Gitter chat.	6 years ago
luccioman	0806de8fdc	Ensure file input streams are closed in both normal and error cases.	8 years ago
reger	ec8dd95014	fix deactivation of Russian Thesaurus	8 years ago
Sergey Stepanov	f0317d6715	to add Russian synonyms requires health checks	8 years ago
Ryszard Goń	b0cd0212fd	SynonymLibrary status check fix for multiple files	10 years ago
Ryszard Goń	f3f1b2e899	added English synonyms	10 years ago
reger	2f592a8063	add SynonymLibrary status to DictionaryLoader_p servlet http://mantis.tokeek.de/view.php?id=564	10 years ago
reger	c59ebde083	show location nav as selectable nav in search page layout - switch automatically on upon load of geodata provider - but allow switch on also without geodata file (and display the location nav if search result has lat/lon location)	10 years ago
Michael Peter Christen	6a2a669db4	added loading of the synonyms file from addon/synonyms into the knowledge loader	10 years ago
Michael Peter Christen	6e59ca4ebf	removed jena library and all code that depended on jena. When jena was introduced, it was also used for search facets. The generic search facets are now deduced from generic solr fields which makes jena as tool for facet semantics superfluous.	11 years ago
orbiter	d86d2be5c3	automatically removed Places autotagging if no location library is wanted	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary, that a large portion of the parser and link processing classes must be adopted to carry a different type of link collection which carry a property attribute which are attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI had been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
Michael Peter Christen	765943a4b7	Redesign of crawler identification and robots steering. A non-p2p user in intranets and the internet can now choose to appear as Googlebot. This is an essential necessity to be able to compete in the field of commercial search appliances, since most web pages are these days optimized only for Google and no other search platform any more. All commercial search engine providers have a built-in fake-Google User Agent to be able to get the same search index as Google can do. Without the resistance against obeying to robots.txt in this case, no competition is possible any more. YaCy will always obey the robots.txt when it is used for crawling the web in a peer-to-peer network, but to establish a Search Appliance (like a Google Search Appliance, GSA) it is necessary to be able to behave exactly like a Google crawler. With this change, you will be able to switch the user agent when portal or intranet mode is selected on per-crawl-start basis. Every crawl start can have a different user agent.	11 years ago
Michael Peter Christen	bcc623a843	refactoring of load_delay: this is a matter of client identification	12 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based loggers tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
Michael Peter Christen	8f2d3ce2f9	reduced locking situation in crawler: shifted synchronized location and reduced time-out of robots.txt load limit	12 years ago
Michael Peter Christen	1533bfd63b	refactoring	12 years ago
Michael Peter Christen	00c1c777fa	refactoring	12 years ago
orbiter	63762d8f89	removed kelondro dependencies from cora	12 years ago
Michael Peter Christen	24d9db1613	snippet retrieval loading processes may use a smaller minimum load time value than crawling processes. This speeds up the search result preparation dramatically.	12 years ago
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	13 years ago
Michael Peter Christen	1825f165b8	better integration of blacklist according to use case	13 years ago
Michael Peter Christen	24bbe359ca	integrate also geonames library files for less cities. these are more useful for tagging since less normal words are false-identified as location	13 years ago
Michael Peter Christen	f1aa4c4390	- accept only location names with a minimum length - remove comma from synonym terms	13 years ago
Michael Peter Christen	cc9ad7198a	- use only names which consist of at least two parts - remove word from derewo from locations	13 years ago
Michael Peter Christen	eeb4fd8b8c	refactoring (geolocalzation -> geolocation)	13 years ago
Michael Peter Christen	a0f1decd82	- added loading of the dbpedia pnd triplestore in the dictionary loader - renamed the dictionary loader to knowledge loader - some refactoring in the library provider method names	13 years ago
Michael Peter Christen	d45718251e	refactoring (Localization -> Location)	13 years ago
Michael Peter Christen	b8b3c87ba7	- renamed localization to location (that was confusing) - renamed 'Locale' navigator to 'Location' - produce Location navigation only if geolocation libraries are loaded	13 years ago
Michael Christen	bd40a10230	added autotagging stub .. only reading and parsing of vocabularies at this time	13 years ago
orbiter	d2ea250d99	refactoring: - moved many classes from de.anomic to net.yacy - made more sub-packages for search classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	b5252ef91f	added new word recommendation library in DictionaryLoader_p.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7913 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	59b767eebd	stop loading via http at defined maximum of bytes - even size is unknown before loading using max-file-size of type int for parsing documents (since content is used as byte-arrays, 'integer' should be maximum) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	115abc8917	- more attributes for search progress bar - moved cache strategy to cora package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4588b5a291	- fixed document number limitation for crawls that restrict the number of documents per domain - some restructuring of the document counting and logging structures was necessary - better abstraction of CrawlProfiles - added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation - more refactoring to get the LibraryProvider more clean - some refactoring of the Condenser class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
f1ori	9d2159582f	* fix system update if urls are in blacklist (for example for very general blacklists like *.de) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7375 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3197ca42ed	preparations to move the HTCache into cora: - move the header framework classes to cora - move the ARC caching classes to cora - refactoring of code to call these classes from cora git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	777195e8d1	more abstraction for access of LoaderDispatcher and cache git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6937 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	e43e61e502	added another geolocalization data source: GeoNames - added downloader option in DictionaryLoader - added generalization (interfaces and overarching localization) - more abstraction using the libraries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6879 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	2126c03a62	- removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer - migrated the opengeodb downloader to a new version of the opengeodb-dump git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6873 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	3661cb692c	added dictionary loader servlet that can be used to get the geolocalization file: /DictionaryLoader_p.html Will also be used for more dictionary files in the future git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6872 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago

41 Commits (60dc1241a3e69561f28689c6cdfc0ba8a76c1939)