yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	46c9da6428	Allow creation of vocabularies from remote CSV file URLs.	7 years ago
luccioman	348d07a999	Enforced controls on vocabulary editing operations.	7 years ago
luccioman	b67742336e	Provide user interface messages on vocabulary creation read/write errors	7 years ago
luccioman	8d7099a081	Handle escaped line breaks and separators in vocabulary import from CSV	7 years ago
luccioman	09f93fed0e	Added a line start field for vocabulary import from CSV file, as a convenience to skip any CSV header lines	7 years ago
luccioman	d28d612069	Added option to choose field delimiter in vocabulary import from CSV	7 years ago
luccioman	9412881230	Added basic support for autotagging microdata annotated item types. With the appropriate vocabulary settings in Vocabulary_p.html page, this can produce Vocabulary search facets displaying item types referenced in html documents by microdata annotation. Tested notably, but not limited to, vocabulary classes/types defined by Schema.org and Dublin Core.	7 years ago
reger	4cc38e979d	add InputStream close after reading input file (Vocabulary_p servlet)	9 years ago
reger	f0d7b93372	make use and activate autodetect charset in Vocabulary input from file + revert mistake of empty cn.lng	9 years ago
luc	571bc55937	Refactoring : use StandardCharsets constants instead of hard-coded charset names.	9 years ago
reger	821262a179	add CommonPattern for multiple spaces to eliminate empty split words on following spaces	10 years ago
reger	f7b0148f6a	fix NPE in Vocabulary_p servlet called w/o parameter	10 years ago
Michael Peter Christen	68c605d637	replace with CommonPattern.SPACE for split	10 years ago
Michael Peter Christen	1f5047b15f	using precompiled pattern CommonPattern.SEMICOLON for splits	10 years ago
Michael Peter Christen	a8a2b7a803	persistency for vocabulary facet switch	10 years ago
Michael Peter Christen	6390454652	fix for vocabulary on/off setting	10 years ago
Michael Peter Christen	ff035a20e7	fix for vocabulary import (double term detection)	10 years ago
Michael Peter Christen	e6650050fe	fix for Is Facet checkbox	10 years ago
Michael Peter Christen	bd3ed5cae5	added charset detection to vocabulary reader	10 years ago
Michael Peter Christen	7bfc5b80cb	added new options to vocabulary editor: - new switch 'isFacet' which enables or disables the usage of the vocabulary for search facets. This shall be used for large vocabularies since searches in solr are extremely slow if facets for a large set of alternative terms are generated - new option to disable auto-enrichment from synonyms - new option to add synonyms from another column when importing from csv - automatically recognize double-occurrences in synonyms and bundle terms for such synonyms	10 years ago
Michael Peter Christen	092d97d7ac	when importing vocabulary csv files, accept also files without semicolon and truncate quotes from literals	10 years ago
Michael Peter Christen	0dc6e0a5f2	added option to enrich vocabularies with synonyms from synonym database	10 years ago
Michael Peter Christen	ec9d021568	added option in vocabulary editor to import CSV files with different encodings (preselected windows-type character encoding which is typical for CSV files). Fixed also other problems with character encoding in dictionary files. Automatically generated vocabularies are now also noted in the API steering.	10 years ago
Michael Peter Christen	8ad41a882c	fixed several problems with postprocessing: - unique-postprocessing was destroying results from other postprocessings; removed cross-updates as they were not necessary - unique-postprocessing did not restrict on same protocol - inefficient concurrent update cache was redesigned completely - increased limits for concurrent blocking queues to prevent early time-out	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not just the unique links! This made it necessary to adapt a large portion of the parser and link processing classes to carry a different type of link collection, which carries property attributes attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI, were renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based loggers tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
orbiter	c1b7e61882	added option to create empty vocabularies	12 years ago
orbiter	a2160054d7	ability to create vocabularies also without any objectspace: this iterates over all urls in the index to create terms	12 years ago
Michael Peter Christen	43f3345c90	- removed dependencies from URIMetadataRow and made direct access to URIMetadataNode which creates the opportunity to access Solr objects directly and use their information richness - lazy initialization of the URIMetadataNode object - should cause less computation and memory usage during search. - removed dead code	12 years ago
Michael Peter Christen	5f0ab25382	removed the option to prevent removal of & parts inside of the MultiProtocolURI during normalform computation because that should always be done and also be done during initialization of the MultiProtocolURI Object. The new normalform method takes only one argument which should be 'true' unless you know exactly what you are doing.	12 years ago
Michael Peter Christen	00c1c777fa	refactoring	12 years ago
Michael Peter Christen	0cab06c47c	refactoring	12 years ago
Michael Peter Christen	18f989dfb1	- refactoring (load -> getMetadata) - added getDocument to retrieve Solr documents which shall replace getMetadata	12 years ago
orbiter	69e743d9e3	- more abstraction for the RWI index as preparation for solr integration - added options in search index to switch parts of the index on or off	12 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	13 years ago
Michael Peter Christen	03280fb161	removed segments-concept and the Segments class: the segments had been there to create a tenant-infrastructure but were never used since that was all much too complex. There will be a replacement using a solr navigation using a segment field in the search index.	13 years ago
Michael Peter Christen	1d4e206b2b	bugfix in vocabulary generation	13 years ago
Michael Peter Christen	e16e4bd2ba	added ontology extraction in xml as api call for vocabularies	13 years ago
Michael Peter Christen	26cb1c65c2	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/document/importer/OAIPMHLoader.java	13 years ago
Michael Peter Christen	963f92ed9a	- merged files - changed behaviour of delete button in vocabulary edit - fixed size number in vocabulary listing	13 years ago
Michael Peter Christen	743b0ec89f	- added size of vocabulary to vocabulary view - fixed bad terms in vocabulary-from-titles autogeneration	13 years ago
Michael Peter Christen	22d5e33c5e	added more methods to vocabulary generation: scrape document title and document author to vocabulary	13 years ago
Michael Peter Christen	c2f0d16d2c	fixed vocabulary initialization	13 years ago
Michael Peter Christen	df3531f8d5	added the generation of virtual vocabularies using the pnd	13 years ago
Michael Peter Christen	1f9120d189	create new vocabularies also without an objectspace. this creates an empty vocabulary	13 years ago
Michael Peter Christen	a5cdfb91de	- fixed Cache link (below snippet) - added 'Augmented Proxy' link below snippet - added configuration options for augmented proxy	13 years ago
Michael Peter Christen	16d8f33795	added objectlink generation to vocabulary generation and editor	13 years ago
Michael Peter Christen	e89747bb67	- added automated generation of vocabularies from url stubs - added clear of all terms for vocabularies - added deletion of vocabularies	13 years ago


51 Commits (313204ae2c8cdd52961a61efb2315d2b2bdbbde0)