yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luc	ba0a293f5c	Corrected another case of org.apache.lucene.store.AlreadyClosedException occurring when SearchEvent.cleanup() was called while committing local solr index.	9 years ago
luc	8c4ab9c76b	Added an option to eventually limit size of remote solr documents put to local index. See mantis #626.	9 years ago
luc	e40ae0943b	- No max dimensions specified : render raw image data when source and target image format are the same. - Corrected scaling condition.	9 years ago
luc	bc6c79fc12	Corrected scaling function for non RGB images.	9 years ago
luc	6291a57300	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	7d0d19cb8e	avoid File.deleteOnExit() on temp files JVM registers each file in a list regardless of already deleted and never cleans up the list during runtime. This accumulates to a considerable amount of mem during large crawls and/or long uptime. To tackle this, all temp files are now created in a subdir of java.io.tmpdir and the jvm tmpdir property is set to this subdir, which is deleted by code on shutdown. Additionally let pdfParser use this tmp subdir too.	9 years ago
luc	745e97a575	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	a60b1fb6c2	differentiate api call getLocalPort() from getConfigInt()	9 years ago
luc	fc3294382e	Updated javadocs for warning on target encoding format potential errors.	9 years ago
luc	aa70ff4ff6	Corrected images alpha channel rendering	9 years ago
reger	e53c6bbd51	fix init of peer flags (remove hiding of ssl flag)	10 years ago
reger	826f14f37f	fix unnecessary set null of peer flags, causing reread remove obsolete version flags	10 years ago
Michael Peter Christen	87f358058e	Fix for index entries which have id's not computed as hash from the url. This makes it possible to operate with outside-computed url hashes in enterprise environments not using the build-in crawler from YaCy.	10 years ago
reger	e37a4f0b3d	prevent metadata records in index w/o valid url by throwing MalformedURL exception on URIMetadataNode creation	10 years ago
reger	7ed812a2bf	log missing seed.port in favour of exception to prevent repeating throws	10 years ago
reger	f7b0b3b7b3	avoid runtime exception by earlier testing for seed.ip=null	10 years ago
Michael Peter Christen	dbbad23e12	removed warnings	10 years ago
Michael Peter Christen	11a848da5a	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	10 years ago
Michael Peter Christen	b94bd7f20a	a collection of search query enhancements: - fixed superfluous space in query field list - fixed filter query logic - removed look-ahead query which caused that each new search page submitted two solr queries - fixed random solr result orders in case that the solr score was equal: this was then re-ordered by YaCy using the document hash which came from the solr object and that appeared to be random. Now the hash of the url is used and the score is additionally modified by the url length to prevent that this particular case appears at all.	10 years ago
reger	6d3534e725	remove unused Transmission hit counter	10 years ago
reger	0fab445b19	Resourceobserver log warning - deleting releases files - only on actual deletes instead of entering routine	10 years ago
Michael Peter Christen	eec78e1b0c	added intensity option to graphics	10 years ago
reger	31346e873b	upd library reference of missing jsch-0.1.21 in seeduploadscp.xml upd to jsch-0.1.52.jar	10 years ago
reger	296e97c78e	put https port in peers dna as we flag if a peer is accessible via https, we need to know the port if we want to use it (e.g. for interYaCy communication) start to provide / transport the port by recording it in peers dna. - add https link on the Network.html lock symbol	10 years ago
Michael Peter Christen	fed26f33a8	enhanced timezone management for indexed data: to support the new time parser and search functions in YaCy a high precision detection of date and time on the day is necessary. That requires that the time zone of the document content and the time zone of the user, doing a search, is detected. The time zone of the search request is done automatically using the browsers time zone offset which is delivered to the search request automatically and invisible to the user. The time zone for the content of web pages cannot be detected automatically and must be an attribute of crawl starts. The advanced crawl start now provides an input field to set the time zone in minutes as an offset number. All parsers must get a time zone offset passed, so this required the change of the parser java api. A lot of other changes had been made which corrects the wrong handling of dates in YaCy which was to add a correction based on the time zone of the server. Now no correction is added and all dates in YaCy are UTC/GMT time zone, a normalized time zone for all peers.	10 years ago
Michael Peter Christen	9bf0d7ecb9	added a new collection type 'dht' to all documents from the peer-to-peer interface to distinguish rich and poor document data. This also reverts some changes from commit `796770e070` because the firstSeen database is the wrong method to distinguish these types of data	10 years ago
reger	796770e070	prevent overwrite of crawled or received full documents by (newer) metadata To protect rich index data (full resource) from overwriting by metadata gathered during remote search, the newly introduced "firstSeen" index is used to differentiate between full-resource-doc and metadata, as a "firstSeen" entry is only added on store's of full-resource-docs (during crawl or remote search).	10 years ago
reger	86073a5ba3	For remote crawlReceipt add document abstract/description enhance the returned metadata returned to the originator by description_txt to improve fulltext search result hits.	10 years ago
Michael Peter Christen	535f1ebe3b	added a new way of content browsing in search results: - date navigation The date is taken from the CONTENT of the documents / web pages, NOT from a date submitted in the context of metadata (i.e. http header or html head form). This makes it possible to search for documents in the future, i.e. when documents contain event descriptions for future events. The date is written to an index field which is now enabled by default. All documents are scanned for contained date mentions. To visualize the dates for a specific search results, a histogram showing the number of documents for each day is displayed. To render these histograms the morris.js library is used. Morris.js requires also raphael.js which is now also integrated in YaCy. The histogram is now also displayed in the index browser by default. To select a specific range from a search result, the following modifiers had been introduced: from:<date> to:<date> These modifiers can be used separately (i.e. only 'from' or only 'to') to describe an open interval or combined to have a closed interval. Both dates are inclusive. To select a specific single date only, use the 'to:' - modifier. The histogram shows blue and green lines; the green lines denote weekend days (saturday and sunday). Clicking on bars in the histogram has the following reaction: 1st click: add a from:<date> modifier for the date of the bar 2nd click: add a to:<date> modifier for the date of the bar 3rd click: remove from and date modifier and set a on:<date> for the bar When the on:<date> modifier is used, the histogram shows an unlimited time period. This makes it possible to click again (4th click) which is then interpreted as a 1st click again (sets a from modifier). The display feature is NOT switched on by default; to switch it on use the /ConfigSearchPage_p.html servlet.	10 years ago
Michael Peter Christen	97ba5ddbb7	configuration option for maxload limit for remote search	10 years ago
Michael Peter Christen	69eacdf4eb	applying precompiled CommonPattern.COMMA.split to all places where split(",") was used	10 years ago
Michael Peter Christen	3d717b749a	fix for urlmaskfilter	10 years ago
reger	d44d8996d0	Added a “don't store remote search results” option This is intended for peers who want to participate in the P2P network but don't wish to load/fill-up their index with metadata of every received search result. The DHT transfer is not affected by this option (and will work as usual, so that a peer disabling the new store to index switch still receives and holds the metadata according to DHT rules). Downside for the local peer is that search speed will not improve if search terms are only avail. remote or by quick hits in local index. To be able to improve the local index a Click-Servlet option was added additionally. If switched on, all search result links point to this servlet, which forwards the users browser (by html header) to the desired page and feeds the page to the fulltext-index. The servlet accepts a parameter defining the action to perform (see defaults/web.xml, index, crawl, crawllinks) The option check-boxes are placed in ConfigPortal.html	10 years ago
Michael Peter Christen	8c3e5b7b6d	added experimental pdf splitting which enables YaCy to split pdfs during parsing into individual pages and add them all using different URLs. These constructed urls are generated from the source url with an appended page=<pagenumber> attribute to the url get/post properties. This will distinguish the different page entries. The search result list will then replace the post parameter with a url anchor # mark which causes that the original url is presented in the search result. These URLs can be opened directly on the correct page using pdf.js which is now built-in into firefox. That means: if you find a search hit on page 5 and click on the search result, firefox will open the pdf viewer and shows page 5.	10 years ago
Michael Peter Christen	5516819354	preventing the use of no-cache and expires in case that images are generated dynamically which will stay static in the future. This applies mainly to the search result favicon in front of search hits. These icons will now be generated once, but then cached in the browser. There is also a YaCy-internal cache for these icons which had prevented the re-generation of the icons in YaCy, but this cache is now superfluous since the browser should not call the servlet ViewImage again.	10 years ago
Michael Peter Christen	28683530cd	fixes to usage of no-cache: use and recognize also the no-store directive	10 years ago
reger	7d863d6254	fix empty text facet entry (noticed on Author facet)	10 years ago
Michael Peter Christen	0a879c98e7	added new 'firstSeen' database table and necessary data structures which hold a date for each URL to record when a url was first seen. This is then used to overwrite the modification date for urls upon recrawl in case that the first-seen date is before the latest document date. This behaviour is necessary due to the common behaviour of content management systems which attach always the current date to all documents. Using the firstSeen database it is possible to approximate a real first document creation date in case that the crawler starts frequently for the same domain. As a result the search results ordered by date have a much better quality and the usage of YaCy as search agent for latest news has a better quality.	10 years ago
orbiter	72c2bc5189	fix for search in case where local peer has no local seed address in portal mode	11 years ago
Michael Peter Christen	167c5a51f0	IPv6 fix	11 years ago
orbiter	fa2ad101ec	enhanced graphics computation (avoiding long string parsing for colours)	11 years ago
orbiter	ef813cec91	added proper copyright notice to OSM tiles presented at the search result page	11 years ago
Michael Peter Christen	f818f84adb	more ipv6 fixes	11 years ago
Michael Peter Christen	afd5bd5f5f	slightly enhanced Network table computation by using a lazy initialized bitfield for peer flags	11 years ago
Michael Peter Christen	2c2b50e65d	refactoring (class name should start with uppercase letter)	11 years ago
Michael Peter Christen	bc275dca07	added network history graph image /NetworkHistory.png which can show many different statistics about the history of the peer.	11 years ago
Michael Peter Christen	e8392e2ff2	fix for local search	11 years ago
Michael Peter Christen	0bfc69b29b	more ipv6 bugfixes	11 years ago
Michael Peter Christen	883622306e	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/peers/Protocol.java	11 years ago
Michael Peter Christen	97995a1dd9	fix for remote search process	11 years ago
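
Commit 7d0d19cb8e above describes replacing File.deleteOnExit() with a dedicated temp subdirectory that is deleted by code on shutdown. The following is only a minimal sketch of that idea, not the YaCy implementation: the class name TempDirSketch and its methods are hypothetical, and the step of pointing the java.io.tmpdir property at the subdirectory is left out, so files are created in the subdirectory explicitly.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical illustration; names are not taken from the YaCy sources.
public class TempDirSketch {
    private final Path dir;

    // Creates a fresh subdirectory below the default temp directory.
    TempDirSketch(String prefix) throws IOException {
        this.dir = Files.createTempDirectory(prefix);
    }

    Path dir() {
        return dir;
    }

    // Creates a temp file inside the subdirectory; no deleteOnExit() registration.
    File newTempFile(String prefix, String suffix) throws IOException {
        return File.createTempFile(prefix, suffix, dir.toFile());
    }

    // Deletes the subdirectory and everything in it, children before parents.
    void cleanup() throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        TempDirSketch tmp = new TempDirSketch("yacy-sketch");
        File f = tmp.newTempFile("doc", ".tmp");
        System.out.println("created " + f);
        tmp.cleanup(); // in a server this would run in the shutdown path
        System.out.println("still exists: " + f.exists());
    }
}
```

Because nothing is registered with the JVM per file, the bookkeeping stays constant no matter how many temp files a long crawl creates; the cost is that cleanup has to be called explicitly on shutdown.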

362 Commits (caf9e98f09b144933c9f23840cebfc8b5739a931)
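
Commit 69eacdf4eb replaces split(",") calls with a precompiled CommonPattern.COMMA.split. As a rough sketch of that pattern, assuming the constant is an ordinary java.util.regex.Pattern (the class CommaSplitSketch and its COMMA field below are hypothetical stand-ins, not the YaCy class):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical stand-in for a shared pattern holder such as CommonPattern.
public class CommaSplitSketch {
    // Compiled once and reused, instead of handing the separator to String.split on every call.
    static final Pattern COMMA = Pattern.compile(",");

    static String[] splitList(String s) {
        return COMMA.split(s);
    }

    public static void main(String[] args) {
        // prints [index, crawl, crawllinks]
        System.out.println(Arrays.toString(splitList("index,crawl,crawllinks")));
    }
}
```

A shared constant also keeps the separator definition in one place. How much time it saves depends on the JDK, since String.split already has a fast path for simple single-character separators.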