yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	9fb3142317	Restricted variables scope to function handleStatus() in Crawler.js Missing 'var' in declaration was unnecessarily giving global scope to these variables.	8 years ago
luccioman	52e56025f7	Fixed undefined error case in sorttable.js Occurred when a table with class="sortable" has data cells with colspan attribute greater than 1	8 years ago
luccioman	a73c9327a5	JavaScript License fixes for LibreJS compatibility	9 years ago
luccioman	8b95e5c91f	Applied GNU licensing recommendations.	9 years ago
luccioman	3f6fefb125	Added license information for YaCy owned js files	9 years ago
luccioman	02ecb8de29	Added JavaScript license information First pass applied for YaCy index and administration first page, checked with LibreJS 6.0.13.	9 years ago
Michael Peter Christen	a7b41bd206	use curl downloads in download script with silent mode	9 years ago
Michael Peter Christen	ac034db8bc	Merge branch 'master' of https://github.com/luccioman/yacy_search_server # Conflicts: # htroot/js/highslide/highslide.js # source/net/yacy/document/ImageParser.java	9 years ago
luc	a156fd65d0	Patch to manage render or load errors is still needed after highlight.js version upgrade. Updated patch for better behavior consistency between browsers.	9 years ago
reger	571609c208	upd javascript img viewer to highslide 4.1.13	9 years ago
luc	74b0283d57	Added image preview error management.	9 years ago
Michael Peter Christen	df3314ac1a	added a new facet type based on a probabilistic classifier using bayesian filters. This can be used to classify documents during indexing-time using a pre-defined bayesian filter. New wordings: - a context is a class where different categories are possible. The context name is equal to a facet name. - a category is a facet type within a facet navigation. Each context must have several categories, at least one custom name (things you want to discover) and one with the exact name "negative". To use this, you must do: - for each context, you must create a directory within DATA/CLASSIFICATION with the name of the context (the facet name) - within each context directory, you must create text files with one document each per line for every category. One of these categories MUST have the name 'negative.txt'. Then, each new document is classified to match within one of the given categories for each context.	9 years ago
reger	1d8e1e4bac	- Image search expand box, adjust javascript hs padtominsize parameter, to make sure expand box doesn't shrink on small images - ensure ImageResult.imagetext has value for the link text (use filename if no alt text given)	10 years ago
Michael Peter Christen	535f1ebe3b	added a new way of content browsing in search results: - date navigation The date is taken from the CONTENT of the documents / web pages, NOT from a date submitted in the context of metadata (i.e. http header or html head form). This makes it possible to search for documents in the future, i.e. when documents contain event descriptions for future events. The date is written to an index field which is now enabled by default. All documents are scanned for contained date mentions. To visualize the dates for a specific search results, a histogram showing the number of documents for each day is displayed. To render these histograms the morris.js library is used. Morris.js requires also raphael.js which is now also integrated in YaCy. The histogram is now also displayed in the index browser by default. To select a specific range from a search result, the following modifiers have been introduced: from:<date> to:<date> These modifiers can be used separately (i.e. only 'from' or only 'to') to describe an open interval or combined to have a closed interval. Both dates are inclusive. To select a specific single date only, use the 'to:' - modifier. The histogram shows blue and green lines; the green lines denote weekend days (Saturday and Sunday). Clicking on bars in the histogram has the following reaction: 1st click: add a from:<date> modifier for the date of the bar 2nd click: add a to:<date> modifier for the date of the bar 3rd click: remove the from and to modifiers and set an on:<date> modifier for the bar When the on:<date> modifier is used, the histogram shows an unlimited time period. This makes it possible to click again (4th click) which is then interpreted as a 1st click again (sets a from modifier). The display feature is NOT switched on by default; to switch it on use the /ConfigSearchPage_p.html servlet.	10 years ago
Michael Peter Christen	d9603039ff	automatically set the Q flag for smb/ftp start urls (split pdf support)	10 years ago
Ryszard Goń	3144313974	Postprocessing progress bar fix (Make it work as [probably] actually intended)	10 years ago
Michael Peter Christen	9fce8bf2a5	crawling of multi-page pdfs with artificial post part on smb or ftp shares is not possible with the disabled setting; this is now temporarily disabled until a better solution is at hand.	10 years ago
reger	b0c87d8240	fix image search expand box, cut-off of 2nd capture line height tested with IE11 and Firefox 32 (change worked for both to show 2nd line without cutting off height) +fix charset parameter in metadataImageParser +update start errMsgTxt to "java 1.7"	10 years ago
orbiter	4177c9cf05	fix for crawl start check	11 years ago
Michael Peter Christen	362c988c05	design fixes to better use the new colours	11 years ago
Michael Peter Christen	bd886054cb	new structure and enhancements for link graph computation: - added order option to solr queries to be able to retrieve document lists in specific order, here: link length - added HyperlinkEdge class which manages the link structure - integrated the HyperlinkEdge class into clickdepth computation - extended the linkstructure.json servlet to show also the clickdepth and other statistic information	11 years ago
Michael Peter Christen	e8ddd415a8	enhanced the new link structure graph	11 years ago
Michael Peter Christen	a6bb9be97e	- added d3.js for visualizations using embedded svg - added a servlet api/linkstructure.json which generates a link graph information in json - added a javascript link graph renderer hypertree.js using d3 and the new servlet linkstructure.json - embedded the new link graph in the crawler monitor and the host browser	11 years ago
Michael Peter Christen	721178dc84	misc style bugfixes	11 years ago
Michael Peter Christen	f0f22e68bb	fix for page navigation bar	11 years ago
Michael Peter Christen	deae992d47	fixes to progress bar	11 years ago
Michael Peter Christen	617dd9c97b	- added new input field in index.html - changed progress bar in yacysearch.html - moved pagination navigation to page bottom - moved search term input field to headline	11 years ago
Michael Peter Christen	ed7ad2ef0a	replaced old navbar with bootstrap pagination	11 years ago
Michael Peter Christen	1245cfeb43	small change to crawler monitor to fit in larger translations	11 years ago
Michael Peter Christen	9e0e39a9a4	small change to start/stop/pause icon style	11 years ago
orbiter	4035e20f0b	unescaping the path	11 years ago
Michael Peter Christen	81926c055d	fixed bug with image search in yacyinteractive	11 years ago
orbiter	19a051bec8	more monitoring for postprocessing and enhanced layout in Crawler monitor page	11 years ago
Michael Peter Christen	fceac8cffd	more monitoring for postprocessing	11 years ago
orbiter	9c681cc00d	added segment sizes, postprocessing status and cpu load to crawler monitor	11 years ago
Roland Haeder	ebbb3bc5c1	Fixed CHMOD on many files + added missing loggers (e.g. jena) and made some noisy loggers quiet	12 years ago
Frank	7763f2554f	add the new PPMbar in Crawler_p for a better style and better use.	12 years ago
orbiter	7ff10bdb1b	fix of page navigation for formatted totalcount numbers	12 years ago
Michael Peter Christen	c95a84103a	complete redesign of search process: - removed 'worker' processes - no internal time-out behaviour: methods either are successful or return null - waiting is only done on top-level - removed snippet-production; this is replaced by solr snippets - removed statistics based on solr size queries (they had been VERY long); the statistics (like suggestions or tag cloud) are now again based on the old but very fast RWI index. In portal or intranet mode the RWI index is usually switched off; if you like to have statistics again then you must switch on the rwis again in this mode. - fixed many bugs regarding correct page counter	12 years ago
Michael Peter Christen	788288eb9e	added the generation of 50 (!!) new solr fields in the core 'webgraph'. The default schema uses only some of them and the resulting search index has now the following properties: - webgraph size will have about 40 times as many entries as default index - the complete index size will increase and may be about double the current size As testing showed, not much indexing performance is lost. The default index will be smaller (moved fields out of it); thus searching can be faster. The new index will cause that some old parts in YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to: - search within link texts of linked but not indexed documents (about 20 times of document index in size!!) - get a very detailed link graph - enhance ranking using a complete link graph To get the full access to the new index, the API to solr has now two access points: one with attribute core=collection1 for the default search index and core=webgraph to the new webgraph search index. This is also available for p2p operation but client access is not yet implemented.	12 years ago
orbiter	594ed63f2a	fixed interactive search which caused an error if pubDate is not present in a search result	12 years ago
Michael Peter Christen	de58043205	Added image license generation for solr image search results when results are generated within yjson result writer. This makes it possible to view images in yacyinteractive from solr.	12 years ago
Michael Peter Christen	02fa31b5bf	better filesearch layout	12 years ago
Michael Peter Christen	e55ec3071d	reduced number of facets in yacyinteractive (only filetype necessary)	12 years ago
Michael Peter Christen	c34af7fe94	extended JSON Response Writer and Opensearch Response Writer for the Solr search interface in such a way that it is possible to use this interface for the yacyinteractive search. This search interface is now much faster using the Solr search directly. For the Solr interface it was necessary to create a translation from the YaCy search modifiers to the Solr facet selection. This was added in such a way that it becomes generic for the normal YaCy search and as an on-top evaluation for Solr queries.	12 years ago
Michael Peter Christen	e1f89efd0d	- made image search in interactive search using the ViewImage servlet - that enables viewing of images for intranet SMB servers. - added a filter search for protocol, tld and ext again; otherwise p2p search produces a lot of rubbish	12 years ago
Michael Peter Christen	7ad5457db0	using the solr facets as navigation in yacyinteractive.html instead of counting locally result types	12 years ago
Michael Peter Christen	b7004043ea	- added a field cache for solr queries which call only for a single value - fixed a version conflict exception within a solr add request	12 years ago
Michael Peter Christen	86ec199126	using a better file name	12 years ago
apfelmaennchen	d31a632951	- added dmoz RDF dump importer - added indexing to Tables columns to support larger bookmark collections - added RDF output (HTTP) for public bookmarks at /YMarks.rdf - YMarkRDF also provides a Jena RDF Model as "internal" API - various other changes/fixes for YMarks (mainly backend)	12 years ago
Michael Peter Christen	6fc5400f91	added a tooltip for search navigation to mention that search pages can be navigated using the TAB key	12 years ago
sixcooler	f64e78497a	fix for reload-feature in Crawler_p	13 years ago
cominch	a120ef660b	RDF demo servlet	13 years ago
Michael Peter Christen	638390930d	another patch to fix the Crawler_p layout	13 years ago
Michael Peter Christen	c846e9ca14	redesign of the crawler monitor page: show crawled pages instead of queue of urls that shall be crawled	13 years ago
Michael Peter Christen	08dcf3e5d1	hack to get all results if the actual number is between 10 and 64	13 years ago
Michael Peter Christen	f8cd57c92f	new indexing strategy: ALL links that appear anywhere are indexed, not only links where the content can be parsed. All non-parseable links are placed into the noload queue. The search process must therefore be able to filter out non-text search results. - This fixes the problem that image search results appeared in the text search. - The interactive search can retrieve now ALL types of links - The p2p interface is now extended to retrieve only certain types of links (text, image, video, apps) - The search process has an extension to filter the right document type according to the search query	13 years ago
Michael Peter Christen	fa7b3481b3	better navigation in file search: less results by first try, but much faster. after the first search is done, buttons appear to get more results for the same search	13 years ago
Michael Peter Christen	6e51a00a2f	Revert "fix for page navigation: show only as much pages as are available for given navigation constraints, not as given by total results size" This reverts commit `73f5a9e8b3`.	13 years ago
Michael Peter Christen	73f5a9e8b3	fix for page navigation: show only as much pages as are available for given navigation constraints, not as given by total results size	13 years ago
Michael Peter Christen	9ad1d8dde2	complete redesign of crawl queue monitoring: do not look at a ready-prepared crawl list but at the stacks of the domains that are stored for balanced crawling. This affects also the balancer since that does not need to prepare the pre-selected crawl list for monitoring. As an effect: - it is no longer possible to see the correct order of next to-be-crawled links, since that depends on the actual state of the balancer stack the next time another url is requested for loading - the balancer works better since the next url can be selected according to the current situation and not according to a pre-selected order.	13 years ago
apfelmaennchen	c7f88f3fd1	fix for http://bugs.yacy.net/view.php?id=101 - the default crawl depth for bookmarks is now editable.	13 years ago
Michael Peter Christen	f214f6ebb4	added no-load queues to the crawler monitor	13 years ago
Michael Christen	1cf0f35621	the link to the path shall be the path	13 years ago
apfelmaennchen	77317a88e0	Added nice jquery tagsinput to bookmarks dialog - similar to delicious.com ;-) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8133 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	9b0879c184	added a hint that the interactive search is only searching in the local index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8116 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5b2e68b60d	fixed page navigation counter git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8113 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	77a080ced9	smaller fixes for YMarks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8105 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	dd1482aaf5	further update to YMarks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8100 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	564374d1fe	- included YMarks in addition to old bookmarks in yacysearchitem.html; don't get confused by the old bookmark dialog, the ymark is automatically added silently beforehand. - reworked bookmark creation on crawlstart - many smaller adjustments to ymarks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8072 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	6287c2b4a9	YMarks: - introduced tag manager - a quite powerful tool (still not 100% stable, so be careful) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8060 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	5581be12fb	YMarks: - added backend and api for tag management git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8058 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	a3eebfdcba	YMarks: - show active/running crawls - execute crawls (works currently only if API entry is available) - various smaller fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8056 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	4f95f72124	YMarks: - working direct importer for YaCy Crawl Starts - working direct import for old bookmarks.db git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8052 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	a8dfe787ed	- updated to jquery flexigrid 1.1 - YMarks.html automatically recognizes if a bookmark is a crawl start git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8040 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	f8b8c82421	- refactoring of getpageinfo_p.xml (moved out of util) - added more logging in getpageinfo_p.xml git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8037 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	ff32469272	added a link to /api/util/getpageinfo_p.xml as API to crawl start info and to ViewFile.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8035 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	5f7dbe1c42	- some refactoring (ymarks) - improvement for autotagger (is now able to create/detect multi word tags e.g. 'open source') git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8031 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	2adc30d335	suppressing size if size unknown git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8005 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	b5b09b329c	BOOSTED the image search function. The result page now shows the images as embedded image link from the original source and not from the built-in image buffering and re-sizing servlet. The result is shown much faster now not because YaCy does not need to re-size the images but for a very strange other reason: because of RFC specification (http://tools.ietf.org/html/rfc2616#section-8.1.4) a browser does not open more than two connections to the same server at the same time. If the YaCy image servlet is used, then the target host is the YaCy host for all images and that prevents a parallel computation of the image loading. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7998 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	30d340563e	fix in result count display git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7967 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	e48ce5d80e	- style change for search box: larger font, selected by default - style change for search results: by default no parser, size, image info git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7949 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	b0b4886618	try to avoid the unresolved pattern in search result git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7940 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	656286347e	fix for javascript error during search (not ready yet) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7923 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	0229029dcf	a bit of protection against search result bugs in interactive search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7920 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	ca09081341	better interaction git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7875 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	8e03b8ee8b	better integration of server list in interactive search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7870 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	594d8f546a	#cccamp11 maintenance fix: anons may find up to 1000 items in interactive search (was: 100) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7866 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	115abc8917	- more attributes for search progress bar - moved cache strategy to cora package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	fcd4b03892	show progress of search after display of results is finished git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7712 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	b0bdf2d9ed	*) Oops! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7490 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	de065e594f	*) make sure that only positive values are accepted as refresh interval on Crawler Monitor page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7489 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	621e176071	enhancement in table display of path names git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7417 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	2751c52617	layout git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7415 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	89ae6101b9	fix for NPE and added comment in search result git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7412 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e38217fe88	small changes to scanner git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7393 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	58b59f9bc8	- a collection of bug fixes and some redesign of the Scanner class - fixed smb crawling - added smbget to download script generation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7381 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	c36da90261	added a very fast ftp file list generator to site crawler: - when a site-crawl for ftp sites is now started, then a special directory-tree harvester gets the complete directory structure of a ftp server at once - the harvester runs concurrently and feeds into the normal crawl queue also in this: - fixed the 'start from file' crawl function - added a link detector for the html parser. The html parser can now also extract links that are not included in <a> tags. - this causes that a crawl start is now also possible from clear text link files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7367 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4565b2f2c0	removed the display option from index.html, yacysearch.html and yacyinteractive.html instead, a setting at ConfigPortal.html can be made to define if the topmenu shall be shown at these pages or if there is no navigation at all. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7366 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	18d33b5c6d	fixed several search result navigation bugs fixed bad behaviours during search result collection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7362 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago

269 Commits (5268ae2ce93c07cd222909fc0732060cc378184a)