yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	d97deb5555	npe fix	10 years ago
Michael Peter Christen	4fe4bf29ad	added rss feed output to snapshot servlet which can be used to get a list of latest/oldest entries in the snapshot database. This is an example: http://localhost:8090/api/snapshot.rss?depth=2&order=LATESTFIRST&host=yacy.net&maxcount=100 The properties depth, order, host and maxcount can be omited. The meaning of the fields are: host: select only urls from this host or all, if not given depth: select only urls at that crawl depth or all, if not given maxcount: select at most the given number of urls or 10, if not given order: either LATESTFIRST to select the youngest entries, OLDESTFIRST to select the first entries or ANY to select any The rss feed needs administration rights to work, a call to this servlet with rss extension must attach login credentials.	10 years ago
reger	ff18129def	ViewFile servlet: update index if newer, so viewed text and metadata (stored) info is similar - to archive it, use request with profile to allow indexing (defaultglobaltext) and update index (the resource is loaded, parsed anyway, so it's not a expensive operation) Request: remove 2 unused init parameter - number of anchors of the parent - forkfactor sum of anchors of all ancestors	10 years ago
Michael Peter Christen	226aea5914	added a servlet which can create preview images, preview tumbnails and preview pdfs from web pages, i.e.: http://localhost:8090/api/snapshot.png?url=http://yacy.net/en/&width=128&height=128 http://localhost:8090/api/snapshot.jpg?url=http://yacy.net/en/&width=128&height=128 http://localhost:8090/api/snapshot.pdf?url=http://yacy.net/en/ This supports also an on-the-fly generation of the preview documents if the user is an administrator. Otherwise, the servlet fails. To enable this, you must add wkhtmltopdf, imagemagick and (on headless servers) xvfb to your operation system. for detailed instructions, see `97f6089a41`	10 years ago
Michael Peter Christen	0550b54d56	added fix to postprocessing: avoid caching of postprocessing collection to always get fresh lists of documents. This is necessary since the postprocessing changes the same documents which the postprocessing-collection query selects.	10 years ago
Michael Peter Christen	1db476c67e	fix for bad table iteration	10 years ago
orbiter	3ffe19b85c	replaced old /api/table_p.xml servlet with /Tables_p.xml to avoid double code	10 years ago
Michael Peter Christen	07c5b57953	removed warnings	10 years ago
reger	f5967dfedf	add filter to citation page and a on/off button to display only sentences with citations, while maintaining the sentence number. Make the filtered list the default in search result citation link	10 years ago
Michael Peter Christen	0bfc69b29b	more ipv6 bugfixes	10 years ago
Marc Nause	1e6e69bc40	Finished implementation of UPNP: ) will try other ports if YaCy standard ports are not available ) distinguish between internal and external port (not sure if this works 100%) Still to add: propery in config to enter own external port (in case of manually configured NAT)	10 years ago
orbiter	3ac31614a3	added option to reverse-sort YaCy tables (internal API change only)	11 years ago
Michael Peter Christen	2a52c6f0f1	using htroot/api/blacklists as source folder: removed package declaration of some classes in that folder	11 years ago
reger	6654d314f1	add rss version to api/feed.rss IE11 reports error without	11 years ago
orbiter	2371d6b8db	target linktexts must be string to enable search facets on these fields	11 years ago
orbiter	22ce4fb4dd	better error handling for remote solr queries and exists-checks	11 years ago
Michael Peter Christen	2de159719b	added an option to set 'obey nofollow' for links with rel="nofollow" attribute in the <a> tag for each crawl. This introduces a lot of changes because it extends the usage of the AnchorURL Object type which now also has a different toString method that the underlying DigestURL.toString. It is therefore not advised to use .toString at all for urls, just just toNormalform(false) instead.	11 years ago
Michael Peter Christen	8514bffc22	enhanced postprocessing status report	11 years ago
orbiter	59160984cc	timeline performance update	11 years ago
orbiter	2073e69034	fix for long periods in timeline	11 years ago
Michael Peter Christen	8c52f0651b	refactoring of AccessTracker events & timeline fix	11 years ago
Michael Peter Christen	74206a10c7	refactoring	11 years ago
Michael Peter Christen	36e623d8bf	enhanced metadata enrichment for media file type search: - Web servers may now deliver YaCy-specific http header field with a title and keywords. The new http header fields are: X-YaCy-Media-Title - to be used for media (image, audio, video) titles X-YaCy-Media-Keywords - to be used for media (image, audio, video) keywords - both fields are written to document fields title and keywords and are searched also during image search. - to make the usage of arbitrary http header fields (including this new fields) possible in the /api/push_p.json servlet, a new POST argument is also introduced to push http header fields. The new POST attribute is named "responseHeader-X" (where X is the counter). It is allowed to use this attribute as multi-attribute several times, each can be filled with a http header line. - see /api/push_p.html for examples	11 years ago
Michael Peter Christen	8fd72b5e8b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	81d0f01a6f	added 'synchronous' and 'commit' flags in push api	11 years ago
Marc Nause	f443cfa32d	Improvements and bugfixes for recording actions of blacklist API.	11 years ago
orbiter	4177c9cf05	fix for crawl start check	11 years ago
Michael Peter Christen	74c249288a	added a push api to make it possible to upload files directly without crawling to the YaCy indexer. Files are uploaded using POST multipart requests; multiple file uploads are possible as well. Each file has attached the file date and mime type which is used to get the right parser for the submitted data. Also an url is submitted which is assigned to the document. The CrawlSwitchboard has a new option for default Crawl Profiles which are assigned dynamically from the new push interface.	11 years ago
Michael Peter Christen	b3b174e2b8	fixed webgraph postprocessing and status display in Crawler_p servlet	11 years ago
Michael Peter Christen	2520590b45	migrated from pdfbox 1.8.4 to 1.8.5. They have a very long bugfix list for that update: http://www.apache.org/dist/pdfbox/1.8.5/RELEASE-NOTES.txt	11 years ago
Marc Nause	4723329e29	Improved blacklist XML/JSON API.	11 years ago
orbiter	0d8072aa99	removed warnings	11 years ago
Marc Nause	f98ccf952f	Improved Blacklist API: ) added JSON support ) fixed Exception in case of missing parameters *) renamed parameter for items in "add entry" and "delete entry" from "entry" to "item" to match term in XML	11 years ago
Marc Nause	0d88f292dc	Key for parameter "blacklist name" is "list" in all servlets now.	11 years ago
Marc Nause	c97da1a0d8	First draft of a blacklist API.	11 years ago
reger	727dfb5875	refactore URIMetadataNode to further unify interaction with index - URIMetadataNode extending SolrDocument - use language as stored (String), reducing conversion to string - optimize debug code in transferIndex	11 years ago
Michael Peter Christen	dd12dd392f	introduction of a data structure for HyperlinkEdges which should use less memory as it does no double-storage of source links for each edge of the graph.	11 years ago
Michael Peter Christen	a37d067692	refactoring	11 years ago
orbiter	c250fac9f4	linkstructure refactoring to get more options for clickdepth analysis	11 years ago
Michael Peter Christen	bd886054cb	new structure and enhancements for link graph computation: - added order option to solr queries to be able to retrieve document lists in specific order, here: link length - added HyperlinkEdge class which manages the link structure - integrated the HyperlinkEdge class into clickdepth computation - extended the linkstructure.json servlet to show also the clickdepth and other statistic information	11 years ago
Michael Peter Christen	e8ddd415a8	enhanced the new link structure graph	11 years ago
Michael Peter Christen	7f5733638b	fix for linkstructure computation: now also detecting dead links	11 years ago
orbiter	18f9c40302	moved Edge class out of linkstructure servlet as this does not work on non-eclipse driven environments (all non-dev cases)	11 years ago
Michael Peter Christen	a6bb9be97e	- added d3.js for visualizations using embedded svg - added a servlet api/linkstructure.json which generates a link graph information in json - added a javascript link graph renderer hypertree.js using d3 and the new servlet linkstructure.json - embedded the new link graph in the crawler monitor and the host browser	11 years ago
Michael Peter Christen	48fbfa60c1	bugfix to inbound/outbound identification	11 years ago
Michael Peter Christen	a3b7366aee	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	92811d7850	fix: 3 more links pointing to old /xml path	11 years ago
Michael Peter Christen	656e2ce62a	replacing direct html table cellspacing with css set-up for cellspacing	11 years ago
orbiter	f8f88d4e81	replaced pdblue-homebrew buttons with bootstrap standard buttons	11 years ago
Michael Peter Christen	85a427ec54	support for multiple sitemaps in robots.txt	11 years ago

1 2 3 4 5 ...

399 Commits (8055ed5b2ac763a8d0706e7387807b77118bc11a)