orbiter
aa65282259
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter
63762d8f89
removed kelondro dependencies from cora
12 years ago
orbiter
39564fddbd
more ignore
12 years ago
orbiter
6e0f4557f8
added ftp to getName
12 years ago
cominch
23204d2245
change parameter to support the smw extension for list import
12 years ago
Michael Peter Christen
c235d5c0f1
fixed size parsing in RSS message parser (for YaCy size parameter)
12 years ago
orbiter
089a03114e
full memory usage for debian and when changing the size: debian seems to
dislike the big difference between xmx and xms (I have crashes here
which stop if both values are the same)
12 years ago
Michael Peter Christen
5bc8f34150
fix for success query counter
12 years ago
orbiter
60b1e23f05
added new crawl options:
- indexUrlMustMatch and indexUrlMustNotMatch which can be used to select
loaded pages for indexing. The default patterns are chosen so that all
loaded pages are also indexed (as before), but when doing an expert crawl
start, the user may select only specific urls to be indexed.
- crawlerNoDepthLimitMatch is a new pattern that can be used to remove
the crawl depth limitation. This filter is a never-match by default (so
the depth limit applies), but the user can select paths which will
be loaded completely even if the crawl depth is reached.
12 years ago
orbiter
4987921d3d
fixed the size() method which also counted failed pages (which are also
inside the solr index)
12 years ago
Michael Peter Christen
6ec02deec6
added new crawl attributes in crawl profile (not active yet)
12 years ago
Michael Peter Christen
a13e5153ac
- added the possibility to have not one but a list of crawl start urls
- the list of urls is entered in the expert crawl start in a textfield;
the one-line input field was replaced with a text box
- start urls can also be given in one single line where the urls are
separated by a '|'-character
- as an effect, the crawl profile cannot carry a single start url for
identification because it is possible to have more. Therefore the url was
removed from the crawl profile
- this affects all servlets which display a crawl profile: removed the
url field from all these servlets
- to work consistently with several start urls and the other crawl
starts which computed crawl start url lists from sitelists or sitemaps,
the crawl start servlet was restructured completely
- new rules for must-match patterns were created to make it possible
that site crawl starts also work with several crawl starts at once
12 years ago
Michael Peter Christen
975bc95ddf
added default facet fields for json response format (stub)
12 years ago
Michael Peter Christen
2f218df55d
added missing license headers
12 years ago
Michael Peter Christen
a30653a864
added a regular expression test servlet which is linked within the
parser/crawler error page whenever a problem with a regular expression
occurs.
This makes it easy to correct and enhance the must-match and
must-not-match patterns just by trying out which pattern could be
correct.
12 years ago
Michael Peter Christen
0504b01bdc
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter
9413f77b65
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter
a55e77a115
added twitter search heuristic
12 years ago
Michael Peter Christen
e54ac38095
- some corrections in usage of getFile() and getFileName()
- added more attributes in json response writer according to yacy
servlet
12 years ago
Michael Peter Christen
62add1d564
added the protocol and the file name extension to the solr fields since
these fields are probably facets in file search
12 years ago
Michael Peter Christen
e072632a54
no complaints about memory if the database is empty
12 years ago
Michael Peter Christen
b846f585fa
fixed a bug with size_i field usage
12 years ago
Michael Peter Christen
9db032664e
activate two solr fields which will be used by administration interface
(later)
12 years ago
orbiter
fcd5c7eec3
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter
6171143b4a
added facet stub in JsonResponseWriter
12 years ago
Michael Peter Christen
e6330f648a
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen
e84ffdb4f3
enhanced solr writers
12 years ago
Michael Peter Christen
9644c186a4
added search functionality to ViewFile.html servlet
12 years ago
Marc Nause
03f3a8b647
*) fix for http://www.yacy-forum.org/viewtopic.php?f=2&t=759
12 years ago
Michael Peter Christen
b69ed96f0b
- added collections to yacydoc
- changed yacydoc.htm to yacydoc.json
- added query logging in solr and gsa search result
12 years ago
Michael Peter Christen
5df553c152
- added a json writer for solr (yes there was one using xslt but this
one writes the same way as yacysearch.json)
- using the new json solr result to change the ajax search in
IndexControlURLs to the new solr search
12 years ago
Michael Peter Christen
4634f0e626
fix for images_withalt
12 years ago
Michael Peter Christen
e65cecc419
- updated lucene libraries to 3.6.1
- added lucene-grouping which enables faceted search; try this:
http://localhost:8090/solr/select?q=*:*&start=0&rows=3&facet=true&facet.field=host_s
12 years ago
Michael Peter Christen
1754fbb6d9
Merge remote-tracking branch 'reger/master'
12 years ago
Michael Peter Christen
4d29f59a27
removed warnings
12 years ago
Michael Peter Christen
8c099d2106
Merge remote-tracking branch 'origin/master'
Conflicts:
htroot/api/ymarks/import_ymark.java
source/de/anomic/data/ymark/YMarkEntry.java
source/de/anomic/data/ymark/YMarkTables.java
12 years ago
apfelmaennchen
59bd478ed1
Added more sophisticated RDF output for YMarks, including the folder
structure (b:Topic) and support for multiple tags (dc:subject) and
folders (b:hasTopic) via rdf:Bag container.
12 years ago
apfelmaennchen
d31a632951
- added dmoz RDF dump importer
- added indexing to Tables columns to support larger bookmark
collections
- added RDF output (HTTP) for public bookmarks at /YMarks.rdf
- YMarkRDF also provides a Jena RDF Model as "internal" API
- various other changes/fixes for YMarks (mainly backend)
12 years ago
reger
40d8086bf7
keep input order of translation entries within one file section.
This allows, on translation conflicts (translation of words contained in another sentence), putting the shorter key at the end of the translation list.
12 years ago
Michael Peter Christen
10b911eed4
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen
be67c70a47
added Solr fields:
inboundlinks_text_chars_val
inboundlinks_text_words_val
inboundlinks_alttag_txt
outboundlinks_text_chars_val
outboundlinks_text_words_val
outboundlinks_alttag_txt
12 years ago
orbiter
d73fff0e0e
added solr field images_withalt_i
12 years ago
orbiter
66ac4076c2
added disjunction '|' option to site parameter in GSA API
12 years ago
sixcooler
a975bcffcb
clear fulltext-cache and stop crawling if running out of memory
12 years ago
sixcooler
e78fe3f477
also do a clearcache on the solr-connector-caches
12 years ago
sixcooler
9ee2e09983
statistics for solr-cache
12 years ago
Michael Peter Christen
d8425e6809
added collections to crawl monitor
12 years ago
Michael Peter Christen
ee23fc7a32
added h1..h6 counter fields
12 years ago
Michael Peter Christen
4b36a2c3b4
small style changes
12 years ago
Michael Peter Christen
8ca842b137
added new button design to more buttons
12 years ago