Michael Peter Christen
8c099d2106
Merge remote-tracking branch 'origin/master'
Conflicts:
htroot/api/ymarks/import_ymark.java
source/de/anomic/data/ymark/YMarkEntry.java
source/de/anomic/data/ymark/YMarkTables.java
13 years ago
apfelmaennchen
59bd478ed1
Added more sophisticated RDF output for YMarks, including the folder
structure (b:Topic) and support for multiple tags (dc:subject) and
folders (b:hasTopic) via rdf:Bag container.
13 years ago
apfelmaennchen
d31a632951
- added dmoz RDF dump importer
- added indexing to Tables columns to support larger bookmark
collections
- added RDF output (HTTP) for public bookmarks at /YMarks.rdf
- YMarkRDF also provides a Jena RDF Model as "internal" API
- various other changes/fixes for YMarks (mainly backend)
13 years ago
reger
40d8086bf7
keep input order of translation entries within one file section.
On translation conflicts (translation of words contained in another sentence), allow the shorter key to be put at the end of the translation list.
13 years ago
Michael Peter Christen
10b911eed4
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
be67c70a47
added Solr fields:
inboundlinks_text_chars_val
inboundlinks_text_words_val
inboundlinks_alttag_txt
outboundlinks_text_chars_val
outboundlinks_text_words_val
outboundlinks_alttag_txt
13 years ago
orbiter
d73fff0e0e
added solr field images_withalt_i
13 years ago
sixcooler
a975bcffcb
clear fulltext-cache and stop crawling if running out of memory
13 years ago
sixcooler
e78fe3f477
also do a clearcache on the solr-connector-caches
13 years ago
sixcooler
9ee2e09983
statistics for solr-cache
13 years ago
Michael Peter Christen
d8425e6809
added collections to crawl monitor
13 years ago
Michael Peter Christen
ee23fc7a32
added h1..h6 counter fields
13 years ago
Michael Peter Christen
b2b516cc3e
added a collection attribute to crawls and searches:
- a solr field collection_sxt can be used to store a set of crawl tags
- when this field is activated, a crawl tag can be assigned when crawls
are started
- the content of the collection field can be comma-separated, all of
them are assigned to the documents when they are indexed as result of
such a crawl start
- a search result can be drilled down to a specific collection; this is
currently only available in the solr interface and also in the gsa
interface using the 'site' option
- this adds a mandatory field for gsa queries (the google api demands
that field all the time)
13 years ago
Michael Peter Christen
4815713ec7
added synchronization to solr server requests since lucene is not
thread-safe. We experienced problems as described in
http://stackoverflow.com/questions/5327978/lockobtainfailedexception-updating-lucene-search-index-using-solr
13 years ago
Michael Peter Christen
f75b3f8a47
added more patches to work without RWI data structure
13 years ago
Michael Peter Christen
a427a68bac
removed many warnings
13 years ago
Michael Peter Christen
c72c435517
- moved the gsa search interface from /gsa/searchresult? to /gsa/search?
- fixed the NB field data
13 years ago
Michael Peter Christen
31d4d38804
- extended the solr interface by a references-by-word-count method
- reduced danger that a non-existing RWI database causes NPEs
- added Solr queries to did-you-mean: this makes it possible that our
did-you-mean algorithm works together with only Solr and without RWIs
13 years ago
Michael Peter Christen
528d6763fa
- added new solr fields:
title_count_i, title_chars_val, title_words_val
description_count_i, description_chars_val, description_words_val
- added many asserts to ensure data type correctness from YaCy to Solr
and vice versa
- made many fixes according to new findings from these asserts (!)
13 years ago
Michael Peter Christen
3142e675e8
fixed problems with GSA api:
- better FS attribute
- highlighting of searched words in title
13 years ago
Michael Peter Christen
3b19fe7b52
- fixed num parameter in GSA api
- changed FS attribute in GSA api
13 years ago
Michael Peter Christen
2ddc33646a
added new fields for solr:
url_paths_sxt
url_parameter_i
url_parameter_key_sxt
url_parameter_value_sxt
url_chars_i
13 years ago
Michael Peter Christen
75d5e3475d
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
cominch
a2841261bd
content control: apply filter if enabled to crawls
13 years ago
cominch
dc468dad01
add content control features for custom filter lists
13 years ago
Michael Peter Christen
316b5fe116
- added a solr type definition verifier
- fixed type definition found by the verifier
- added multivalue-string fields for solr with extension 'sxt'
- added multivalue-integer fields for solr with extension 'val'
- renamed some solr attributes from txt to sxt
- changed solr query line to an explicit AND/OR structure
- added a country code second level domain list to Domains class; with
parser
- added a host string parser to get domain class name, country-code
second-level domain and subdomain out of it
- removed old coordinate attributes
13 years ago
orbiter
a3d5959981
Merge commit '65d49df865f60511d22d86fb15c33a082176e7ab'
13 years ago
Michael Peter Christen
4521d63c92
added boosts to solr search queries
13 years ago
Michael Peter Christen
e8acd542b5
- added faceted drill-down for host and geolocation to solr queries
- added a new geolocation field to index schema, the old values are
migrated if possible
13 years ago
Michael Peter Christen
f00168ecc5
added gsa result attribute 'has'
13 years ago
reger
65d49df865
security fix: clear automatic password only if adminAccountForLocalhost=false to prevent remote access to protected pages after restart.
if adminAccountForLocalhost=true, leave the automatic password unchanged so that access from localhost is granted but remote access is prevented from the first second.
13 years ago
orbiter
2094df2e4e
- correct length computation for BStringObject (bugfix suggested by
apfelmaennchen)
- using ASCII for string conversion for Strings generated from Integer
13 years ago
orbiter
6d03433cda
- added hack to prevent stream servlet paths from being parsed wrongly
if the path contains a dot.
- also added warnings if documents are requested which do not exist.
13 years ago
orbiter
67f2866cd0
small fixes
13 years ago
orbiter
ce156a01ba
Merge commit 'c2341a175fdd755a34965ff63c7ea437b380352d'
13 years ago
David Rubio
c2341a175f
Fixed a bug that prevented YaCy from indexing files with non-ASCII filenames on FTP servers.
Previously YaCy could read file listings in UTF-8, but couldn't send commands to the FTP server in UTF-8 (the second byte of every multi-byte character was ignored), which caused a lot of errors on the server side.
Now it handles UTF-8 correctly.
13 years ago
orbiter
3ebc4264c5
fixed concurrent query
13 years ago
orbiter
29171e2f6c
fixed generation of ontologies from index enumerations
13 years ago
orbiter
7cd302de3e
omit xml parsing when using the embedded solr server
13 years ago
orbiter
787e1c6836
added the
QueryResponse query(SolrParams params)
method to the SolrServerConnector which is necessary to use facets in
solr search.
13 years ago
orbiter
01a63ef595
redesign of YaCySchema and SolrDoc handling
13 years ago
orbiter
479bfca571
refactoring
13 years ago
Michael Peter Christen
48a82bc705
log queries anonymously from gsa+solr requests
13 years ago
Michael Peter Christen
ab6ec4ec52
added snippet computation to solr/rss and gsa result writer
13 years ago
Michael Peter Christen
4716546ef5
- reduced memory usage in index transmission using a transformation of
Node to Row objects
- removed peerDeparture in solr remote search in case that peer does not
answer (this may be normal because it is allowed to switch this off)
13 years ago
Michael Peter Christen
06b0081fdc
fix for NPE during host navigation computation
13 years ago
Michael Peter Christen
feb99bc291
fixed GSA format
13 years ago
Michael Peter Christen
653645c1cf
corrected solr query syntax
13 years ago
Michael Peter Christen
08ae142a3d
- enhanced caching after search queries to solr
- reduced caching after short memory
13 years ago
orbiter
716ea0cfe2
sorted the solr schema into mandatory and optional fields; reduced
number of used fields to reduce solr index size
13 years ago
orbiter
9b8c8c0f47
fix from gaston in
http://forum.yacy-websuche.de/viewtopic.php?p=26909#p26909
13 years ago
orbiter
acb9f04e80
removed unused classes
13 years ago
Michael Peter Christen
0ad52ac4c3
gsa bugfix for date parser
13 years ago
Michael Peter Christen
3ce4c2f937
fixes for gsa result format
13 years ago
Michael Peter Christen
67d235fae9
added gzip encoding to solr2solr http interface, client side (server
already works)
13 years ago
Michael Peter Christen
a049761e0c
fixed double-check
13 years ago
Michael Peter Christen
f42a57cd7d
gsa format update
13 years ago
Michael Peter Christen
b3aad6cc35
bugfix for remote search when search is done to solr
13 years ago
Michael Peter Christen
ff3eaa21b0
added remote search to solr on YaCy peers!
- when doing a remote search, node peers are selected for solr queries
- the solr query is done concurrently to the standard YaCy rwi search
- the solr search result is fed into the same data structure that
prepares the rwi search result
- the same remote search that is done to several outside peers is done to
the local solr index
- the search process works now also without any 'old' RWI data using
solr
13 years ago
Michael Peter Christen
a06123aec6
more abstraction and less parameter overhead for remote search
13 years ago
Michael Peter Christen
f00733186b
code simplifications
13 years ago
Michael Peter Christen
755f5e76cf
removed strange assert statements and simplified code in metadata
transformation
13 years ago
Michael Peter Christen
db0d438709
fix for http://bugs.yacy.net/view.php?id=206
13 years ago
orbiter
404b0aab09
refactoring in remote search and stub for remote node peer selection
13 years ago
orbiter
d7ea45f698
- get nice text_t values from metadata conversions that are stored into
solr as fulltext search index.
- added slow migration from old metadata to solr index entries: each
entry from the old metadata is removed from that data structure and
written into solr.
13 years ago
orbiter
99ef57f103
reduced sleep times
13 years ago
orbiter
780f8974e7
added remaining iteration methods for solr in fulltext class
13 years ago
orbiter
acd2dc3575
hack to remove StringBuilder overhead in query construction
13 years ago
orbiter
ee01c12e56
fixes for putDocument and putMetadata
13 years ago
orbiter
cc47a0876e
reverted bf55f69176
to have a fall-back option in case that memory problems as reported in
http://forum.yacy-websuche.de/viewtopic.php?p=26901#p26901
for full-solr installation are too strong and we have to work with a
'small memory footprint' peer system.
13 years ago
Michael Peter Christen
0904afe8fb
added concurrent iterator methods to the solr connectors
13 years ago
Michael Peter Christen
d54b80327a
refactoring
13 years ago
Michael Peter Christen
f9fc5cfaba
better check for bad urls in url transmission
13 years ago
Michael Peter Christen
d39463a85c
added deleteByQuery to solr connectors
13 years ago
Michael Peter Christen
0cab06c47c
refactoring
13 years ago
Michael Peter Christen
bf55f69176
removed write methods to old metadata file type; all metadata now goes
to solr
13 years ago
Michael Peter Christen
40c0856489
refactoring
13 years ago
Michael Peter Christen
06a78eecb7
code simplification
13 years ago
Michael Peter Christen
54bea21c02
bugfix for solr connector, possibly a cause for
http://forum.yacy-websuche.de/viewtopic.php?p=26893#p26893
13 years ago
Michael Peter Christen
9bece5ac5f
enhanced snippet fetch - removed a bug that caused documents to be
parsed even if a solr text was available
13 years ago
Michael Peter Christen
18f989dfb1
- refactoring (load -> getMetadata)
- added getDocument to retrieve Solr documents which shall replace
getMetadata
13 years ago
Michael Peter Christen
395b78a0d8
using the solr search index to concurrently search within solr and the
rwis during local search requests.
13 years ago
Michael Peter Christen
6197caf698
added clear-text search words in query params
13 years ago
Michael Peter Christen
efafa79db5
- added a content-encoding: gzip to streamed http server responses
- finish and close streamed http responses immediately
- this applies only to the solr interface which should be much faster
now!
13 years ago
Michael Peter Christen
23226676c6
FOR THE BRAVE.. this is a forced migration to solr which is now ready
for production as a replacement of the metadata-db.
This intermediate release 1.041 will switch on the previously optional
solr index and the old metadata-db will still work as it did before.
Solr+metadata are accessed in mixed mode, no migration is done yet.
If this does not cause a catastrophe by the end of the weekend, we will
do a YaCy 1.1 main release containing this as default.
13 years ago
Michael Peter Christen
a1b2c9a67d
doctype2mime fix, influences metadata conversion between old metadata
and solr
13 years ago
Michael Peter Christen
a16206e38b
more attempts to clean the index (cleaning is faster then)
13 years ago
Michael Peter Christen
703f427303
fixed some peer-ping connection details
- larger time-out
- removed too old seedlist
- fixed a bug in connection test
13 years ago
Michael Peter Christen
597bb76e4f
get the peer location more quickly
13 years ago
Michael Peter Christen
1641835fef
replaced yacy xml encoding by solr xml encoding
13 years ago
Michael Peter Christen
89fe13e73d
enhanced GSA and RSS output format: corrected date, added some missing
fields, added xml encoding for utf8
13 years ago
Michael Peter Christen
ea49a8aa8c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
d988ba50cf
added a very rudimentary, incomplete, non-verified GSA response writer
...
for solr. Try this:
http://localhost:8090/gsa/searchresult?q=pdf&site=col1&num=10
13 years ago
Michael Peter Christen
aab0b680c3
- added xslt support for solr result formats.
try e.g.
http://localhost:8090/solr/select?q=*:*&start=0&rows=10&wt=xslt&tr=json.xsl
- added servlet-side mime-type configuration for streamed servlets. this
is used for the result formatters in solr result formats
13 years ago
cominch
e2119f4e76
augmented browsing: replace htmlparser by jsoup, which is more stable
and reliable
13 years ago
Michael Peter Christen
9448d9a8a2
ups
13 years ago
Michael Peter Christen
e5ef840f40
- renamed DoubleSolrConnector to MirrorSolrConnector and added a
hit/miss/document cache to the MirrorSolrConnector.
- more abstraction to SolrDocument in Connector interface
- bugfixes in Solr field reader
13 years ago
Michael Peter Christen
94a334f128
another fix to the Solr metadata reading process and to the shutdown
process
13 years ago
Michael Peter Christen
b51df6c7e8
- added coordinate storage in solr schema
- fixed shutdown process
- fixed some solr-to-metadata reading
- added a large number of metadata attributes in ViewFile.html
13 years ago
Michael Peter Christen
da851c6071
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago